Sensemaking for a plural world

Perspective Map

Generative AI and Intellectual Property: What Each Position Is Protecting

March 2026

Sarah is an illustrator who has been making a living doing commercial work for a decade. Last year she discovered that her distinctive style — laboriously developed across hundreds of hours of paid and unpaid work — had been replicated almost exactly by an image-generation model that had been trained, without her knowledge or consent, on images scraped from her public portfolio. When she contacted the company, she received a form letter explaining that training on publicly available internet data falls within fair use doctrine. She does not accept this. She knows what it cost her to develop that style, and she knows that the company now sells it for a fraction of what she charges. Marcus is a researcher at an AI lab who believes that Sarah's frustration is understandable but that her legal theory is wrong. He argues that training a model on images is not meaningfully different from a human artist studying other artists' work, that copyright protects specific expressions rather than styles, and that restricting AI training data would be like restricting human artists from going to art school. He thinks the hard version of her position — that training without consent violates copyright — would make scientific research, search engines, and digital libraries impossible if applied consistently.

Elena is a legal scholar who thinks both Sarah and Marcus are too focused on the immediate battle and missing the deeper structural question. Copyright was designed to give creators a limited monopoly as an incentive to produce work, not to give them perpetual control over every downstream use of their ideas. But she also worries that the current legal framework, designed for individual human copying, is genuinely inadequate for a technology that can ingest billions of works simultaneously and produce outputs that compete directly with the people whose work trained it. She thinks the question is not who wins this lawsuit but what kind of IP system we want for the next century. And James, a labor organizer who works with writers and journalists, thinks all three of them are arguing about property when the real question is labor. Generative AI was built on work done by millions of people — not just famous illustrators whose style is identifiable but countless anonymous writers, coders, and artists who contributed to the dataset. Those people are not named in any lawsuit. Their work generated the value. They will receive nothing.

These four people are not disagreeing primarily about copyright law. They are disagreeing about what creative work is, what creators owe society, what society owes creators, and who is entitled to capture the value that collective human expression generates when it is transformed into a product. The generative AI and IP debate looks like a technical legal dispute about training data. Underneath it are harder questions: Is learning from prior work categorically different when done at machine scale? Does copyright protect expression or creative labor? And when a technology is built on everyone's contributions, who gets to own it?

What copyright holders and creators are protecting

The creator-protection position begins from a claim about consent. Sarah did not agree to have her work used to train a commercial product. The fact that she made her work publicly accessible — on a portfolio site, on Instagram, on ArtStation — was not an invitation for it to be scraped into a training dataset and used to produce competing outputs at industrial scale. The argument is not primarily philosophical; it is about what "public" means. Publishing work publicly has never before meant licensing it for unlimited commercial extraction. Hanging a photograph in a public gallery does not authorize a print shop to reproduce and sell it at will. The internet's openness was premised on certain baseline assumptions about what sharing work there did and did not permit.

They are protecting the economic relationship between creative work and creative livelihood. The core of this position is not that AI is wrong in some abstract sense but that it breaks the chain between making something distinctive and being compensated for that distinctiveness. Sarah's style is marketable because it is hers — because a client who wants it has to come to her. When a model can replicate it on demand, that economic relationship disappears. Copyright maximalists are defending not just a legal framework but the material conditions under which professional creative work is sustainable. Behind the legal filings of the Authors Guild, the litigation from Getty Images and music publishers, and the opt-out coalitions of illustrators and photographers is a simpler claim: we deserve to be asked.

They are protecting the principle that scale does not change the moral character of an act. The fair use argument made by AI companies rests partly on the claim that training a model on data is not "copying" in the relevant legal sense because the model does not store and reproduce the training data — it learns from it and generates new outputs. Creators dispute both the technical description and the moral logic. On the technical claim: models do store information about their training data in ways that can be extracted through adversarial prompting, and outputs can closely resemble specific training examples. On the moral logic: even if an AI's processing of a work is technically different from photocopying it, the consequence — a commercial product that competes with the original creator — is the same. The fact that this is done simultaneously to millions of works, rather than one at a time, does not improve the situation; it makes it worse.

They are protecting the right to control the uses of work in which one's identity is expressed. For many creators, the objection to AI training is not only economic but expressive. A novelist whose prose style, personal history, and decades of craft have been absorbed into a model that can now write "in her style" without her permission experiences this as something more than a licensing problem. Intellectual property law has always struggled with this — the tension between copyright's economic logic and the more personal moral rights traditions that European copyright law (but, outside narrow exceptions, not American law) explicitly recognizes. The moral rights position — that creators have non-economic interests in their work's integrity and attribution — is not well served by the current US legal framework, but it names something real about why many creators experience AI training as a form of violation that goes beyond theft.

What AI developers and fair-use advocates are protecting

The developer position does not dispute that generative AI is trained on human-created work, or that this creates genuine disruption for some creators. It disputes the claim that this constitutes copyright infringement and argues that the policy implications of treating it as such would be seriously harmful. The legal foundation is fair use doctrine, particularly the transformative use test established in Campbell v. Acuff-Rose Music (1994): whether a use transforms the original material by adding new expression, meaning, or message and whether it serves a different function. AI developers argue that training a model on works to produce a general-purpose creative tool is transformative in exactly this sense — the model is not a repository of the training data but a different kind of thing that learned from it.

They are protecting the principle that learning from prior work is what creative and intellectual progress requires. The analogy to human learning is imperfect but not trivial. A novelist reads thousands of novels before writing one; a programmer studies thousands of programs before writing their own; a painter spends years copying masters in museums. None of this requires licensing. The fair use advocates argue that treating machine learning as categorically different from human learning because of scale imports a legal distinction that has no precedent and no clear principled basis. If reading and learning from publicly available work is permitted, why should it matter whether the reader is a human graduate student or a machine? The counter-argument — that it matters because of the commercial scale — is itself contested: commercial use is a fair use factor but not a determinative one.

They are protecting the possibility of building general-purpose knowledge infrastructure. The position is not only about AI companies' commercial interests. It encompasses search engines (which crawl and index content without individual consent), digital archives (which copy works to preserve them), computational research (which uses large datasets to study language, culture, and history), and scientific work (which trains models on research data). A legal doctrine that makes training on publicly available data presumptively infringing would reach all of these activities. Researchers who use large text corpora to study linguistic change, historians who use digitized archives, and public interest organizations that build tools for navigating government records would all be affected by a ruling that training data requires consent and compensation.

They are protecting broadly accessible innovation against its concentration among legacy rights-holders. One underappreciated dimension of the copyright maximalist position is who benefits if training data requires licensing. The answer is largely: existing large rights-holders. Major publishers, record labels, and media conglomerates that hold large back catalogs would become the gatekeepers of what AI can be trained on. Independent artists, whose work is dispersed and unlicensed, might gain formal rights but practically lack the leverage to enforce them. The developer position sometimes argues that the licensing regime the copyright maximalists want would entrench media consolidation rather than protect individual creators — that the practical beneficiaries of training data licenses would be Disney, News Corp, and the major labels, not the illustrators and novelists making the public case for consent.

What open knowledge advocates are protecting

The open knowledge position approaches this debate from a different direction. Its primary concern is not who wins the current litigation but whether the legal and policy frameworks that emerge will entrench or loosen the barriers to accessing and building on human knowledge. Advocates in this tradition — drawing on the work of Lawrence Lessig, the Creative Commons movement, and open access scholarship — have spent decades arguing that copyright has expanded far beyond its original purpose and that excessive IP protection harms the public interest. They see the generative AI debate as a new front in an old struggle.

They are protecting the public domain as a genuine commons rather than a shrinking residue of expired copyrights. Copyright terms in the United States have been extended repeatedly — from fourteen years (renewable once) at the founding to the current life-plus-seventy-years standard, driven largely by lobbying from entertainment companies seeking to prevent Mickey Mouse and other valuable properties from entering the public domain. Open knowledge advocates argue that this expansion has been systematically bad for culture, innovation, and education — that works should enter the public domain much sooner than they do, and that the legal framework governing AI training is another battleground in this larger war. The danger they see is not only that AI companies get an unfair advantage but that the response to AI — mandatory licensing, consent requirements, new enforcement mechanisms — becomes an occasion to extend and entrench IP maximalism.

They are protecting the distinction between expression and style as a principled limit on IP protection. Copyright has always protected specific expressions, not ideas, styles, or techniques. You cannot copyright a genre, a narrative structure, a photographic technique, or a compositional approach. This limitation is not an accident or a loophole — it is a deliberate policy choice made on the theory that allowing people to own styles and ideas would balkanize creative culture and allow incumbents to tax all subsequent creativity in a genre they pioneered. Open knowledge advocates argue that creator demands for protection against AI style replication, if taken seriously, would require abandoning this foundational distinction — that protecting Sarah's style would require courts to define and enforce rights in an area that copyright law has always treated as unownable.

They are protecting access to AI tools for people who are not professional creative workers. A rarely discussed dimension of this debate is the question of who generative AI serves. For professional illustrators and authors, AI tools represent competitive threats. For students, hobbyists, people in non-English-speaking countries, people with disabilities that limit their ability to draw or write, and people who cannot afford professional creative services, these tools represent something different: unprecedented access to creative capacity. A licensing regime that makes training data expensive concentrates AI capabilities in large companies. A more open regime distributes them more widely. Open knowledge advocates argue that the creator-protection position, while emotionally sympathetic, would deliver its benefits primarily to professional creators while foreclosing access for everyone else.

What structural and labor critics are protecting

James's position starts from a different question: where did the value come from? Generative AI models exist because hundreds of millions of people produced text, images, code, and other creative work that was publicly accessible on the internet. The models are extraordinarily capable because of the scale and quality of that collective contribution. The structural critic notes that essentially none of the economic value captured by AI companies flows back to the people whose work made it possible — and argues that the current legal debate is framed too narrowly to address this. Whether fair use applies to training data is a legal question. Who captures the value of the resulting technology is a political economy question, and the two are not the same.

They are protecting the claim that creative and intellectual labor deserves a stake in the wealth it generates. The comparison that structural critics sometimes make is to early internet platforms. Millions of people produced content for Facebook, Twitter, and YouTube; those platforms became extraordinarily valuable; the people whose content made them valuable received nothing except the use of the platform itself. The structural critique of generative AI is that the same pattern is repeating at greater scale and speed: the value produced by collective human expression is being captured by a small number of companies, and the legal frameworks being contested are not designed to prevent this. The remedy that structural critics tend to favor is not copyright expansion but something more novel — collective licensing arrangements, revenue-sharing pools, or new institutions for distributing AI-generated value back to the people whose work built the systems.

They are protecting workers who are displaced rather than credited. The legal cases brought by well-known authors and illustrators have public visibility. The situation of anonymous data workers is less visible. Researchers have documented that AI training involves not only the passive ingestion of scraped data but the active labor of human annotators — often low-wage workers in the Global South — who label data, filter harmful content, and evaluate model outputs through tasks that are repetitive, poorly compensated, and psychologically damaging. The structural critique encompasses both the original creators whose work was ingested without consent and the invisible labor force that made the models safe enough to deploy. Neither group is a significant beneficiary of the wealth being created.

They are protecting a different theory of what copyright is supposed to do. Copyright maximalists argue from the premise that creators deserve control over and compensation for their work. Fair use advocates argue from the premise that social learning and transformative use should be free. The structural critic argues that both sides are arguing about individual rights in a situation that is fundamentally collective. Copyright was designed for a world where creative works were produced by individual authors and copied by other individuals or entities. Generative AI scrambles this picture: the training process is collective (drawing on millions of works), the value is collective (emerging from scale rather than any individual contribution), and the benefit should be collective too. A system designed around individual rights — whether the creator's right to control use or the developer's right to train on public data — may simply be the wrong framework for a technology built on aggregated collective output.

See also

  • Who bears the cost? — the framing essay for the broader burden-allocation question beneath this map: when generative AI draws on collective cultural labor to create enormous private value, who should absorb the losses, who should capture the gains, and what kind of compensation or redistribution counts as legitimate?
  • Who gets to decide? — the framing essay for the governance dispute beneath this map: whether AI firms can set the terms for training, licensing, and market entry on their own, or whether creators, courts, legislatures, and collective institutions get meaningful authority over how creative work is used to build commercial systems.
  • AI and Creative Work: What Each Position Is Protecting — examines the adjacent debate about what it means for AI to produce creative work: whether AI outputs count as genuine creativity, who should be credited, and what happens to professional creative industries when high-quality generation becomes cheap. This map focuses specifically on the IP and ownership dimension; the creative work map addresses the nature of creativity and professional displacement more broadly.
  • AI Governance: What Each Position Is Protecting — the intellectual property debate is one front in the broader contest over how AI development is regulated; the governance map covers the full landscape of regulatory approaches, from national competitiveness arguments against strong oversight to democratic accountability arguments for it; understanding who controls training data is inseparable from understanding who controls AI development.
  • AI and Labor: What Both Sides Are Protecting — the structural critique's concern with creative worker displacement connects to the broader debate about AI's effects on employment; this map addresses the labor dimension across sectors, while the IP map focuses specifically on the intersection of creative work, ownership, and the value generated by training data.
  • Platform Content Moderation: What Each Position Is Protecting — shares a foundational problem with the AI-IP debate: the question of what internet platforms owe to the people whose content makes them valuable; content moderation debates and AI training debates both rest on contested assumptions about who bears responsibility for how platforms use user-generated material.
  • Wealth Inequality: What Both Sides Are Protecting — the concentration of AI value in a small number of companies is a specific instance of the broader debate about how technological rents are distributed; the structural critic's argument that AI wealth should be more broadly shared echoes the general argument about whether market mechanisms produce fair distributions from technological innovation.
  • Digital Privacy and Surveillance: What Both Sides Are Protecting — data scraping for AI training and data collection for surveillance are formally distinct but share underlying questions about what consent means in a networked environment where making information publicly accessible has never historically meant consenting to all possible uses of it.

Further reading

  • Lawrence Lessig, Free Culture: How Big Media Uses Technology and the Law to Lock Down Culture and Control Creativity (Penguin Press, 2004) — the foundational text of the open culture argument; Lessig argues that copyright extension and expansion have progressively colonized the creative commons that all culture requires; essential background for the open knowledge position's critique of IP maximalism in the AI context; Lessig popularized the idea of "remix culture" and co-founded the Creative Commons licensing framework.
  • Authors Guild et al. v. OpenAI, Inc., No. 23-cv-8292 (S.D.N.Y., filed 2023) — the major class action lawsuit by fiction writers (including John Grisham, Jodi Picoult, George R.R. Martin) alleging that ChatGPT was trained on their books without consent or compensation; the case raises the core questions about whether training constitutes copyright infringement and what remedy creators are entitled to; one of several coordinated legal challenges moving through US courts.
  • The New York Times Company v. Microsoft Corporation and OpenAI, No. 23-cv-11195 (S.D.N.Y., filed 2023) — the most prominent corporate plaintiff in the training data litigation; the Times argues that its journalism was used to train models that now compete directly with its core business; the case is significant because it involves a well-resourced plaintiff capable of litigating fully and because it includes evidence of near-verbatim reproduction of Times articles by GPT-4, complicating the "transformation" argument.
  • Andres Guadamuz, "The monkey selfie: copyright lessons for originality in photographs and internet jurisdiction" (Internet Policy Review, 2016) — a useful entry point into the question of non-human authorship and the legal limits of ownership claims over machine-mediated creative output; Guadamuz traces copyright law's traditional insistence on human authorship and shows why authorship fights over AI outputs are connected, but not identical, to the training data debate.
  • Kate Crawford, Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence (Yale University Press, 2021) — the most comprehensive treatment of AI's political economy from the structural critique perspective; Crawford traces AI development from mineral extraction through data labor to the concentration of power in a few large companies; chapters on data and labor are directly relevant to the IP debate's invisible workforce and the question of who captures value from collective knowledge.
  • U.S. Copyright Office, Copyright and Artificial Intelligence (Parts 1-3, 2024-2025), alongside the AI study's ex parte meeting record — the Copyright Office's multi-year study is now the most useful primary-source record of the legal and policy fight as it stands; it aggregates the Office's own analysis, public comments, and formal submissions from developers and rights holders, including the major arguments about licensing, fair use, and market harm in the parties' own words.
  • Nicholas Carlini et al., "Extracting Training Data from Large Language Models" (USENIX Security, 2021) — one of the clearest empirical demonstrations that large language models can retain and reproduce training examples under certain conditions; directly relevant to the fair use debate because it complicates the strongest version of the developer argument that training is purely transformative and never functionally reproductive.
  • Cory Doctorow, "Copyright won't solve creators' Generative AI problem" (Pluralistic, 2023) — a clear statement of the structural critique from a left-libertarian perspective; Doctorow argues that the AI copyright fight can be instrumentalized by large corporations on both sides — incumbent media firms using creator sympathy to extend copyright maximalism, AI firms using fair use doctrine to consolidate power — and that neither outcome necessarily serves working creators or the public.
  • U.S. Senate Judiciary Committee, Artificial Intelligence and Intellectual Property – Part II: Copyright (hearing record, July 12, 2023) — the cleanest congressional snapshot of how creator groups, platform firms, legal academics, and lawmakers were framing the dispute as the first wave of generative-AI copyright conflict arrived in Washington; useful for seeing which arguments were already visible before the lawsuits matured.
  • Abeba Birhane et al., "The Values Encoded in Machine Learning Research" (FAccT, 2022) — a systematic study of the assumptions embedded in highly cited machine learning papers; finds that the field overwhelmingly privileges performance, novelty, and efficiency over labor, power, and downstream harm; useful context for understanding how the research culture that produced generative AI systems relates to the questions the IP debate is now surfacing.