TL;DR: AI companies scraped much of the public internet, including your content, to train their models. They're now worth hundreds of billions. Courts are deciding whether this is "fair use" or theft. The New York Times is suing OpenAI. Getty Images is suing Stability AI. Anthropic settled with authors for $1.5 billion. Meanwhile, Disney invested in OpenAI and licensed its characters. Rulings expected in 2026 could determine whether AI companies must compensate creators, or whether everything you post online is free training data.
What Happened
AI companies built their fortunes on your content:[1]
- Web scraping: Bots crawled billions of web pages, copying text, images, and code
- Books3 dataset: 196,000+ pirated books used to train language models
- LAION-5B: 5.8 billion image-text pairs scraped from the web
- GitHub Copilot: Trained on public code repositories, including copyleft-licensed code
- ChatGPT: Trained on Common Crawl data covering much of the web
Nobody asked permission. Nobody paid. The companies argue this is "transformative fair use."
Major Lawsuits
NYT vs OpenAI
The New York Times alleges ChatGPT can reproduce its articles verbatim. OpenAI claims fair use. Trial expected in 2026.
Getty vs Stability AI
Getty Images claims Stable Diffusion copied millions of images. Stability argues transformative use.
Authors Guild vs OpenAI
Authors including John Grisham, George R.R. Martin, and Jodi Picoult allege "mass copyright infringement."
Record Labels vs AI
Warner Music Group settled with AI music startups Suno and Udio. Licensed models are expected in 2026.
The Licensing Pivot
Some companies are now paying, after getting sued:[2]
- Anthropic: $1.5 billion settlement with authors
- OpenAI + Reddit: Licensing deal (undisclosed value)
- OpenAI + Associated Press: Licensing deal
- Disney + OpenAI: Disney invested and licensed characters for Sora video generator
- Warner Music + Suno/Udio: Settlement and licensing agreement
The pattern: companies with resources can negotiate licenses. Individuals and small creators get nothing.
The Fair Use Debate
Fair use is the legal doctrine allowing limited use of copyrighted material without permission. Courts consider:[3]
- Purpose and character: Is the use transformative? Commercial or educational?
- Nature of the work: Is the original creative or factual?
- Amount used: How much of the original was copied?
- Market effect: Does the use harm the original's market?
AI companies argue: Training is transformative. The models don't reproduce originals: they create new content. This is like how humans learn.
Rights holders argue: The entire works were copied. The models compete with originals. This is industrial-scale infringement.
Some judges have called AI training "quintessentially transformative." Others worry it could "undermine creative industries." The cases will likely reach the Supreme Court.
The Privacy Angle
Beyond copyright, there's a privacy problem:
- Personal data in training: AI models contain information about real people scraped from the web
- No easy deletion: Once trained into a model, personal information can't be removed without retraining the model or relying on still-experimental unlearning techniques
- Memorization: Models can sometimes reproduce exact training data, including personal details
- Gmail lawsuit: A class action alleges Google used private Gmail content to train AI without consent
GDPR gives Europeans a right to erasure (the "right to be forgotten"), but that right may be nearly impossible to exercise for data baked into AI models.
Is Scraping Legal?
Web scraping occupies a gray zone:[4]
- Public data: Generally legal to scrape publicly accessible content
- Terms of service: Violating website TOS may create liability
- Technical circumvention: Bypassing access controls may violate computer fraud laws such as the US Computer Fraud and Abuse Act (CFAA)
- robots.txt: Ignoring robots.txt is bad faith but not clearly illegal
Lawsuits are testing these boundaries. Reddit sued Perplexity AI for allegedly violating scraping policies. Google sued SerpApi. Outcomes will shape what's permissible.
Emerging Regulations
US states and the EU are beginning to regulate AI:[5]
- Texas TRAIGA (Jan 2026): Bans certain AI uses, requires disclosures for AI in government and healthcare
- Colorado AI Act (June 2026): Prevents algorithmic discrimination by "high-risk" AI systems
- EU AI Act: Requires providers of general-purpose AI models to publish a summary of their training data
No comprehensive federal AI law exists in the US. Copyright law wasn't designed for this. Courts are improvising.
What Content Creators Can Do
Check robots.txt
Add directives blocking AI crawlers such as GPTBot (OpenAI) and CCBot (Common Crawl). These directives are not legally binding, but they create evidence that you objected.
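A minimal sketch of what such directives could look like, and a way to check them before you publish. The robots.txt contents, the example.com URL, and the helper name is_allowed are assumptions made for illustration; the check uses Python's standard urllib.robotparser and only shows how a well-behaved crawler would read the file, not whether any particular crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawlers site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post-1"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(agent, verdict)
```

Keeping a dated copy of the file you deployed is what turns the directives into usable evidence later.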
Watermark Images
Watermarks make scraped images less usable. Not foolproof but raises friction.
Monitor for Copying
Tools exist to detect if your content appears in AI outputs. Document instances for potential claims.
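As a rough illustration of how such a check can work, the sketch below measures how many word sequences in an AI output also appear verbatim in your original text. This is not how any specific commercial tool works; the function names and the default of 8-word sequences are assumptions chosen for the example. It only catches near-verbatim copying, not paraphrase.

```python
import re


def word_ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into words, and return all n-word sequences."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(original: str, ai_output: str, n: int = 8) -> float:
    """Fraction of the AI output's n-grams that also appear in the original."""
    output_grams = word_ngrams(ai_output, n)
    if not output_grams:
        return 0.0  # output shorter than n words: nothing to compare
    shared = output_grams & word_ngrams(original, n)
    return len(shared) / len(output_grams)
```

A ratio near 1.0 means the output is largely copied word for word from your text; a ratio near 0.0 means no long verbatim runs were found. Save the prompt, the output, and the date alongside the score if you plan to document the instance.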
Join Collective Action
Authors Guild, music guilds, and other organizations are filing collective suits. Strength in numbers.
Consider Licensing
Some platforms enable opt-in licensing (Shutterstock, Adobe). Compensation is minimal but exists.
Document Everything
Keep records of when you published what. You may need to prove you created something before a model was trained.
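One way to keep such records is a small manifest that stores a SHA-256 fingerprint of each work next to its publication date. The sketch below is an assumption about how you might structure it, and the function and field names are hypothetical. A record you create yourself does not prove the date on its own; it becomes more persuasive when paired with an independent timestamp such as a public web archive snapshot.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_record(title: str, content: bytes, published_at: datetime) -> dict:
    """Build a provenance record: title, SHA-256 of the content, and timestamps."""
    return {
        "title": title,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "published_at": published_at.astimezone(timezone.utc).isoformat(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def matches(record: dict, content: bytes) -> bool:
    """Check that content is byte-for-byte what the record describes."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


if __name__ == "__main__":
    post = "My original essay text.".encode("utf-8")  # placeholder content
    record = make_record("My essay", post, datetime(2024, 3, 1, tzinfo=timezone.utc))
    print(json.dumps(record, indent=2))
```

Store the manifest somewhere separate from the content itself, and keep the original files unmodified so the hashes still match.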
Platform Policies
Where your content lives matters:
- X (Twitter): Policy allows training AI on posts (including yours)
- Reddit: Licensing deal with OpenAI covers user content
- Meta: Using public Instagram and Facebook content for AI training
- DeviantArt: Opt-out available after backlash
- Tumblr: Parent company Automattic, which also runs WordPress.com, explored AI licensing deals covering user content
Read terms of service, but understand that platforms can change them. The terms you're bound by today probably didn't exist when you signed up.
The Bottom Line
If you've posted anything online (a blog, photos, code, comments, reviews), AI companies probably copied it. They used your work to build products worth billions. They didn't ask. They didn't pay. They argue it's legal.
Courts will decide in 2026 and beyond. The New York Times case, Getty case, and author lawsuits could reshape the AI industry. If fair use claims fail, companies may need to license or remove training data. If fair use claims succeed, everything online is fodder.
Meanwhile, the biggest creators are getting licensing deals. Disney gets paid. Warner Music gets paid. Individual creators get nothing, except the knowledge that their work is training systems that may replace them.
This may be the largest intellectual property dispute in history. The outcome affects every person who creates anything on the internet.