Your AI Assistant Knows Too Much | State of Surveillance

You're Training Their Models for Free

Remember that contract you pasted into ChatGPT for a quick review? That code you had GitHub Copilot debug? The personal story you shared with Claude?

It's all training data now.

OpenAI's terms are clear: unless you opt out, your consumer ChatGPT conversations can be used to "improve" their models. Delete a chat and it can still sit on their servers for up to 30 days. Translation: your private thoughts can become part of GPT-5.

Microsoft's Copilot? It learned to code from public GitHub repositories. Quite possibly including yours. That open-source project you thought was helping the community? Microsoft monetized it without asking.

Samsung Engineers Learned the Hard Way

April 2023. Samsung semiconductor engineers paste confidential source code into ChatGPT. Three separate incidents in less than a month:

Engineer #1: Uploads proprietary code to fix a bug
Engineer #2: Shares internal meeting notes for summarization
Engineer #3: Pastes confidential chip-testing code for optimization

Samsung banned ChatGPT company-wide within weeks. Too late. That code was already on OpenAI's servers, where consumer chats can feed training by default. Once it's in a trained model, you can't delete it. You can't take it back.

But Samsung employees didn't stop using AI. They just switched to personal devices. IT departments call it "Shadow AI" - and in Microsoft's 2024 Work Trend Index, 78% of AI users said they bring their own AI tools to work.

AI Has a Perfect Memory (Of Your Mistakes)

Researchers at Google DeepMind and several universities showed it in 2023: Large language models memorize training data verbatim. Feed them the right prompt, they'll spit out:

Phone numbers from random web pages
Email addresses from forums
Bitcoin private keys (yes, really)
Medical records from leaked databases
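Want to see what "verbatim" means in practice? Here is a minimal sketch of one way to check whether a model's output copies a long run of words from a text you already have. The function names and the 8-word cutoff are made up for illustration, and this is not the researchers' actual method.

```python
from difflib import SequenceMatcher


def longest_shared_run(output: str, reference: str) -> list[str]:
    """Return the longest run of consecutive words found in both texts."""
    out_words = output.split()
    ref_words = reference.split()
    # autojunk=False stops difflib from ignoring very common words in long inputs.
    matcher = SequenceMatcher(None, out_words, ref_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(ref_words))
    return out_words[match.a:match.a + match.size]


def looks_memorized(output: str, reference: str, min_words: int = 8) -> bool:
    """Flag output that copies at least min_words consecutive words.

    min_words is a hypothetical cutoff chosen for this sketch.
    """
    return len(longest_shared_run(output, reference)) >= min_words
```

Run it against something you posted years ago and a model's reply to a leading prompt. A long shared run is a hint, not proof, since common phrases overlap by chance.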

That angry email you wrote in 2015 and posted on Reddit? If it was public when they scraped the web, it's in there. Forever.

GPT-3's main data source started as 45TB of raw web text, filtered down to about 570GB. The full training mix came to roughly 500 billion tokens. Your words, if you ever posted anything online before 2021.

Who's Actually Reading Your AI Chats

OpenAI (ChatGPT)

Human contractors review your conversations to "improve safety." They see everything unless you explicitly opt out. Even then, they keep chats for 30 days "for abuse monitoring."

Found the opt-out? Good luck. It's buried three menus deep: Settings → Data Controls → Improve model for everyone → OFF.

Google (Bard/Gemini)

Keeps your activity for 18 months by default. Conversations picked for human review are stored separately for up to three years, even if you delete them from your account. Human reviewers read "a subset" - they won't say how much.

The kicker: Gemini links to your Google account. Your searches, emails, docs - all connected. One profile to rule them all.

Microsoft (Copilot)

Scans your Office documents. Reads your Teams messages. Analyzes your Outlook emails. All to "provide better suggestions."

Enterprise customers get "privacy assurances." Translation: Microsoft promises not to look... unless they need to.

Anthropic (Claude)

Claims to be "privacy-focused." For years it didn't train on consumer chats by default. Under its 2025 consumer terms, it does unless you switch the setting off yourself.

What AI Can Figure Out About You

MIT researchers showed in 2024 that ChatGPT can infer:

Your age from writing style (87% accuracy)
Your location from casual mentions (73% accuracy)
Your income bracket from topic choices (69% accuracy)
Mental health conditions from conversation patterns
Political affiliation from question framing

You never told it any of this. It figured it out.

Worse: AI can now generate fake profiles of you. Give it a few data points, it creates your "digital twin" - what you'd say, how you'd act, what you'd buy. Marketers love it. Scammers love it more.

Your Company's Secrets Are Leaking

JPMorgan Chase restricts ChatGPT use. Amazon warns employees not to share code. Apple restricts it internally. Do they know something you don't?

Every major consulting firm now has an "AI leak incident" team. McKinsey found client data in GPT outputs. Deloitte discovered strategy docs in Bard responses. PwC caught employees uploading audit files.

The average data breach costs $4.45 million, according to IBM's 2023 report. An AI training data leak? Permanent. Unfixable. Priceless to competitors.

The Law Can't Save You

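GDPR has a "right to be forgotten" (Article 17, the right to erasure). Good luck enforcing it against a trained model - companies argue the data is "transformed" and "aggregated," and regulators haven't settled the question.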

California's CCPA lets you delete personal data. AI companies claim training data isn't "personal" anymore - it's "statistical."

Illinois BIPA protects biometric data. AI companies don't store your face - just the "mathematical representation" of it.

Every law has a loophole. Every loophole has a lawyer. Every lawyer works for Big Tech.

Damage Control (What Little You Can Do)

Stop Feeding the Machine

Never paste sensitive documents into any AI
Use fake names, dates, and details when possible
Create separate accounts with minimal information
Assume everything is public and permanent
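If you must paste something, scrub it first. A minimal sketch of the idea, assuming two deliberately simple regex patterns and a list of names you supply yourself. The pattern set and the `redact` helper are hypothetical, and they will miss plenty of formats (and sometimes flag things that aren't phone numbers), so read the result before you hit send.

```python
import re

# Hypothetical patterns: deliberately simple, and they will miss many formats.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str, names: tuple[str, ...] = ()) -> str:
    """Replace emails, phone-like numbers, and listed names with placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    # Names are matched literally (case-insensitive), in the order given.
    for i, name in enumerate(names, start=1):
        text = re.sub(re.escape(name), f"[PERSON_{i}]", text, flags=re.IGNORECASE)
    return text
```

Usage: `redact(contract_text, names=("Jane Doe", "Acme Corp"))`, then paste the output instead of the original.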

Use Local AI When Possible

Ollama - Run models on your own machine
LM Studio - Local LLMs without cloud
PrivateGPT - Your data never leaves your device
GPT4All - Open source, offline capable
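What does "local" look like? A minimal sketch of talking to an Ollama server on your own machine, assuming its default address (`localhost:11434`) and a model you have already pulled. The model name `llama3` and the helper names are placeholders, so swap in whatever you actually installed.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if your install differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

The request goes to `localhost`. Nothing in this sketch touches a cloud service, which is the whole point.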

Opt Out Everywhere

ChatGPT: Settings → Data Controls → Turn off training
Google: myactivity.google.com → Delete everything
Microsoft: privacy.microsoft.com → Clear all data
Meta: Settings → Generative AI → Object to everything

Poison the Well

Can't beat them? Confuse them. Add false information to your profiles. Use different birthdates. Scramble your interests. Make your data worthless.

Tools like TrackMeNot generate fake searches. AdNauseam clicks every ad. They create noise in your data shadow.
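The same idea works for profile fields. A minimal sketch that generates a throwaway birthdate and a random set of interests for each new account. The interest pool, the date range, and the `decoy_profile` helper are all invented for illustration.

```python
import random
from datetime import date, timedelta

# Hypothetical interest pool; swap in anything unrelated to your real habits.
INTERESTS = ["birdwatching", "curling", "opera", "drag racing", "knitting",
             "beekeeping", "chess", "surfing"]


def decoy_profile(rng: random.Random) -> dict:
    """Generate a plausible but fake birthdate and interest list."""
    start = date(1960, 1, 1)
    span_days = (date(2000, 12, 31) - start).days
    # Pick a random day in the range, both endpoints included.
    birthdate = start + timedelta(days=rng.randrange(span_days + 1))
    return {
        "birthdate": birthdate.isoformat(),
        "interests": sorted(rng.sample(INTERESTS, k=3)),
    }
```

Call `decoy_profile(random.Random())` once per service, so no two profiles line up.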

It Gets Worse From Here

GPT-5 is training now. On your conversations from the last two years.

Apple's putting AI in your iPhone. Reading your texts. Analyzing your photos. "On device" they say. Until it's not.

Your car has AI now. Listening to your conversations. Tracking your destinations. Ford, GM, Tesla - all selling your driving data to insurance companies.

Smart TVs transcribe what you watch. Alexa records what you say. Your doorbell films who visits. All training AI. All building profiles. All for sale.

The Uncomfortable Truth

You already lost this privacy battle. If you've used the internet in the last decade, you're in the training data. Your emails, posts, comments, photos - all of it.

The question isn't whether AI knows about you. It's what it does with that knowledge.

Today, it sells you products. Tomorrow, it denies you insurance. Next year, it decides if you get that job.

The AI that promises to help you is building a permanent record of everything you've ever shared. Every question reveals something. Every prompt leaves a trace.

They call it "artificial intelligence." But it's trained on human secrets. Your secrets.

And it never forgets.

⚠️ Remember

Every AI interaction is a privacy transaction. You trade your data for convenience. Make sure you understand the price.