Your AI Chatbot Is Training on Your Secrets

The Default Setting: Your Data Trains Their Models

ChatGPT, Claude, and Gemini all use your conversations for AI training by default. Unless you dig through settings and opt out, every question you ask, every document you upload, every secret you share becomes training data.

That medical question? Training data. That legal document you uploaded? Training data. That conversation about your relationship problems? Training data.

A Stanford study warns: "If you share sensitive information in a dialogue with ChatGPT, Gemini, or other frontier models, it may be collected and used for training, even if it's in a separate file that you uploaded during the conversation." [1]

The Privacy Scorecard: Who's Worst?

ChatGPT (OpenAI)

  • Training by default: Yes
  • Data retention: Indefinite
  • Can opt out: Yes (buried in settings)
  • Human review: Yes
  • GDPR compliant: No [2]

Claude (Anthropic)

  • Training by default: Yes (changed 2025)
  • Data retention: Up to 5 years (if training is on)
  • Can opt out: Yes
  • Human review: Yes (de-identified)
  • GDPR compliant: Partial

Gemini (Google)

  • Training by default: Yes
  • Data retention: 18 months default
  • Can opt out: Yes (complex)
  • Human review: Yes
  • GDPR compliant: No [3]

Copilot (Microsoft)

  • Training by default: Opt-in (consumer)
  • Data retention: Varies
  • Can opt out: Yes
  • Human review: Limited
  • GDPR compliant: Yes (enterprise)

ChatGPT: The Data Vacuum

OpenAI's ChatGPT is the worst offender for casual users.

What They Collect

  • Every conversation, stored indefinitely
  • Uploaded documents, images, spreadsheets
  • Your IP address and approximate location
  • Device information and browser type
  • How you use the interface (clicks, time spent)

The 2024 Policy Change

OpenAI pulled a fast one. In 2024, they removed the option for free and Plus users to disable chat history. Now all your prompts are retained indefinitely unless you manually delete them. [4]

Enterprise and Team subscribers can still opt out, with data purged after 30 days. Everyone else? Your data lives forever.

The Operator Problem

Using ChatGPT's "Operator" feature to browse the web? Screenshots and browsing activity persist for 90 days after deletion for "abuse monitoring." [4] Delete all you want; the data sticks around for three more months.

November 2025: The Mixpanel Breach

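On November 9, 2025, analytics provider Mixpanel discovered that an attacker had accessed its systems and exported a dataset containing information about users of OpenAI's API platform: names, email addresses, approximate locations, and browser details. OpenAI says chat content, API keys, passwords, and payment details were not exposed, and it removed Mixpanel from its production services while investigating. [5]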

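This wasn't the first scare. More than 225,000 sets of ChatGPT credentials, stolen from users' own devices by infostealer malware, have turned up for sale on the dark web. [6] In early 2025, a threat actor claimed to have 20 million more; OpenAI said it found no evidence that its own systems had been compromised.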

How to Opt Out of ChatGPT Training

  1. Open ChatGPT
  2. Click your profile icon → Settings
  3. Go to Data Controls
  4. Toggle off "Improve the model for everyone"

Warning: This only applies to new conversations. Anything you said before opting out may already have been used for training.

Claude: The Privacy Retreat

Anthropic marketed Claude as the privacy-conscious choice. That's no longer true.

The Quiet Policy Change

In 2025, Anthropic changed its consumer terms: conversations with Claude are now used for training by default unless you opt out. [7]

This is a retreat from Claude's earlier positioning as the privacy-first AI. The company still presents itself as more cautious than OpenAI (the opt-out is clearer, and flagged content is de-identified), but it no longer keeps your chats out of training by default.

Data Retention: 5 Years

If you don't opt out, your data can be kept for up to five years; if you do, the standard retention period is 30 days. Deleted chats aren't used for future training, but anything collected before you changed your settings may already be in training datasets. [7]

The Enterprise Exception

Like every AI company, Anthropic treats paying enterprise customers differently. API users and business accounts are shielded from training use. Only consumers get exploited by default.

How to Opt Out of Claude Training

  1. Go to claude.ai
  2. Click your name → Settings
  3. Find Privacy section
  4. Toggle off "Help improve Claude"

Gemini: Google's Data Integration Machine

Google's Gemini has the most complex and invasive data practices of the four.

The 18-Month Default

Google stores your Gemini conversations for 18 months by default. You can change this to 3 or 36 months, or turn it off entirely in Activity controls. [8]

But here's the catch: even with activity turned off, conversations are still stored for 72 hours. Reviewed chats are retained for up to three years. [8]
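These retention rules interact in ways that are easy to misread. The small sketch below computes the latest date a conversation could still exist under each setting. It is a simplified model built only from the figures quoted in this section: auto-delete after 3, 18, or 36 months; 72 hours with activity off; and up to three years for human-reviewed chats. The function names are hypothetical, and this is not how Google actually schedules deletion.

```python
import calendar
from datetime import datetime, timedelta

# Simplified model using the retention figures quoted above (an assumption,
# not Google's actual deletion logic).
AUTO_DELETE_MONTHS = {"3 months": 3, "18 months": 18, "36 months": 36}
ACTIVITY_OFF_HOURS = 72     # kept even with Gemini Apps Activity turned off
REVIEWED_CHAT_MONTHS = 36   # human-reviewed chats: up to three years


def add_months(dt: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day (e.g. Jan 31 + 1 month -> Feb 28/29)."""
    index = dt.month - 1 + months
    year, month = dt.year + index // 12, index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def latest_deletion(created: datetime, setting: str = "18 months",
                    human_reviewed: bool = False) -> datetime:
    """Latest date a chat could still exist under the rules described above."""
    if setting == "off":
        deadline = created + timedelta(hours=ACTIVITY_OFF_HOURS)
    else:
        deadline = add_months(created, AUTO_DELETE_MONTHS[setting])
    if human_reviewed:
        # Reviewed copies are kept separately, so the later date wins.
        deadline = max(deadline, add_months(created, REVIEWED_CHAT_MONTHS))
    return deadline
```

For a chat started at noon on January 31, 2025, the 18-month default keeps it until July 31, 2026. Turning activity off cuts that to February 3, 2025, unless the chat was reviewed, in which case a copy could last until January 31, 2028.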

Human Reviewers See Your Chats

A subset of your conversations gets reviewed by actual humans at Google. They're supposed to assess if responses were "low-quality, inaccurate, or harmful." [3] In practice, this means Google employees reading your private questions.

July 2025: The App Access Expansion

Starting July 7, 2025, Gemini gained access to Phone and Messages apps, even if you have "Gemini Apps Activity" turned off. [9]

Google claims turning off activity still prevents training use, and data is deleted after 72 hours. But an AI with access to your call logs and private messages? That's a lot of trust to place in Google's "internal practices."

November 2025: The Gmail Panic

Rumors spread that Google was using Gmail data to train Gemini. Google called the reports "misleading": the confusion came from a January 2025 update that split one settings toggle into two, and some users found those settings had flipped back on. [10]

Whether that was a bug or a dark pattern, users learned that their email might be feeding the machine.

How to Limit Gemini Data Collection

  1. Go to myactivity.google.com
  2. Click Gemini Apps Activity
  3. Toggle it off (or set shorter retention)
  4. Delete existing activity

Note: This doesn't stop 72-hour retention or human review of flagged chats.

Microsoft Copilot: The Complicated One

Among consumer products, Microsoft's Copilot has the most privacy-friendly defaults.

Consumer Copilot

Training is opt-in, not opt-out. By default, chats are only used for "essential purposes" like bug fixes and abuse prevention. If you consent to training, personal identifiers are removed first. [11]

Your uploaded files are never used for training, regardless of settings. [11]

Microsoft 365 Copilot (Enterprise)

Enterprise users get the strongest protections:

  • Data is encrypted and never used to train foundation models
  • Prompts and responses aren't used for third-party training
  • Complies with GDPR, EU Data Boundary, and ISO/IEC 27018 [12]

The catch: you're paying Microsoft a premium for privacy that should be the default for everyone.

The Breach Record

AI chatbot security isn't theoretical. Here's what's already gone wrong:

OmniGPT (2025)

A hacker claimed to have breached the AI platform, exposing:

  • 30,000 users' personal data
  • 34 million lines of conversation logs
  • Uploaded files with credentials and API keys [6]

DeepSeek (January 2025)

The Chinese AI chatbot ran into two separate security problems:

  • A DDoS attack forced it to temporarily limit new registrations
  • An internal ClickHouse database was left open to the public internet, leaking chat histories and secret keys [13]

ChatGPT Indexed by Google (2025)

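Thousands of shared ChatGPT conversations became searchable on Google. The share-link dialog offered an opt-in "Make this chat discoverable" checkbox that allowed search engines to index the page, and many users didn't realize what they were agreeing to. OpenAI has since removed the option. [6]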

Your "private" conversation might be one Google search away.
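A page can tell search engines not to index it, whether it is a shared-chat link or anything else. It does this with a `noindex` directive in a `<meta name="robots">` tag or in an `X-Robots-Tag` HTTP header. The sketch below uses only Python's standard library to check both. The function names are hypothetical, and it parses only simple, unscoped directives. It tells you what the page asks crawlers to do, not whether a search engine has already indexed it.

```python
import urllib.request
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def is_noindex(html, x_robots_tag=None):
    """True if the page's meta tags or X-Robots-Tag header forbid indexing.

    Simplification: user-agent-scoped header values such as
    "googlebot: noindex" are not parsed.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = list(parser.directives)
    if x_robots_tag:
        directives.extend(d.strip().lower() for d in x_robots_tag.split(","))
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


def check_url(url):
    """Fetch a page (e.g. a share link you created) and report whether it opts out of indexing."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
        return is_noindex(html, resp.headers.get("X-Robots-Tag"))
```

If `check_url` returns False for a link you shared, assume anyone who finds that link, including a search engine, can read it.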

Samsung Leak (2023)

Samsung employees accidentally leaked confidential code and documents by pasting them into ChatGPT. Samsung banned generative AI tools company-wide. [6]

The Consumer vs. Enterprise Divide

Notice a pattern? Every AI company offers privacy, if you pay enterprise rates.

Consumer services operate under non-negotiable terms of service. Your data is the product. Enterprise platforms have legally binding Data Processing Addendums (DPAs). They sell privacy itself as the product. [7]

Same technology. Same company. Different rules based on how much you pay.

What They Won't Tell You

Training Data Is Forever

Even if you delete a conversation, the model weights derived from it persist. Your words become part of the AI itself. There's currently no practical way to "unlearn" your data once it's baked into a trained model.

Third-Party Sharing

OpenAI gives "authorized vendors" access to user data, and it can share data with law enforcement or government agencies when legally required. [4] That medical question you asked could end up in a legal discovery request.

Inference Attacks

Stanford researchers warn about "inference attacks": extracting private information from AI models trained on user data. Even if your specific conversation isn't stored, the model might reveal patterns from your data when queried by others. [1]

How to Protect Yourself

Immediate Actions

Opt Out of Training (All Platforms)

  • ChatGPT: Settings → Data Controls → Disable "Improve the model for everyone"
  • Claude: Settings → Privacy → Disable "Help improve Claude"
  • Gemini: myactivity.google.com → Gemini Apps Activity → Off
  • Copilot: Already opt-in (but verify in settings)

Delete Existing Data

  • ChatGPT: Settings → Data Controls → Delete all chats
  • Claude: Delete conversation history in settings
  • Gemini: myactivity.google.com → Delete activity

Use Anonymous Access When Possible

  • DuckDuckGo's AI Chat (no account required, no logging)
  • Perplexity with incognito mode
  • Self-hosted models (Ollama, LM Studio)

Behavioral Changes

  • Never share sensitive information: Medical records, legal documents, passwords, personal identifiers
  • Assume everything is logged: Even "deleted" data may persist
  • Anonymize before sharing: Remove names, dates, identifying details
  • Use separate accounts: Don't link AI accounts to your primary email
  • Check settings regularly: Companies change policies without notice
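The "anonymize before sharing" habit can be partly automated. Below is a minimal sketch that assumes you scrub text locally before pasting it into any chatbot. A hypothetical `redact` helper uses regular expressions to mask email addresses, phone numbers, dates, and OpenAI-style `sk-` keys, plus any names you list explicitly. The patterns are illustrative assumptions and will miss plenty, such as street addresses, account numbers, and identifying context. Treat the output as a first pass, not a guarantee.

```python
import re

# Illustrative patterns only (an assumption, not a vetted PII detector).
# Order matters: keys and dates are masked before the looser phone pattern,
# which would otherwise swallow their digits.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("API_KEY", re.compile(r"\b(?:sk|pk)-[A-Za-z0-9_-]{16,}\b")),
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text, names=()):
    """Replace likely identifiers with placeholder tags before sharing text."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    # Names can't be detected reliably by pattern, so the caller lists them.
    for name in names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text
```

Names are passed in explicitly because no simple pattern can tell a person's name from an ordinary word.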

For Sensitive Work

  • Run local models: With Ollama, LM Studio, or PrivateGPT, your data never leaves your machine
  • Enterprise accounts: If your employer pays, use their protected instance
  • Avoid uploads: Type information manually instead of uploading documents
  • VPN + fresh account: For maximum privacy, use a VPN and account not linked to your identity
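For the "run local models" option, here is a minimal sketch of talking to a model served by Ollama through its local REST endpoint, using only Python's standard library. The endpoint is `/api/generate` on port 11434, Ollama's default. The sketch assumes Ollama is installed and running and that a model has already been pulled; `llama3.2` is just an example name. The helper names are hypothetical, and the host check is a guard added here so that a typo can't send your prompt to a remote server.

```python
import json
import urllib.request

# Assumptions: Ollama is running locally on its default port, and this model
# has been pulled beforehand (e.g. `ollama pull llama3.2`).
DEFAULT_MODEL = "llama3.2"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def build_request(prompt, model=DEFAULT_MODEL, host="localhost", port=11434):
    """Build a POST to Ollama's /api/generate endpoint, refusing non-local hosts."""
    if host not in LOOPBACK_HOSTS:
        # The whole point is that the prompt never leaves this machine.
        raise ValueError(f"refusing non-local host: {host}")
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt, **kwargs):
    """Send the prompt to the local model and return its text answer."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=120) as resp:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Calling `ask_local_model("Explain what a high LDL cholesterol reading means.")` returns the model's answer as a string. The request goes only to your own machine.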

The Uncomfortable Truth

Convenience vs. Privacy

Every AI assistant that "knows you" does so by collecting data about you. Every "personalized" response comes from surveillance. Every "improvement" to the model might include your private conversations.

The business model is simple: you get a free or cheap AI assistant. They get your data to train the next generation of AI, which they sell to enterprise customers who pay for privacy.

You're not the customer. You're the training data.

References

  1. Stanford Report - Study exposes privacy risks of AI chatbot conversations (October 2025)
  2. Chatbase - Does ChatGPT Save Your Data? 2025 Privacy Guide
  3. Heydata - Google Gemini: Is your Data Safe?
  4. Nightfall AI - Does ChatGPT Store Your Data in 2025?
  5. Windows Central - OpenAI confirms major data breach (November 2025)
  6. Wald.ai - ChatGPT Data Leaks and Security Incidents (2023-2025)
  7. Medium - The Great AI Privacy Divide: Claude, ChatGPT, Gemini, and Copilot in 2025
  8. Google - Gemini Apps Privacy Hub
  9. Android Headlines - Why Google Gemini AI's Latest Move May Be a Privacy Red Flag (June 2025)
  10. TwinStrata - Google Denies Gmail Data Use Amid Privacy Fears (2025)
  11. Microsoft Support - Privacy FAQ for Microsoft Copilot
  12. Microsoft Learn - Data, Privacy, and Security for Microsoft 365 Copilot
  13. CM Alliance - DeepSeek Cyber Attack: Timeline, Impact, and Lessons Learned