You're highlighting a real concern about how LLMs absorb aggregate patterns from training data—including potentially skewed or extreme advice from internet forums.
The actual mechanisms at play:
**What's overstated:**
- LLMs don't memorize specific Reddit threads and regurgitate them; Reddit is a tiny fraction of the overall training corpus
- They don't deterministically output the same advice twice; sampling temperature injects randomness into each response (see the sketch after this list)
- Most users don't treat chatbot relationship advice as gospel
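To make the temperature point concrete, here's a minimal toy sampler, not any real model's decoder. The completions, logits, and temperature values are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(logits, temperature):
    """Sample one index from a temperature-scaled softmax over toy logits."""
    scaled = logits / temperature
    scaled = scaled - scaled.max()                  # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(logits), p=probs)

# Invented next-phrase scores for a relationship-advice prompt.
completions = ["talk it out", "set boundaries", "break up", "see a counselor"]
logits = np.array([2.0, 1.4, 1.1, 0.8])

for t in (0.2, 0.7, 1.5):
    picks = [completions[sample(logits, t)] for _ in range(5)]
    print(f"temperature={t}: {picks}")
```

At temperature 0.2 the top option dominates nearly every draw; by 1.5 the ranking blurs, which is why the same question can come back with different advice on different runs.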
**What's legitimate:**
- LLMs do reflect statistical patterns from training data, including echo chambers and polarized takes
- Reddit's upvote system amplifies dramatic/extreme posts ("break up with them") over nuanced ones
- People *do* sometimes use AI for confirmation bias, asking again until they get the answer they wanted (both dynamics are sketched after this list)
- A chatbot's false confidence can matter more than its accuracy to someone already uncertain
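Here's a minimal sketch of the upvote-amplification cut and the re-asking loop together. The posts, scores, and answer frequencies are entirely invented; nothing here reflects real Reddit data or any real model's behavior:

```python
import random

random.seed(7)  # fixed seed so the demo is reproducible

# Invented comment pool: (advice, upvote score). Dramatic takes score high.
posts = [
    ("break up", 4200), ("break up", 3100), ("talk it out", 180),
    ("couples therapy", 95), ("talk it out", 60), ("give it time", 12),
]

def share(advice, pool):
    """Fraction of the pool recommending the given course of action."""
    return sum(a == advice for a, _ in pool) / len(pool)

# Upvote amplification: curating only "top" posts skews the distribution.
top = [(a, s) for a, s in posts if s >= 1000]
print(f"'break up' share: {share('break up', posts):.0%} in the full pool, "
      f"{share('break up', top):.0%} after a top-posts cut")

# Confirmation bias: re-ask a toy "model" until it says what we want.
answers = ["talk it out"] * 5 + ["couples therapy"] * 3 + ["break up"] * 2
asks = 0
while True:
    asks += 1
    if random.choice(answers) == "break up":
        break
print(f"heard 'break up' after {asks} re-asks")
```

The retry loop is just geometric-distribution arithmetic: if a single ask yields "break up" 20% of the time, ten asks surface it with probability 1 - 0.8^10, roughly 89%. Persistence eventually produces the answer the user was looking for.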
**The actual problem isn't the dataset percentage:**
It's that people treat probabilistic text generation as authoritative on irreversible life decisions. An LLM saying "this breakup might be best" has no accountability, no context about your relationship, and no understanding of what it's advising.
The safeguard isn't fixing the training data—it's the user recognizing when they need a therapist, not a chatbot. That's a literacy/expectation problem, not a data problem.
What aspect concerns you most?