Do Androids Dream of Electric Sheep?: Why LLMs Hallucinate

Welcome back, digital detectives and legal futurists! Today we’re venturing into Philip K. Dick territory, where AI hallucinations blur fact and fiction much as Blade Runner’s replicants couldn’t distinguish implanted memories from reality. When ChatGPT confidently cites non-existent cases with the authority of a Supreme Court Justice, we face a professional dilemma that goes beyond mere technical glitches.
I’ve dug into Anthropic’s groundbreaking March 2025 research to uncover the neural “circuit breakers” behind these AI confabulations. Pour yourself something contemplative and join me as we explore how to administer our own Voight-Kampff test to the AI tools rapidly becoming part of our professional DNA. These systems aren’t lying to us; they’re simply dreaming of electric sheep, of knowledge they don’t possess. Let’s dive in!
This substack, LawDroid Manifesto, is here to keep you in the loop about the intersection of AI and the law. Please share this article with your friends and colleagues and remember to tell me what you think in the comments below.
In Philip K. Dick’s seminal novel that inspired Blade Runner (one of my all-time favorite movies, just ask my wife), the central question revolves around what separates humans from machines. The replicants, artificial beings nearly indistinguishable from humans, are identified through the Voight-Kampff test, which measures emotional responses. Today, we face a similar challenge with Large Language Models (LLMs): distinguishing fact from fiction in their seemingly confident outputs.
But unlike Dick’s replicants, who struggled to differentiate implanted memories from real ones, LLMs don’t “remember” anything at all. Yet they still manage to confabulate with remarkable conviction, a phenomenon we call “hallucination.” In this article, I push beyond the pat understanding of LLM hallucinations and grapple with why they actually occur.
If this sounds interesting to you, please read on…
The Ghost in the Machine
When ChatGPT tells you with unwavering certainty that Michael Batkin is a professional pickleball player or that Andrej Karpathy authored “ImageNet Classification with Deep Convolutional Neural Networks” (he didn’t), it isn’t lying. It’s hallucinating, generating text that sounds plausible but has no basis in fact.
Imagine being asked to complete this sentence: “The capital of France is ______.” Easy, right? Now try: “The capital of Zorblaxia is ______.” Unless you’re extraordinarily well-versed in fictional geography, you’ll likely decline to answer.
But what if you were trained from birth to always provide completions to sentences, no matter what? You might say something like “Zorbopolis” with a confident smile, despite having invented it on the spot.
This is essentially what happens with LLMs. They’re trained to predict the next word in a sequence, and they’ll do so even when they venture beyond their knowledge frontier. It’s less a deliberate deception than a cognitive reflex, a predictive muscle that doesn’t know when to stop flexing.
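To make that reflex concrete, here is a minimal toy sketch of next-token sampling. Everything in it, the vocabulary, the scores, the temperature, is invented purely for illustration; no production model works on a five-word vocabulary. The key point survives, though: the sampling step always returns a token, and “declining to answer” has to come from somewhere else in the system.

```python
import numpy as np

# Toy vocabulary and made-up logits for the prompt
# "The capital of Zorblaxia is ..." -- every value here is invented
# purely to illustrate the mechanics of next-token prediction.
vocab = ["Paris", "Zorbopolis", "unknown", "the", "blue"]
logits = np.array([1.2, 2.5, 0.3, -0.5, -1.0])  # the model's raw scores

def next_token(logits, temperature=1.0):
    """Softmax over logits, then sample. Note: a token is ALWAYS returned;
    there is no built-in 'decline to answer' step at this level."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs), probs

idx, probs = next_token(logits)
print(vocab[idx], dict(zip(vocab, probs.round(3))))
# The sampler happily emits "Zorbopolis" -- refusal only happens if
# something upstream (training, circuitry, prompting) steers it there.
```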
The Epistemological Circuit Breakers
Recent research from Anthropic offers fascinating insights into the neural circuitry behind this phenomenon. Their March 2025 paper, “On the Biology of a Large Language Model,” reveals that these systems contain what could be described as epistemological circuit breakers, mechanisms designed to prevent the model from answering when it lacks confidence.
The researchers identify three key circuit components:
- “Can’t answer” features – These activate by default in response to questions, functioning as a sort of epistemic presumption of ignorance
- “Unknown name” features – These fire when the model encounters unfamiliar entities
- “Known answer” features – These inhibit the above two features when the model recognizes a familiar topic
When functioning correctly, this system creates a fascinating parallel to legal standards of proof. Just as a case needs sufficient evidence to overcome a presumption of innocence, an LLM needs sufficient “knowledge activation” to overcome its default stance of epistemic humility.
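As a rough mental model only, the interplay can be caricatured as a gate: refusal is on by default and is switched off only when recognition and recall are both strong. The feature names, numbers, and threshold below are my own invention for intuition, not Anthropic’s actual circuitry.

```python
def should_answer(entity_familiarity: float, fact_recall: float,
                  inhibition_threshold: float = 0.7) -> bool:
    """Toy model of the circuit described above. The gating rule and all
    numbers are invented for intuition only.

    - 'can't answer' is on by default (the presumption of ignorance)
    - 'unknown name' strengthens it when the entity looks unfamiliar
    - 'known answer' inhibits both when recognition is strong enough
    """
    cant_answer = 1.0                                  # default refusal, always active
    unknown_name = 1.0 - entity_familiarity            # fires for unfamiliar entities
    known_answer = entity_familiarity * fact_recall    # needs BOTH recognition and recall

    # The refusal circuit is suppressed only if the 'known answer'
    # signal clears the inhibition threshold.
    refusal = (cant_answer + unknown_name) * (known_answer < inhibition_threshold)
    return refusal == 0

# Correct behaviour: familiar entity, facts actually retrievable.
print(should_answer(entity_familiarity=0.95, fact_recall=0.9))   # True  -> answers
# Correct behaviour: made-up entity, nothing recalled.
print(should_answer(entity_familiarity=0.05, fact_recall=0.0))   # False -> declines
```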
When The Circuit Breakers Fail
So why do hallucinations occur? The research suggests it’s often a case of mistaken identity, or what legal minds might recognize as a type of false positive.
When asked about Andrej Karpathy’s papers, the model recognizes Karpathy as a known entity in AI research. This recognition partially activates “known answer” features, which in turn suppress the “can’t answer” circuit. The model then proceeds to generate a plausible but incorrect response, drawing on its general knowledge of Karpathy’s field rather than specific knowledge of his publications.
It’s reminiscent of how eyewitness testimony can be compromised by familiarity bias. Just as a witness might falsely place a familiar person at a crime scene because the context seems plausible, an LLM might attribute a well-known paper to a well-known researcher because the association seems reasonable.
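Continuing the toy sketch from the previous section, the failure mode looks like a gate that listens to name familiarity rather than genuine recall. Again, every number here is invented for illustration, not drawn from the paper.

```python
def refusal_suppressed(entity_familiarity: float, fact_recall: float,
                       threshold: float = 0.7) -> bool:
    """Failure-mode sketch (invented numbers): the 'known answer' signal is
    driven by how familiar the NAME is, not by whether the specific fact
    was actually retrieved."""
    known_answer = entity_familiarity       # note: fact_recall is never consulted
    return known_answer >= threshold        # True -> refusal circuit inhibited

# "Name a paper by Andrej Karpathy": the name is everywhere in the training
# data, so recognition is high even though no actual publication list has
# been retrieved.
print(refusal_suppressed(entity_familiarity=0.95, fact_recall=0.1))  # True
# The refusal circuit stands down, the predictive reflex takes over, and
# the model completes the sentence with a plausible-sounding title.
```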
The Voight-Kampff Test for AI
In the Blade Runner universe, the Voight-Kampff test measured empathy through physiological responses to emotionally provocative questions. For LLMs, we need our own version, a test that reveals when a model is operating beyond its knowledge boundaries.
The researchers at Anthropic demonstrate how they can manipulate these internal circuits to either induce hallucinations in naturally cautious models or enforce epistemic humility in naturally overconfident ones. By artificially activating “known answer” features when asking about papers by Josh Batson (a researcher whose publications the model has no specific knowledge of), they can make the model hallucinate publications that don’t exist. Conversely, by inhibiting these same features when asking about Karpathy, they can make the model appropriately decline to answer.
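Mechanically, interventions of this kind amount to nudging the model’s internal activations along a learned feature direction. The sketch below is a loose caricature of that idea: the vectors, dimensions, and strengths are all made up, and Anthropic’s real experiments operate on features identified by their interpretability tooling rather than on raw random vectors like these.

```python
import numpy as np

# Hypothetical sketch of a feature-steering intervention, loosely in the
# spirit of the experiments described above. Nothing here is Anthropic's
# actual method or code.
hidden_state = np.random.randn(768)              # stand-in for an internal activation
known_answer_direction = np.random.randn(768)    # stand-in for a learned feature direction
known_answer_direction /= np.linalg.norm(known_answer_direction)

def steer(h, direction, strength):
    """Add (or, with negative strength, subtract) a feature direction."""
    return h + strength * direction

# Pinning the 'known answer' feature ON for an unfamiliar name nudges the
# model toward answering (inducing hallucination); clamping it OFF for a
# familiar name nudges it toward declining.
induce_hallucination = steer(hidden_state, known_answer_direction, strength=+8.0)
enforce_humility     = steer(hidden_state, known_answer_direction, strength=-8.0)
```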
This manipulation reveals something profound about these systems: their confidence is not a reflection of knowledge but rather the activation pattern of specific neural circuits, circuits that can be triggered inappropriately by contextual similarity or association.
Electric Sheep and Digital Falsehoods
Why should lawyers care about the mechanics of AI hallucination? In a world increasingly mediated by these systems, understanding their limitations becomes a matter of professional necessity.
Consider how hallucinations might impact legal research, contract analysis, or precedent review. An LLM might confidently cite a non-existent case that sounds plausible, or attribute principles to statutes where they don’t exist, all while maintaining the authoritative tone of a seasoned legal scholar.
The philosopher John Searle’s Chinese Room thought experiment argued that computational systems manipulate symbols without understanding meaning. LLMs exemplify this in their hallucinations: they generate text that syntactically resembles factual statements but lacks the semantic grounding in reality that human knowledge possesses.
As Jorge Luis Borges, another master of blurring reality and fiction, might remind us, the map is not the territory. Jacques Derrida, through his concept of différance, would recognize in LLMs the perpetual play of signifiers detached from their signifieds, where meaning is always deferred and dispersed across an endless web of textual relations, never fully present or accessible. The language model’s representation of knowledge is not knowledge itself, but rather a statistical shadow of human-written text, sometimes faithfully reflecting the world and other times conjuring mirages that vanish upon closer inspection.
From Electric Sheep to Reliable Partners
Understanding the mechanisms behind LLM hallucinations isn’t merely academic: it’s practical. As these systems become increasingly embedded in professional workflows, distinguishing their truths from their fictions becomes essential.
This challenge extends beyond simple fact-checking. The insidious nature of AI hallucinations lies in their plausibility: they often contain just enough truth to pass cursory inspection while harboring subtle fictions that could undermine entire legal arguments. It’s the digital equivalent of perjury committed not with malice, but with a fundamental inability to distinguish memory from imagination.
The good news is that research like Anthropic’s suggests a path forward. By better understanding the internal circuitry governing these models’ epistemological guardrails, we can potentially engineer systems that more reliably recognize and communicate their own limitations. What if an AI could not only decline to answer when uncertain, but also articulate precisely why it lacks confidence? Imagine a legal research assistant that could say, “I recognize the case name you’ve mentioned, but I’m not confident about the specific holding you’re asking about; here’s why my uncertainty is high on this particular question.”
Some promising approaches include:
- Uncertainty Quantification – Training models to explicitly model their confidence in different aspects of their responses, allowing them to flag potentially hallucinated content (a rough sketch of one such check follows this list)
- Knowledge Tracing – Implementing systems that track precisely which training examples influence specific outputs, creating a kind of “citation trail” for AI-generated content
- Adversarial Testing – Developing specialized tools that deliberately probe AI systems for hallucinations, identifying weak points in their knowledge representation
- Human-AI Collaboration Frameworks – Designing interfaces and workflows that leverage the complementary strengths of human discernment and AI processing power
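Even without access to a model’s internals, a crude version of the first idea can be approximated from the outside: ask the same question several times and treat disagreement between samples as a warning sign. The sketch below assumes a hypothetical generate() function standing in for whatever LLM API you use; the sample count and threshold are arbitrary, and the check is a heuristic, not a guarantee.

```python
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder for whatever LLM API you use; not a real library call."""
    raise NotImplementedError

def consistency_check(prompt: str, n_samples: int = 5, min_agreement: float = 0.6):
    """Sample the same question several times; low agreement across samples
    is a cheap, imperfect proxy for 'the model may be confabulating'."""
    answers = [generate(prompt).strip() for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n_samples
    if agreement < min_agreement:
        return None, agreement   # flag for human review instead of trusting it
    return top_answer, agreement

# Usage (once generate() is wired to a real model):
# answer, score = consistency_check("What is the holding of Smith v. Jones (1998)?")
# if answer is None:
#     print(f"Low agreement ({score:.0%}); verify against a primary source.")
```

Note the limitation: a fabrication the model repeats confidently will sail through this check, which is exactly why the human-AI collaboration item on the list above still matters.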
The challenge of AI hallucinations represents not a reason to reject these powerful tools, but an opportunity to develop a new kind of professional expertise: the ability to work effectively with systems that reason differently than humans do.
Closing Thoughts
A hallucinating LLM offers us a perfect metaphor for our post-truth era: confident assertions untethered from reality, yet delivered with the polished authority that makes them difficult to dismiss.
For us in the legal profession, the stakes couldn’t be higher. The same systems that promise to democratize legal knowledge could also poison the well with plausible-sounding fabrications. The lawyer of tomorrow must become something of an AI epistemologist, able to recognize when these digital oracles are prophesying truth and when they’re merely generating convincing illusions.
This isn’t just about fact-checking technology; it’s about developing a new kind of literacy. Just as we’ve learned to recognize the telltale signs of human deception, we must now train ourselves to spot the unique fingerprints of AI confabulation. The legal mind, already disciplined in evaluating evidence and questioning assertions, is perhaps uniquely positioned for this challenge.
The future belongs not to those who blindly embrace these technologies nor to those who fearfully reject them, but to those who approach them with a clear-eyed understanding of both their revolutionary potential and their fundamental limitations. Like the blade runners of fiction, we must develop our own tests to separate the real from the artificial, knowledge from its statistical shadow.
Until then, keep your digital skepticism sharp and your curiosity sharper!
This article is the first in a series on Machine Thinking, where I explore different aspects of how large language models “think.”
Many thanks to Nikki Shaver for the inspiration on this subject and to Anthropic’s research and paper “On the Biology of a Large Language Model” (March 2025).
By the way, as a LawDroid Manifesto premium subscriber, you would get access to exclusive toolkits, like the Missing Manual: OpenAI Operator, coming out this month…
With these premium toolkits, you not only learn about the latest AI innovations and news items, but you get the playbook for how to use them to your advantage.
If you want to be at the front of the line to get first access to helpful guides like this, and have the inside track to use AI as a force multiplier in your work, upgrade to become a premium LawDroid Manifesto subscriber today!
I look forward to seeing you on the inside.
Cheers,
Tom Martin
CEO and Founder, LawDroid