LLMs make stuff up.
LLMs can confidently tell you all about the winners of non-existent sporting events1, invent legal cases2, and fabricate fake science3. These kinds of outputs are often called hallucinations.
Thankfully, the greatest minds of the human race are hard at work ironing out this problem. Solutions abound: improving the quality of your training data4, testing the system diligently5, fine-tuning the model with human feedback6, integrating external data sources7, and even just asking the LLM to evaluate its own answer8 are just some of the ingenious techniques engineers and scientists have proposed.
According to OpenAI, GPT-4 reduces hallucination by 40% compared to its predecessors. And surely, LLM proponents say, that pattern will continue, Moore’s-Law-style, until LLM hallucination is a distant memory, like the inefficiencies in early steam engines.
But is it so simple? A growing chorus of academics, engineers and journalists are calling this narrative into question.9 Maybe hallucination isn’t a solvable problem at all.
Indeed, it’s increasingly clear that the word ‘hallucination’ fundamentally distorts the way LLMs actually work. This means our judgements of what LLMs are capable of could be systematically off the mark.
We need a new word. We need a word which captures how LLMs behave, but doesn’t bake in false assumptions about how LLMs work. We need a word which enables accurate and honest discussions about how best to apply LLMs.
But before we get to the solution, we need to properly understand the problem.
What is ‘hallucination’?
Since about 2017, researchers in natural language processing have used the word ‘hallucination’ to describe a variety of behaviours10. Since ChatGPT propelled LLMs to public awareness, this word has become commonplace. But what does it actually mean?
Sometimes, it refers to unfaithfulness, a jargon word in natural language processing. This means that the LLM’s output contains information which is not supported by the input.
For example, if you try to get an LLM to summarise a meeting transcript, and its summary includes events which weren’t described in the transcript, the LLM has been ‘unfaithful’.
When academics talk about ‘hallucination’, they most often mean something like ‘unfaithfulness’ – though other meanings are common, too.
Sometimes, ‘hallucination’ refers to non-factuality: producing outputs which contain falsehoods, or other falsehood-like things, such as unjustifiable claims, or claims which contradict the LLM’s training data. This is probably the most common usage in non-academic circles.
By the way, sometimes, ‘hallucination’ has another meaning altogether: nonsense. Nonsense is one of those things which is hard to define, but you know it when you see it. This isn’t my focus for this article, so let’s leave nonsense aside for proper treatment another day.
If you want to go deep on how people are using the word ‘hallucination’ to talk about LLMs, I’ve investigated this in detail on my personal blog.
So what people mean by ‘hallucination’ is pretty diverse. But all of these different usages have something in common: ‘hallucination’ is regarded as abnormal behaviour.
‘Hallucination’ as abnormal behaviour: what this means
Treating hallucination as abnormal is very significant. Let’s take a minute to consider what this entails.
First of all, it entails that every hallucination can be explained, at least in principle.
This is because, when the environment and inputs are normal, systems behave normally. So, whenever an LLM hallucinates, we should expect to find something abnormal about the environment or the input which explains why the LLM hallucinated.
Secondly, if hallucination is abnormal behaviour, that implies that not hallucinating is normal.
That in turn means there must be some mechanism, some causal explanation contained in how LLMs work, which explains why, under normal conditions, LLMs don’t hallucinate.
These points are pretty subtle, so let’s explore a couple of examples.
Think about dice. I could roll six sixes in a row, and that would be really unusual behaviour for the die.
But firstly, this wouldn’t demand an explanation. It’s not a bug, it’s just statistically improbable.
And secondly, there’s no mechanism which can explain why, under normal conditions, a die shouldn’t roll six sixes in a row. Very, very occasionally, it’s just what dice do.
That’s because it’s normal (albeit uncommon) behaviour for a die to roll six sixes in a row.
In contrast, if I secure a picture to the wall with a picture hook, and it falls off, I can reasonably expect there to be an explanation.
Maybe the nail was bent. Maybe I didn’t hammer it in straight. Maybe I didn’t quite catch the string on the hook when I put it up. Point is, something must have gone wrong for the picture hook to behave in this way.
And again, we should expect there to be a mechanism which can explain why under normal conditions, pictures hooked to the wall don’t fall off.
And indeed there is such a mechanism: the hook catches the string and transfers the weight of the picture to the wall, instead of leaving that weight to pull the picture down towards the ground.
That’s because for the picture to fall off the wall is abnormal behaviour for a picture hook.
So, to recap, if LLM hallucination is abnormal behaviour, we should expect two things:
- There should be some abnormal conditions which explain why LLMs hallucinate in some cases but not others.
- There should be some mechanism which explains why, under normal conditions, LLMs don’t hallucinate.
Why the word ‘hallucination’ distorts how LLMs work
LLMs don’t satisfy either of the two expectations listed above. Therefore, the word ‘hallucination’ fundamentally distorts how LLMs really work.
Firstly, when an LLM hallucinates, we should expect there to be some abnormal conditions which explain why it did so. But often, there is no such explanation.
Occasionally, LLMs produce unfaithful and untrue content for no good reason. Often, the problem goes away by making meaningless tweaks in the wording of the prompt, or even simply by running the algorithm again.
Secondly, if hallucination is abnormal behaviour, we should expect there to be some mechanism which causes LLMs not to hallucinate under normal conditions. But there is no such mechanism.
LLMs are predictive text machines, not oracles. They are not trained to target faithfulness. They are not trained to target truth. They are trained to guess the most plausible next word in the given sequence. So the case where the LLM ‘hallucinates’ and the case where it doesn’t are causally indistinguishable.
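To make that concrete, here’s a minimal sketch of the generation step in Python, with a made-up probability table standing in for a real neural network (the prompt, the candidate tokens and the numbers are all invented for illustration). Notice that the code path is identical whether the sampled continuation happens to be true or false – there is no separate ‘hallucination mode’ to switch off.

```python
import random

# A toy stand-in for a language model's next-token distribution.
# The prompt, candidates and probabilities are invented for illustration;
# a real LLM computes such a distribution with a neural network over
# tens of thousands of possible tokens.
NEXT_TOKEN_PROBS = {
    "The Eiffel Tower is located in": {
        "Paris": 0.80,  # happens to be true
        "Lyon": 0.15,   # false, but plausible-sounding
        "Nice": 0.05,   # false, but plausible-sounding
    },
}

def generate_next_token(prompt: str, temperature: float = 1.0) -> str:
    """Sample the next token, as a decoder does at inference time.

    Truth never enters the calculation: the distribution encodes
    plausibility, and sampling decides the rest.
    """
    dist = NEXT_TOKEN_PROBS[prompt]
    tokens = list(dist)
    weights = [p ** (1.0 / temperature) for p in dist.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "The Eiffel Tower is located in"
for _ in range(5):
    # Re-running the same prompt can give a true answer one time and a
    # false one the next: same mechanism, different luck.
    print(generate_next_token(prompt))
```

This is also why the problems described above so often vanish on a re-run or after an innocuous tweak to the prompt: nothing about the mechanism has changed, the dice have simply come up differently.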
There were some important and difficult points there, so let’s elaborate.
By comparison, producing false or unfaithful output would indeed be abnormal behaviour for humans.
But there are explanations for why humans sometimes produce such outputs: maybe they were lying, or they were playing a game, or they were joking. Or indeed, perhaps they were ‘hallucinating’ in the traditional sense!
In contrast, an LLM can never literally lie, play a game, or tell a joke. It’s a pile of matrices.
And there are mechanisms which explain why humans normally don’t produce unfaithful or unfactual content. We have mental models of the external world. We continually amend those models in response to evidence. We receive that evidence through perception. And we use language, among other things, as a means of communicating our mental models to other people.
But an LLM doesn’t have any of these mechanisms. It doesn’t have a model of the external world. It doesn’t respond to evidence. It doesn’t gather evidence through perception (or, indeed, by any other means). And it doesn’t attempt to use language as a means of communicating information with other minds.
So the concept of ‘hallucination’ fundamentally mischaracterises the nature of LLMs by assuming that false or unfaithful outputs are abnormal for LLMs. In fact, such outputs, while rare (at least in recent models), are nonetheless normal, like a die rolling six sixes in a row.
Why this matters
OK, so ‘hallucination’ is not a great word. It assumes that unfaithful or unfactual output is abnormal output for an LLM, when in fact, this just isn’t true.
But why does this matter? Can’t we afford a little fudge?
I think the use of this word has serious consequences.
In academia, researchers are spilling considerable amounts of ink trying to ascertain the causes of ‘hallucination’11.
Not only is this enterprise wholly in vain, but it risks spreading the misperception that hallucination is incidental, and will disappear once we find better ways of designing LLMs. That in turn could lead decision-makers to invest in LLM applications that just won’t work, under the false impression that the ‘hallucination’ problem will disappear in future.
This isn’t a hypothetical concern. LLMs have been set to writing entire travel articles, making legal submissions, and providing customers with information on company policies, with catastrophic results.
And think about some of these common or proposed use cases:
- Helping write submissions to academic journals
- Generating meeting minutes
- ‘Chatting with data’
- Summarising non-structured data in social research
- Writing computer code
- Question-answering
- All-purpose ‘AI assistants’
Given that LLMs will rarely, but inevitably, output garbage, all of these will need to be done with enormous care to avoid causing serious damage.
Furthermore, there’s a risk that an LLM may act as a so-called ‘moral crumple zone’, effectively soaking up responsibility for failures on behalf of the humans who were really responsible. And where there’s a lack of accountability, organisations are unable to make good decisions or learn from their mistakes12.
How to understand LLM behaviour besides ‘hallucinating’
So, the word ‘hallucination’ isn’t fit for purpose. But we do legitimately need ways of understanding how LLMs behave, including their tendencies to produce false or unfaithful outputs.
Confabulation: a step forward, but still flawed
One alternative suggestion is to refer to the behaviour as ‘confabulation’ instead of ‘hallucination’.
While, as the authors linked above rightly point out, this does address some of the problems with the word ‘hallucination’, it still has the fundamental flaw we’ve been discussing. It still assumes that ‘hallucination’, or ‘confabulation’, is a causally distinguishable abnormal behaviour.
Humans sometimes confabulate, but normally they don’t. When humans do confabulate, we can expect there to be an explanation, like a head injury or mental illness. And we can expect there to be mechanisms in how the human mind works which explain why humans don’t normally confabulate.
But as we have seen, this is not an accurate analogy for LLMs. For an LLM, it is just as normal to produce outputs which are false or unfaithful as it is to produce outputs which are true or faithful.
Bulls**t: a better solution
However, three philosophers at the University of Glasgow have come up with an ingenious solution.
Please don’t be alarmed. Despite appearances, ‘bulls**t’ is a technical term in analytic philosophy, developed in Harry Frankfurt’s 2005 classic, On Bulls**t. Following Frankfurt’s original usage, it’s generally taken to mean something like this:
bulls**tting: speaking with a reckless disregard for the truth.
Canonical examples include:
- Pretending you know something about whisky to try and impress a client, colleague, or crush
- Bizarre advertising slogans, like ‘Red Bull gives you wings’
- Any time Paul Merton opens his mouth on Just A Minute
Why is this a good match for describing how LLMs behave? Well, an LLM is trained to imitate the language patterns in its training data. The LLM has succeeded at its task when its output is linguistically plausible. Not when it’s true, well-evidenced, or faithful to the input – just when it’s linguistically plausible. In other words, its goal is to ‘sound like’ what a human would say.
This is just like what a human does when they bulls**t. When someone bulls**ts, they’re not aiming to produce anything true, faithful or coherent, they’re just trying to sound like they know what they’re talking about. If I can convince my crush that I’m a sophisticated gentleman who knows his Islays from his Arrans, she just might consider going on a date with me. Actually saying something true or meaningful about whisky is not the task at hand.
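Back on the machine side, the parallel can be made concrete with an equally minimal sketch of the next-token training loss (again with invented numbers, not taken from any real model): the model is scored purely on how much probability it gave to whatever token actually came next in the training text, so a plausible falsehood in that text is rewarded just as generously as a truth.

```python
import math

# Invented next-token probabilities for the prompt
# "The largest planet in the Solar System is".
PREDICTED_PROBS = {
    "Jupiter": 0.70,  # true
    "Saturn": 0.25,   # false, but plausible-sounding
    "Belgium": 0.05,  # implausible
}

def next_token_loss(actual_next_token: str) -> float:
    """Cross-entropy loss for a single next-token prediction.

    The only thing that matters is how much probability the model gave to
    the token that appeared in the training text. Whether that token makes
    the sentence true is invisible to the loss.
    """
    return -math.log(PREDICTED_PROBS[actual_next_token])

# A training document that says "Jupiter" and one that wrongly says "Saturn"
# pull the model in exactly the same way: towards sounding plausible.
print(f"loss if the training text says 'Jupiter': {next_token_loss('Jupiter'):.2f}")
print(f"loss if the training text says 'Saturn':  {next_token_loss('Saturn'):.2f}")
```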
The word ‘bulls**t’ is a huge step forward:
- It captures LLMs’ tendency to produce false or unfaithful output. Since truth and faithfulness are not goals, it’s perfectly possible for an LLM to produce false or unfaithful output purely by chance.
- It explains why LLMs often produce true and faithful output, even if it is accidental. Plausible-sounding things, by their nature, have a good chance of being true.
- It correctly identifies false and unfaithful outputs as normal for an LLM - unlike ‘hallucination’.
Previously, we thought that an LLM reliably outputs true and faithful content, except when it occasionally hallucinates. Now, we recognise that an LLM in fact always outputs bulls**t.
The benefits of bulls**t
So far, the word ‘hallucination’ has trapped us into talking pretty negatively about LLMs. But the word ‘bulls**t’, perhaps surprisingly, frees us to talk honestly about the positive applications of LLMs as well.
As several authors have pointed out before13, the tendency of LLMs to produce false and misleading output corresponds closely to their tendency to produce output which might be described, for want of a better word, as ‘creative’.
Under the ‘hallucination’ framing, why ‘hallucinating’ goes together with ‘creativity’ is a profound mystery. But it’s obvious that ‘bulls**tting’ is in some sense a ‘creative’ activity by nature. (Just ask Paul Merton!)
Therefore, because we have a more accurate description of how LLMs work, we can make more robust decisions both about where LLM behaviour is desirable, and about where it isn’t.
Would you employ a professional bulls**tter to write your company minutes? Or write large chunks of computer code? Or summarise a legal contract?
Or what about redrafting an email in a formal tone? Or suggesting a more descriptive variable name in your code? Or suggesting a couple extra Acceptance Criteria on your Scrum Issue that you might have forgotten? Are these applications where ‘sounds about right’ is good enough?
Recap
The word ‘hallucination’ has gripped the popular narrative about LLMs. But the word was never suited to describing how LLMs behave, and using it puts us at risk of making bad bets on how LLMs ought to be applied.
By using the word ‘bulls**t’, we recognise that while false and unfaithful outputs may be rare, they should occasionally be expected as part of the normal operation of the LLM.
As a result, we are empowered to recognise the applications where LLMs can really make a positive difference, as well as to quickly recognise the applications where they can’t.
Footnotes
1. Like when an LLM made up the 1985 World Ski Championships.
2. As one person found to their misfortune, when a New York lawyer filed a brief containing six references to non-existent cases.
3. Meta’s AI academic assistant Galactica flopped after three days, following a backlash over its tendency to output falsehoods.
4. See for example Gianluca Centulani on Medium.
5. See for example IBM’s introduction to LLM hallucination.
6. As claimed by OpenAI’s GPT-4.
7. See for example Shuster et al. 2021, Retrieval Augmentation Reduces Hallucination in Conversation.
8. See for example Ji et al. 2024, Towards Mitigating Hallucination in Large Language Models via Self-Reflection.
9. For example, see a Harvard study on hallucination in the legal context, this philosophical article and this Forbes article expressing concern about the use of the word ‘hallucination’, this BBC journalist on using LLMs in journalism, and this PBS article which quotes an academic saying ‘this isn’t fixable’.
10. If you want a fuller treatment of how the word ‘hallucination’ has been used historically in the field, see my personal blog post on the topic.
11. See for example Huang, Lei et al. 2023, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions.
12. For more on this idea, look into Madeleine Clare Elish’s concept of moral crumple zones.
13. See for example Jiang, Xuhui et al. 2024, A Survey on Large Language Model Hallucination via a Creativity Perspective.