Old man sitting on a flying fish in a surreal setting.

Understanding AI Hallucinations

July 9, 2025
Caro Bauer

audEERING’s approach to Responsible AI

Whether fabricated sources, non-existent studies, or artificially generated biographies: AI hallucinations are not a brief side effect of innovation; they are structurally embedded. But what exactly is an AI hallucination? How do hallucinations differ from biases? How risky are they, and who is responsible? This article provides context and explains why audEERING, despite the clear technical distinction of its SER models from generative AI, is still affected.

What is an AI Hallucination?

In AI research, “hallucination” refers to a model’s tendency to generate content that is not grounded in its training data or any real-world source but still sounds plausible. The term originates in the study of generative AI – for example, in large language models (LLMs), voice assistants, or text-to-image systems.

A few examples:

Fictitious source in a medical context
A generative AI writes a blog post and states: “According to a 2020 WHO study, turmeric reduces the risk of Alzheimer’s by 40%.”
→ The study does not exist, but the statement sounds scientifically credible – and may raise false hopes.

Invented court ruling in a legal brief
An AI generates a legal document referencing a Federal Court ruling from 2019.
→ The ruling is entirely fictitious – a dangerous error in legal applications.

Incorrect resume data in HR tools
An AI-based application tool fills gaps in a resume with “plausible” entries – such as an internship at Siemens in 2017.
→ The AI invented the entry; it was never in the original document.

Fabricated quote in a historical article
“As Rosa Luxemburg said: ‘Freedom is always the freedom of those who think differently – but never of the reckless.’”
→ The first part is authentic, the second was never said – the AI continued in style, but distorted the content.

Invented technical concept in developer documentation
A code assistant writes: “Since version 4.5, the SecureHashNet library automatically supports elliptic curves in XYZ-512 format.”
→ The function was never implemented – developers might waste hours debugging a non-existent feature.

The term “hallucination” is problematic in that it anthropomorphizes machines. But technically, it describes a repeatable issue: the lack of grounding mechanisms in generative systems.

Where do hallucinations come from?

Hallucinations are not a bug; they are a direct result of how generative AI models work:

  • Probability-based prediction: LLMs predict which token is most likely to come next; truth is not a built-in criterion (a toy sketch follows after this list).
  • Lack of reliable grounding: Without connecting to external sources like knowledge graphs, databases, or retrieval-augmented generation (RAG) systems, the model is limited to its training data – and may reconstruct “facts” that never existed.
  • Context sensitivity without correction: Models respond strongly to prompt phrasing, but lack a stable worldview to verify their own outputs.
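To make the first point concrete, here is a deliberately simplified sketch in Python. It is not a real language model – just a toy lookup table of continuation probabilities, with all tokens and numbers invented for illustration – but it shows the mechanism: the loop always picks the most plausible next token, and nothing in it ever checks whether the resulting claim is true.

```python
# Toy next-token "model": continuation probabilities as a lookup table.
# All tokens and probabilities are invented for illustration only.
next_token_probs = {
    ("According", "to"): {"a": 0.6, "the": 0.4},
    ("to", "a"):         {"2020": 0.5, "recent": 0.5},
    ("a", "2020"):       {"WHO": 0.7, "Harvard": 0.3},
    ("2020", "WHO"):     {"study,": 0.9, "report,": 0.1},
}

def continue_text(tokens, steps=4):
    """Extend the token list by always choosing the most probable next token.

    Note what is missing: no step verifies whether the generated statement
    corresponds to anything real. Plausibility is the only criterion.
    """
    for _ in range(steps):
        context = tuple(tokens[-2:])
        candidates = next_token_probs.get(context)
        if not candidates:
            break
        tokens.append(max(candidates, key=candidates.get))
    return " ".join(tokens)

print(continue_text(["According", "to"]))
# -> "According to a 2020 WHO study," – fluent, confident, and unverified.
```

Real LLMs operate on billions of parameters instead of a lookup table, but the missing ingredient is the same: without external grounding – for example, retrieval-augmented generation against a curated source – fluency and factuality are decoupled.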

Why SER models don’t hallucinate

Voice-based AI systems like audEERING’s Speech Emotion Recognition (SER) models are fundamentally different from generative models.

  • No text generation: SER models do not create content. They analyze acoustic features such as pitch, loudness curves, and rhythm.
  • No semantic claims: They output probabilities or classifications for states like arousal, valence, or emotion categories.
  • No need for external knowledge: SER systems rely on input audio data and do not require databases to function accurately.

This means: SER models can show bias (e.g., due to skewed training data), but they do not hallucinate in the strict technical sense.
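The contrast is easy to see in code. The following minimal sketch uses audEERING’s open-source openSMILE feature extractor (via the opensmile Python package) together with a deliberately untrained placeholder classifier; the file name and the toy model are illustrative assumptions, not a real product pipeline. The point is the shape of the task: audio in, bounded scores out – there is no text to fabricate.

```python
import numpy as np
import opensmile  # audEERING's open-source acoustic feature extractor

# Summary statistics of pitch, loudness, spectral shape, etc. for one recording.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech_sample.wav")  # illustrative path

def toy_emotion_model(x: np.ndarray) -> dict:
    """Placeholder for a trained SER model (random weights, for shape only).

    Whatever the weights are, the output is a bounded score per dimension,
    derived purely from the input audio – nothing is generated or asserted.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(size=(x.shape[-1], 2))      # untrained projection
    arousal, valence = np.tanh(x @ w).ravel()  # squashed into [-1, 1]
    return {"arousal": float(arousal), "valence": float(valence)}

print(toy_emotion_model(features.to_numpy()))
# e.g. {'arousal': 0.83, 'valence': -0.41} – scores, not statements.
```

A real model is of course trained and validated, but even then its failure modes are misclassification and bias – problems of accuracy and fairness, not of invented content.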

Why we still address hallucinations

At audEERING, we closely monitor developments in generative AI for several reasons:

  • Integration of SER into multimodal systems: Our models are often embedded in systems that include generative components. We need to evaluate whether, for instance, emotional analysis is being overinterpreted or misused in generation processes.
  • Trust is a cross-cutting issue: The hallucination debate highlights a core question: What does trust in AI mean? We, too, must demonstrate that our models are not only accurate, but fair, robust, and explainable.
  • Testing frameworks as blueprint: Our evaluation criteria – Correctness, Fairness, Robustness – are precisely the dimensions that matter in preventing hallucinations. This reinforces our commitment to systematic testing (a simplified illustration follows below).
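To illustrate what the third point means in practice – without reproducing any internal test suite – here is a toy example of the three dimensions applied to an arbitrary classifier. The arrays are invented; only the structure of the checks matters.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Invented example data: labels, predictions, a speaker-group attribute,
# and predictions on noise-perturbed copies of the same recordings.
y_true       = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred       = np.array([1, 0, 1, 1, 0, 1, 1, 1])
y_pred_noisy = np.array([1, 0, 0, 1, 0, 1, 1, 1])
group        = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# Correctness: plain accuracy on held-out data.
correctness = accuracy_score(y_true, y_pred)

# Fairness: the largest accuracy gap between speaker groups.
per_group = {g: accuracy_score(y_true[group == g], y_pred[group == g])
             for g in np.unique(group)}
fairness_gap = max(per_group.values()) - min(per_group.values())

# Robustness: how much accuracy drops when the inputs are perturbed.
robustness_drop = correctness - accuracy_score(y_true, y_pred_noisy)

print(f"correctness={correctness:.2f}, "
      f"fairness_gap={fairness_gap:.2f}, robustness_drop={robustness_drop:.2f}")
```

The same three questions – is it right, is it right for everyone, and does it stay right under perturbation – are exactly the ones a generative system would have to answer before its output deserves trust.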

Who is responsible?

The causes of hallucinations are technical – but the responsibility is human:

  • Ignorance: Many users don’t realize that AI outputs are not necessarily true.
  • Neglect: Developers and companies have long relied on benchmarks that don’t capture real-world errors.
  • Complacency: The market rewards speed and novelty over ethical reflection.

As a result, systems with high error rates are already being used in education, journalism, and even healthcare. This is no longer just a technical problem; it is a societal risk.

What needs to be done?

Although audEERING does not develop generative models, we advocate for responsible AI development across the board. That includes:

  • Transparent quality metrics: Move beyond simple accuracy!
  • Testing frameworks with ethical depth: Not just “Does it work?” but “For whom? Under what conditions?”
  • Knowledge transfer: Promote nuanced, risk-aware communication. Avoid oversimplification.

Conclusion: No trust without testing

Hallucinations in AI systems are symptoms of missing connections – to facts, to reality, to responsibility. audEERING stands for a different path: systems that analyze rather than generate, that evaluate rather than improvise, and that offer clear standards wherever trust in AI is essential.

Because in the end, what matters is not how impressive an AI seems, but whether we truly understand what it does, and why.