‘Hallucinations’: Why do AI chatbots sometimes show false or misleading information?

Google’s new search feature, AI Overviews, is facing mounting backlash after users pointed out some factually inaccurate and misleading answers to queries.

AI Overviews, which launched two weeks ago, shows a summary of answers to common questions at the top of the Google Search page, drawing on various sources from around the Internet.

The goal of the new feature is to help users answer “more complex questions,” according to a Google blog post.

Instead, it has produced false answers, such as telling users to use glue to stop cheese sliding off pizza, to eat rocks for their health, or that former US President Barack Obama is Muslim, a conspiracy theory that has been debunked.

The AI Overviews answers are the latest in a series of examples of chatbot models responding incorrectly.

One study by Vectara, a generative AI startup, found that AI chatbots invented information anywhere from three to 27 per cent of the time.

What are AI hallucinations?

Large language models (LLMs), which power chatbots such as OpenAI’s ChatGPT and Google’s Gemini, learn to predict a response based on the patterns they observe.

The model calculates the most likely next word to answer your question based on what’s in its database, according to Hanan Ouazan, partner and generative AI lead at Artefact.

“That’s exactly how we work as human beings, we think before we talk,” he told Euronews.

But sometimes a model’s training data can be incomplete or biased, leading the chatbot to give incorrect answers, or “hallucinations”.
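
In concrete terms, “calculating the most likely next word” means turning scores over candidate words into probabilities and picking from them. The Python sketch below is a deliberately simplified illustration, not a real LLM; the prompt, candidate words and scores are invented.

```python
import math

# Toy illustration (not a real LLM): the model assigns a score to each
# candidate next word and returns the most probable one.
prompt = "The capital of France is"
logits = {"Paris": 7.1, "Lyon": 3.4, "pizza": 0.2}  # hypothetical scores

# Softmax turns the raw scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

best = max(probs, key=probs.get)
print(prompt, best, f"({probs[best]:.1%})")

# The output is whatever is statistically most likely given the training
# data, not something the model has verified to be true. Gaps or biases
# in that data can therefore surface as confidently stated errors.
```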

To Alexander Sukharevsky, a senior partner at QuantumBlack at McKinsey, it’s more accurate to call AI “hybrid technology” because a chatbot’s answers are “mathematically calculated” based on the data it observes.

There’s no one reason why hallucinations happen, according to Google: it could be insufficient training data used by the model, incorrect assumptions, or hidden biases in the information the chatbot is using.

Google identified several types of AI hallucinations, such as incorrect predictions of events that may never actually happen, false positives that flag non-existent threats, and false negatives that fail to detect something like a cancerous tumour.

But Google acknowledges there can be significant consequences to hallucinations, such as a healthcare AI model incorrectly identifying a benign skin lesion as malignant, leading to “unnecessary medical interventions”.

Not all hallucinations are bad, according to Igor Sevo, the head of AI at HTEC Group, a global product development firm. It just depends on what the AI is being used for.

“In creative situations, hallucinating is good,” Sevo said, noting that AI models can write new passages of text or emails in a certain voice or style. “The question now is how to get the models to understand creative vs truthful,” he said.

It’s all about the data

Ouazan said the accuracy of a chatbot comes down to the quality of the dataset that it’s being fed.

“If one [data] source is not 100 per cent… [the chatbot] might say something that is not right,” he said. “This is the main reason why we have hallucination.”

For now, Ouazan said, AI companies are using a lot of web and open source data to train their models.

OpenAI, in particular, is also striking agreements with mass media organisations such as Axel Springer and News Corp and publications such as Le Monde to license their content so they can train their models on more reliable data.

To Ouazan, it’s not that AI needs more data to formulate accurate responses; it’s that models need quality source data.

Sukharevsky said he’s not surprised that AI chatbots are making mistakes; they have to, so that the humans running them can refine the technology and its datasets as they go.

“I think at the end of the day, it's a journey,” Sukharevsky said. “Businesses don’t have good customer service from day one either,” he said.

A Google spokesperson told Euronews Next that its AI Overviews received many “uncommon queries” that were either doctored or could not be accurately reproduced, leading to false or hallucinated answers.

The spokesperson maintained the company did “extensive testing” before launching AI Overviews and is taking “swift action” to improve its systems.

How can AI companies stop hallucinations?

There are a few techniques Google recommends to mitigate the problem, such as regularisation, which penalises the model for making extreme predictions.
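
Regularisation here follows a standard machine-learning recipe: add a penalty to the training loss that grows with the size of the model’s weights, so extreme predictions become expensive. The sketch below applies an L2 penalty to a toy linear model; it is a generic illustration of the idea, not Google’s implementation, and the data and penalty strength are made up.

```python
import numpy as np

# Toy linear model trained by gradient descent with an L2 (ridge) penalty.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # made-up features
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # made-up targets

w = np.zeros(5)
lr, lam = 0.01, 0.1   # learning rate and penalty strength (illustrative values)

for _ in range(2000):
    error = X @ w - y
    # The "+ lam * w" term is the gradient of the L2 penalty:
    # it pulls the weights back towards zero on every step.
    grad = X.T @ error / len(y) + lam * w
    w -= lr * grad

print(np.round(w, 2))   # weights stay moderate instead of blowing up
```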

One way to do this is to limit the number of possible outcomes the AI model can predict, Google continued. Trainers can also give the model feedback, telling it what they liked and disliked about its answers, to help the chatbot learn what users are looking for.
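
Limiting the possible outcomes often looks like top-k sampling in practice: only the k most probable next words are kept, and everything else is discarded before an answer is generated. The function below is a hypothetical sketch of that idea with invented candidate words and scores, not a description of Google’s system.

```python
import math
import random

def top_k_sample(logits, k=2):
    # Keep only the k highest-scoring candidates and renormalise,
    # so very unlikely continuations can never be chosen.
    top = dict(sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(math.exp(v) for v in top.values())
    probs = {word: math.exp(v) / total for word, v in top.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Hypothetical scores for the word after "Top the pizza with ..."
candidates = {"cheese": 2.9, "basil": 2.1, "glue": -1.5}
print(top_k_sample(candidates, k=2))   # with k=2, "glue" is never an option
```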

AI should also be trained with information that is "relevant" to what it will be doing, like using a dataset of medical images for an AI that will assist with diagnosing patients.

Companies with AI language models could record the most common queries and then bring together a team of individuals with different skills to figure out how to refine the answers, Sukharevsky said.

For example, Sukharevsky said that English language experts could be well suited to do the AI’s fine-tuning depending on what the most popular questions are.

Large companies with major computing power could also try creating their own evolutionary algorithms to improve the reliability of their models, according to Sevo.

This is where AI models would hallucinate, or make up, training data for other models, using truthful information that has already been identified by mathematical equations, Sevo continued.

If thousands of models compete against each other to find truthful answers, the resulting models will be less prone to hallucinations, he said.

“I think it’s going to be solved, because if you don’t make [AI chatbots] more reliable, nobody’s going to use them,” Sevo said.

“It’s in everyone’s interest that these things will be used.”

Smaller companies could try manually fine-tuning what data their models consider reliable or truthful based on their own set of standards, Sevo said, but that solution is more labour-intensive and expensive.

Users should also be aware that hallucinations can happen, AI experts say.

“I would educate myself about what [AI chatbots] are, what they are not, so I have a basic understanding of its limitations as a user,” Sukharevsky said.

“If I see that things aren’t working, I would let the tool evolve.”