ChatGPT, Gemini and Grok confidently generate dangerous medical advice half the time, study finds


A study in BMJ Open reveals that around 50% of medical advice from popular AI chatbots is inaccurate. The research tested five platforms and found significant issues with open-ended queries.

A study has found that AI chatbots lie about health queries (Unsplash)

While there has been a lot of debate about the use of AI for healthcare, a new study published in the medical journal BMJ Open has found that around half the advice given by popular AI chatbots is false. The study, first reported by Bloomberg, evaluated five major AI platforms to highlight the growing health risks associated with generative AI platforms.

What did the study find?

The research published this week tested ChatGPT, Gemini, Meta AI, Grok, and DeepSeek and asked each of the chatbots 10 questions across five health categories. Out of the total responses generated, the researchers found that 50 percent contained problematic medical information. Furthermore, the study noted that nearly 20 percent of the generated answers were classified as highly problematic.

The researchers from the US, Canada, and the UK also found that the AI models performed relatively well when handling closed-ended questions concerning established medical topics, such as cancer and vaccines. However, the models struggled significantly to provide safe answers for open-ended queries or complex health subjects like nutrition and stem cells.

A major concern raised in the report is the authoritative tone these models adopt despite lacking clinical judgment or the licences to issue medical diagnoses. The research noted that the AI chatbots delivered answers to the health questions with confidence and certainty even when they could not provide a complete and accurate list of medical references to support their claims.

Across all of the tested chatbots and the 10 questions, the researchers say there were only two refusals to answer, both of which came from Meta AI.

The authors of the study warn that deploying these chatbots without proper oversight and public education could amplify the spread of misinformation.

“These systems can generate authoritative-sounding but potentially flawed responses,” the researchers explained in the report. They added that the findings “highlight important behavioural limitations and the need to reevaluate how AI chatbots are deployed in public-facing health and medical communication.”

The new study comes at a time when AI companies have been positioning their tools to play a bigger role in healthcare. OpenAI launched ChatGPT Health earlier this year, which allows users to share their personal health data with the popular AI chatbot to receive more grounded results.

Meanwhile, Anthropic also launched Claude for Healthcare, which allows its paid users in the US to securely connect their medical records.

About the Author

Aman Gupta

Aman Gupta is a Digital Content Producer at LiveMint with over 3.5 years of experience covering the technology landscape. He specializes in artificial intelligence and consumer technology, reporting on everything from the ethical debates around AI models to shifts in the smartphone market.

His reporting is grounded in first-hand testing, independent analysis, and a focus on how technology impacts everyday users. He holds a PG Diploma in Radio and Television Journalism from the Indian Institute of Mass Communication, Delhi (Class of 2022).

Outside the newsroom, he spends his time reading biographies, hunting for the perfect coffee beans, or planning his next trip.

You can find Aman on LinkedIn (https://www.linkedin.com/in/aman-gupta-894180214) and on X at @nobugsfound (https://x.com/nobugsfound), or reach him via email at aman.gupta@htdigital.in.