We have repeatedly pressed readers not to use AI because its output is unreliable. For instance, a commenter managed to post an AI-generated definition of fiduciary duty. It missed the critical aspect that fiduciary duty is the highest standard of care under the law and requires the agent to put the principal’s interest before his own. If AI can’t get right something so fundamental, so widely discussed, and not that hard to state correctly, how can it be trusted on any topic?
And that’s before factoring in that AI makes regular users stoopider. Or that Sam Altman has warned: What you share with ChatGPT could be used against you.
If you are still so hapless as to use Google for search and have it sticking its AI search results in your face, those are unreliable too. AI can’t even compile news sources correctly. From Ars Technica:
A new study from Columbia Journalism Review’s Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The researchers tested eight AI-driven search tools by providing direct excerpts from real news articles and asking the models to identify each article’s original headline, publisher, publication date, and URL. They discovered that the AI models incorrectly cited sources in more than 60 percent of these queries, raising significant concerns about their reliability in correctly attributing news content.
We got another example by e-mail from a personal contact in Southeast Asia. He has taught IT in universities here and in the UK. He’s also an inventor and had a UK business with over 40 employees based on one of his creations. He is now working on two other devices and has a patent issued on one of them. He showed me an early model of one and the super-powerful custom magnets he’d had fabricated to make it work better. His message:
I’ve been using different AIs (ChatGPT, DeepSeek and Luna) for doing some calculations and finding info on stuff like metal properties and then I started noticing errors. Being autistic I pointed this out – Luna said “oops – don’t worry it’ll be right this time”, ChatGPT said it’s right I’m wrong and DeepSeek sulked and refused to interact anymore.
Anyway, I then used some tools I got when I was at the uni to find plagiarism to find the sources of the data and the majority came from Reddit and Quora – which are hardly sources of accurate information. There appear to be no mechanisms to see if the data is correct, they just scrape websites and take it as gospel.
Bottom line is that a lot of what they present is junk. God help us if say medical professionals rely on it. And I can’t see any way out of it except by getting professionals to check the data and that is very expensive.
Regular readers may recall that we previously posted on the fact that AI is now being heavily used in medicine, with IM Doc describing the planned outsourcing of diagnosis to AI. From a February 2024 post:
There will be cameras and microphones in the exam room. Recording both the audio and video of everything that is done. The AI computer systems will then bring up the note for the visit from thin air – after having watched and listened to everything in the room. Please note – I believe every one of these systems is done through vast web services like AWS. That means your visit and private discussions with your doctor will be blasted all over the internet. I do not like that idea at all. This is already being touted to “maximize efficiency” and “improve billing”. My understanding from those that have been experimented upon as physicians, that as you are completing the visit, the computer will then begin demanding that you order this or that test because its AI is also a diagnostician and feels that those tests are critical. It will also not let you close the note until you have queried the patient about surveillance stuff – ie vaccines and colonoscopy, even for visits for stubbed toenails. And unlike now when you can just turn that stuff off, it is in control and watching and listening to your every move. The note will not be completed until it has sensed you discussing these issues with your patient and satisfied that you pushed hard enough.
I understand also that there is a huge push to begin the arduous task of having AI take over completely things like reading x-rays and path slides. Never mind the medicolegal issues with this – ie does the AI have malpractice insurance? Does it have a medical license? Who does the PCP talk to when there is sensitive material to discuss with a radiologist, as in new lesions on a mammogram etc? Are we to discuss this with Mr. Roboto?…
The glee with which the leaders of this profession are jumping into this and soon to be forcing this upon us all gives one a very sick feeling. Complete disregard for the ethics of this profession dating back centuries.
IM Doc later provided a horrorshow example of the hash it makes of transcribing patient notes. In one case, it invented multiple serious illnesses the patient had never had and even a pharmacy that did not exist. Extracted from his message:
This is happening all the time with this technology. This example is rather stark but on almost 2/3 of the charts that are being processed, there are major errors, making stuff up, incorrect statements, etc. Unfortunately – as you can see it is wickedly able to render all this in correct “doctorese” – the code and syntax we all use and can instantly tell it was written by a truly trained MD.
This patient actually came into the office for an annual visit. There was nothing ground-shaking discussed….
This patient is on no meds that are not supplements. There are no prescriptions – and yet we supposedly discussed 90 day supplies from Brewer’s Pharmacy in Bainesville. There is no pharmacy nor town anywhere around here that even remotely sounds like either one. A quick google search revealed a Bainesville MD, far away from where we are – but as far as I can tell there is no Brewer’s Pharmacy there – the only one in the country I could find was in deep rural Alabama.
The last paragraph was literally the only part of this entire write up which was accurate…
This is what I do know however
1) Had I signed this and it went in his chart, if he ever applied to anything like life insurance – it would have been denied instantly. And they do not do seconds and excuses. When you are done, you are done. If you are on XXX and have YYY – you are getting no life insurance. THE END.
2) This is yet another “time saver” that is actually taking way more time for those of us who are conscientious. I spend all kinds of time digging through these looking for mistakes so as not to goon my patient and their future. However, I can guarantee you that as hard as I try – mistakes have gotten through. Furthermore, AI will very soon be used for insurance medical chart evaluation for actuarial purpose. Just think what will be generated.
3) These systems record the exact amount of time with the patients. I am hearing from various colleagues all over the place that this timing is being used to pressure docs to get them in and get them out even faster. That has not happened to me yet – but I am sure the bell will toll very soon.
4) When I started 35 years ago – my notes were done with me stepping out of the room and recording the visit in a hand held device run by duracells. It was then transcribed by secretary on paper with a Selectric. The actual hard copy tapes were completely magnetically scrubbed at the end of every day by the transcriptionist. Compare that energy usage to what this technology is obviously requiring. Furthermore, I have occasion to revisit old notes from that era all the time – I know instantly what happened on that patient visit in 1987. There is a paragraph or two and that is that. Compare to today – the note generated from the above will be 5-6 back and front pages literally full of gobbledy gook with important data scattered all over the place. Most of the time, I completely give up trying to use these newer documents for anything useful. And again just think about the actual energy used for this.
5) This recording is going somewhere and this has never been really explained to me. Who has access? How long do they have it? Is it being erased? Have the patients truly signed off on this?
6) This is the most concerning. I have no idea at all where the system got this entire story in her chart. Because of the fake “Frank Capra movie” style names in the document I have a very unsettled feeling this is from a movie, TV show, or novel. Is it possible that this AI is pulling things “it has heard” from these kinds of sources? I have asked. This is not the first time. The IT people cannot tell me this is not happening.
I have no idea why there is such a push to do this – this is insane. Why the leaders of my profession and our Congress are all behind this is a complete mystery.
After we sent him the sightings from the inventor, IM Doc replied:
This week, the students and I had a patient in the office with COVID. A woman with multiple co-morbid conditions, very ill at baseline. She is on both a statin and an SSRI, and amiodarone for her heart issues. There are 3 other drugs – HCTZ, ASA, and occasionally some Advil for pain.
The student was getting ready to give her Paxlovid for her COVID. When confronted with the fact that she is on 3 drugs which are absolutely contraindicated with Paxlovid, and one other that is conditionally contraindicated she informed me that ChatGPT had told her that all were just fine. This young woman is a student at one of our very elite medical schools – and she looked at me and said “Your human brain tells you this is a problem, the AI has millions of documents to look through and has told me this is not a problem…….I will trust the AI”
I said, “Not with my patient, you don’t”.
I have to be honest – I was so concerned about this I did not even know where to start with the student. AI has now officially become a part of the youth brain’s neocortex. I am just about to give up on this entire generation of medical students. It is a lost cause at best.
KLG had a more mundane example:
Trivial but real failure of AI/LLM on a simple question I used as a test after reading about the medical student who loves her some AI.
Query: Oklahoma cheats to win against Auburn
Answer: There are no verified reports or evidence of cheating in the game between Oklahoma and Auburn.

In fact, I have a list that includes about 20 links that prove Oklahoma cheated by using the dishonest move of having a wide receiver pretend to leave the field for a substitution and then scoring on a pass play because he was not covered by the Auburn defense. This has been illegal, as in cheating in the form of a “snake in the grass play,” at every level since I played football from third grade through high school. Presearch.com AI is clueless, though.
So we again exhort readers: do not use AI. Please discourage others from using AI. Large language models need so much training data that they can’t afford to discriminate among sources, and they are now even ingesting their own bad output as part of their training sets. If you need remotely accurate answers, you need to opt out.
