In April the Alabama Supreme Court sanctioned an attorney who had filed legal briefs laden with inaccurate citations generated by AI, including numerous references to cases that did not exist. After being informed he had cited a made-up precedent in one filing, the lawyer promised it wouldn’t happen again—but then cited “nonexistent cases at the end of the very next sentence,” as a justice noted in a concurring opinion. At least one other lawyer was sanctioned that week for continuing to file AI-hallucinated material after being warned not to do so.
A database maintained by Damien Charlotin, a senior research fellow at the Paris School of Advanced Business Studies (HEC Paris), lists more than 1,400 cases where courts have addressed AI errors in the past three years, including filings by attorneys and self-represented litigants. As recently as last fall, Charlotin says, the list appeared to be growing exponentially. It’s since leveled off to a steady flow of exasperated judicial rulings. “For the past two or three months, we have reached a plateau of around 350, 400 decisions a quarter,” says Charlotin, who has also created an AI-powered reference checker called Pelaikan.
Courtroom proceedings are public, and lawyers face sanctions for false claims, making such errors comparatively easy to track. But uncaught errors in AI-generated material have also ensnared journalists, software developers, academic researchers and government consultants, some of whom have been well aware of AI’s fallibility. On May 19 the New York Times reported that the author of The Future of Truth, a book about how AI is shaping discourse, acknowledged his text contained more than a half-dozen fabricated or misattributed quotes produced by the technology.
The pattern emerging across these cases is that people keep trusting AI’s answers even when they know the systems can be wrong. So far, that misplaced trust has led to dismissed legal appeals, attorney fines, fired journalists and software outages. Experts warn the stakes will rise as AI becomes more deeply embedded in professional work.
“Humans essentially have a tendency to believe that machines have more knowledge than they do, don’t break and are infallible,” says Alan Wagner, an associate professor of aerospace engineering at Pennsylvania State University.
AI also appears to inspire a particular kind of trust. It can generate answers that are realistic-sounding but false in a way humans seldom do—and people, it turns out, can find its guidance unusually believable. A study published this past February asked participants to complete an image classification task with guidance they were told came from either humans or AI. The guidance—no matter where it came from—was right only half the time, but among participants who were told the advice came from AI, those with positive attitudes toward the technology performed worse than those who held less favorable views. No such effect appeared when participants were told the advice came from humans.
“The results suggested that AI guidance has a quite specific ability to engender biases,” says study co-author Sophie Nightingale, a senior lecturer in psychology at Lancaster University in England.
Research co-authored by Wagner suggests the problem could extend well beyond office work into life-or-death scenarios. In experiments inspired by drone warfare, his team asked participants to categorize images as civilians or enemy combatants and to choose whether to fire a missile at each potential target. A robot then provided feedback on each classification—feedback that was, in fact, random—and though participants’ initial assessments were mostly accurate, they reversed their views in most cases where the bot disagreed. The scenario was a simulation, but participants were “shown imagery of innocent civilians (including children), a UAV [uncrewed aerial vehicle] firing a missile, and devastation wreaked by a drone strike,” according to the paper. They seemed to take the task seriously, says study co-author Colin Holbrook.
“I think that’s the context in which those findings have to be interpreted,” says Holbrook, an associate professor of cognitive and information sciences at the University of California, Merced. “These people were really trying. These people thought that it mattered,” he adds. And if the scenario had been real, “they would have killed a lot of innocent people.”
Compared with earlier automation tools, today’s AI handles a wider variety of tasks, such as generating computer programs and drafting legal briefs. That means more material to check, but it also means users can defer the thinking entirely to AI—what researchers at the University of Pennsylvania’s Wharton School recently called “cognitive surrender.” In one of the team’s experiments, participants received item-by-item feedback on a series of tasks and cash rewards for correct answers. Both practices reduced deference to faulty AI, but neither eliminated it, says Steven D. Shaw, a postdoctoral researcher at Wharton, who ran the study with associate professor of marketing Gideon Nave, also at Wharton.
Educating AI users about the technology’s limitations is another obvious approach, but efforts have produced limited results. As more than one judge has pointed out, attorneys should by now know not to file AI-generated legal material without checking it, yet hallucinations keep showing up in court filings.
Lab research has shown similarly modest effects from warning messages. In one recent study, researchers at Boston University “inoculated” students by alerting them that the AI chatbot ChatGPT tends to produce inaccurate summaries of academic sources and struggles with complex math and then asked them to complete related tasks using the tool. Participants warned about the source summaries were significantly more likely to verify the AI’s output on that task. The warning had no significant effect on the math problems, where verification rates remained low. Some participants told the researchers they came in trusting AI’s mathematical abilities; some said the experiment’s time constraints, which were built in to mimic real-world deadlines, cut into how often they verified results.
“Our findings suggest that awareness alone isn’t enough,” writes study co-author Chi B. Vu, a graduate student in human-AI interaction at BU’s Division of Emerging Media Studies, in an e-mail to Scientific American. “The message wasn’t ignored exactly; it was overridden by competing pressures and trust in certain tasks conducted by [generative] AI.”
Warnings about AI accuracy also compete with advertising that highlights the technology’s potential and with workplace pressures to use it to save time. And as AI improves at many tasks, users may grow less inclined to double-check it at all. That can keep them from seeing the errors that remain, further deepening their confidence.
“They don’t ever get to the ground truth,” Nightingale says. “They don’t have any reason to question it because they carry on in their lives thinking that AI tool is correct—because ‘Why wouldn’t it be?’”

