Vecchione and other independent experts say that this study and others like it are an important step toward understanding the potential impacts of using AI chatbots. In Vecchione’s assessment, the Microsoft paper does seem to show that generative AI use is associated with less effortful cognitive processes. “One thing that I think that is interesting about knowledge workers in particular is the fact that there are these corporate demands to produce,” she added. “And so sometimes, you could understand how people would forego more critical engagement just because they might have a deadline.”
Microsoft, via the public relations firm it works with, declined Undark’s interview requests, but Lev Tankelevitch, a senior researcher with Microsoft Research and a study co-author, did respond with a statement, which noted in part that the research “found that when people view a task as low-stakes, they may not review AI outputs as critically.” He added, “All the research underway to understand AI’s impact on cognition is essential to helping us design tools that promote critical thinking.”
Other new research outside Microsoft points to related concerns and risks. For example, an IBM study released in March, which has not yet been peer reviewed, initially surveyed 216 knowledge workers at a large international technology company in 2023, followed by a second survey the next year with 107 similarly recruited participants. The surveys revealed increased job-related AI use, 35 percent in the second survey compared with 25 percent in the first, as well as emerging concerns among some respondents about trust, both in the chatbots themselves and in co-workers who use them. “I found a lot of people talking about using these generative AI systems as assistants, or interns,” said Michelle Brachman, a researcher who studies human-centered AI at IBM and the study’s lead author. She gleaned other insights as well while interacting with the respondents. “A lot of people did say they were worried about their ability to maintain their skills, because there’s a risk you end up relying on these systems.”
People need to critically evaluate how they interact with AI systems and put “appropriate trust” in them, she added, but they don’t always do that.
And some research suggests that chatbot users may misjudge the usefulness of AI tools. Researchers at the nonprofit Model Evaluation & Threat Research recently published a preprint describing a small randomized controlled trial in which software developers completed work tasks with and without AI tools. Before getting started, the coders predicted that AI use would speed up their work by 24 percent, on average. But those productivity gains were not realized; instead, their completion time increased by 19 percent. The researchers declined Undark’s interview requests. In their paper, they attributed the slowdown to multiple factors, including low AI reliability, the complexity of the tasks, and overoptimism about AI usefulness, even among people who had spent many hours using the tools.
Of the findings, Alex Hanna, a sociologist, research director at Distributed AI Research Institute, and co-author of “The AI Con,” said: “It’s very funny and a little sad.”
In addition to knowledge workers, much of the current AI-related research focuses on students. And if the connections between AI use and critical thinking hold up, some of these studies appear to confirm early concerns about the technology’s effects on education. In a 2024 Pew survey, for instance, 35 percent of U.S. high school teachers said AI in education can do more harm than good.
In April, researchers at Anthropic released an education report analyzing one million anonymized university student conversations with the company’s chatbot, Claude. The researchers found that the chatbot was primarily used for higher-order cognitive tasks, like “creating” and “analyzing.” The report also briefly notes concerns about critical thinking, cheating, and academic integrity. (Anthropic declined Undark’s interview requests.)
Then in June, MIT research scientist Nataliya Kosmyna and her colleagues released a paper, which hasn’t yet gone through peer review, that studied the brain patterns of 54 college students and other young adults in the greater Boston area as they wrote essays.
The MIT team noticed significant differences in participants’ brain patterns, in areas that are not a measure of intelligence, Kosmyna emphasized. Participants who only used LLMs to help with their task had lower memory recall; their essays were more homogeneous within each topic; and more than 15 percent reported feeling that they had no ownership, or only partial ownership, over the essays they produced, while 83 percent had trouble quoting from the essays they had written just minutes earlier.
“It does paint a rather dire picture,” said Kosmyna, lead author of the study and a visiting research faculty member at Google.
The MIT findings appear consistent with a paper published in December that involved 117 university students whose second language was English, who performed writing and revising tasks and responded to questions. The researchers found signs of what they described as “metacognitive laziness” among learners in the group using ChatGPT 4: Some appeared to be becoming dependent on the AI assistance and offloading some of their higher-level thinking, such as goal-setting and self-evaluation, to the AI tools, said Yizhou Fan, the lead author.
The problem is that some learners, and some educators as well, don’t really distinguish between learning and performance, as it’s usually the latter that is judged for high or low marks, said Dragan Gašević, a computer scientist and professor at Monash University in Melbourne, Australia, and a colleague of Fan’s. “Generative AI helps us enhance our performance,” he said, in a way akin to doping, “while learning itself requires much deeper engagement and experiencing hurdles.”
All this research comes with limitations. Many of the studies have fairly small sample sizes and focus on very specific tasks, and their participants might not be representative of the broader population: They’re typically selected by age or education level, or, as in Kosmyna’s research, drawn from a narrow geographic area. Another limitation is the short time span of the studies. Expanding the scope could fill in some gaps, Vecchione said: “I’d be curious to see across different demographics over longer periods of time.”
Furthermore, critical thinking and cognitive processes are notoriously complex, and research methods like EEGs and self-reported surveys can’t necessarily capture all of the relevant nuances.
Some of these studies have other caveats as well. The potential cognitive impacts are not as pronounced among people who have more experience with generative AI or more prior expertise in the task for which they want assistance. The Microsoft study spotted such a trend, for example, though with weaker statistical significance than the negative effects on critical thinking.
Despite the limitations, the studies are still cause for concern, Vecchione said. “It’s so preliminary, but I’m not surprised by these findings,” she added. “They’re reflective of what we’ve been seeing empirically.”
Companies often hype their products while trying to sell them, and critics say the AI industry is no different. The Microsoft research, for instance, has a particular spin: The authors suggest that it could be helpful for generative AI tools to “decrease knowledge workers’ cognitive load by automating a significant portion of their tasks,” because doing so could free workers up for other types of tasks at work.
Critics have noted that AI companies have excessively promoted their technology since its inception, and that continues: A new study published by design scholars documents how companies including Google, Apple, Meta, Microsoft, and Adobe “impose AI use in both personal and professional contexts.”
Some researchers, including Kosmyna at MIT, argue that AI companies have also aggressively pushed the use of LLMs in educational contexts. Indeed, at an event in March, Leah Belsky, OpenAI’s VP of education, said the company wants to “enable every student and teacher globally to access AI” and advocates for “AI-native universities.” The California State University system, the University of Maryland, and other schools have already begun incorporating generative AI into students’ school experiences, such as by making ChatGPT easily accessible, and Duke University recently introduced a DukeGPT platform. Google and xAI have begun promoting their AI services to students as well.
All this hype and promotion likely stems from the desire for larger-scale adoption of AI at a time when OpenAI and many other entities investing in AI remain unprofitable, said Hanna. AI investors and analysts have begun speculating that the industry could be in the midst of a bubble. At the same time, Hanna argues, the hype is useful for business managers who want to engage in massive layoffs, even though the LLMs aren’t actually replacing those workers.
Hanna believes the preliminary research about generative AI and critical thinking, such as the work from Microsoft, is worth taking seriously. “Many people, despite knowing how it all works, still get very taken with the technology and attribute to it a lot more than what it actually is, either imputing some notion of intelligence or understanding,” she said. Some people might benefit if they have in-depth AI literacy and really know what’s inside the black box, she suggested. However, she added, “that’s not most people, and that’s not what’s being advertised by companies. They benefit from having this veneer of magicalness.”