Last spring, when Karim Lakhani began testing how ChatGPT affected the work of elite business consultants, he assumed they would be elated by the tool. In a preliminary study of two dozen workers, the language bot had helped them finish two hours’ worth of tasks in 20 minutes. Instead, the consultants were uneasy. They appreciated that they had done better work in less time, but ChatGPT’s quick work threatened their sense of themselves as high-skilled workers, and some feared relying on it too much. “They were really worried and felt like this was going to denigrate them and be sort of empty calories for their brain,” Dr. Lakhani said.
After these preliminary tests, Dr. Lakhani and his colleagues devised a larger, controlled experiment to measure how ChatGPT would affect more than 750 white-collar workers. That study, which is under review at a scientific journal, indicated sharply mixed results in the consultants’ work product. ChatGPT greatly improved the speed and quality of work on a brainstorming task, but it led many consultants astray when doing more analytical work. The study also detailed workers’ varied feelings about the tool. One participant compared it to the fire Prometheus stole from the gods to help mortals. Another told Dr. Lakhani’s colleague Fabrizio Dell’Acqua that ChatGPT felt like junk food — hard to resist, easy to consume but ultimately bad for the consumer.
In the near future, language bots like OpenAI’s ChatGPT, Meta’s Llama, and Google’s Gemini are expected to take on many white-collar tasks, like copywriting, preparing legal briefs and drafting letters of recommendation. The study is one of the first to show how the technology might affect real office work — and office workers. “It’s a well-designed study, particularly in a nascent area like this,” said Maryam Alavi, a professor at the Scheller College of Business at the Georgia Institute of Technology who was not involved in the experiments. Dr. Alavi also noted that the study “really points out how much more we need to learn.”
The volunteers were split into two groups, each of which worked on a different management-consulting problem. Within each group, some consultants used ChatGPT after 30 minutes of training, some used it with no instructions and some did not use it. One of the tasks was to brainstorm about a new type of shoe, sketch a business plan for making it, and write about it persuasively. The consultants who used ChatGPT produced work that independent evaluators rated about 40 percent better on average. In fact, people who simply cut and pasted ChatGPT’s output were rated more highly than colleagues who blended its work with their own thoughts. And the A.I.-assisted consultants were more than 20 percent faster. Studies this year of ChatGPT in legal analysis and white-collar writing chores have found that the bot helps lower-performing people more than it does the most skilled. Dr. Lakhani and his colleagues found the same effect in their study.
On a task that required reasoning based on evidence, however, ChatGPT was not helpful at all. Here, ChatGPT lulled employees into trusting it too much. Unaided humans had the correct answer 85 percent of the time. People who used ChatGPT without training scored just over 70 percent. Those who had been trained did even worse, getting the answer only 60 percent of the time. In interviews conducted after the experiment, “people told us they neglected to check because it’s so polished, it looks so right,” said Hila Lifshitz-Assaf, a management professor at Warwick Business School in Britain.
“If you haven’t had an existential crisis about this tool, then you haven’t used it very much yet,” said another co-author, Ethan Mollick, a management professor at the Wharton School at the University of Pennsylvania.