Researchers fool university markers with AI-generated exam papers

Study authors say their experiment proves definitively that the ‘Turing test’ has been passed. Photograph: Michael Dwyer/AP

Researchers at the University of Reading fooled their own professors by secretly submitting AI-generated exam answers that went undetected and got better grades than real students.

The project created fake student identities to submit unedited answers generated by GPT-4 in take-home online assessments for undergraduate courses.

The university’s markers – who were not told about the project – flagged only one of the 33 entries; the remaining AI answers received, on average, higher grades than those achieved by real students.

The authors said their findings showed that AI tools such as ChatGPT were now passing the “Turing test” – named after the computing pioneer Alan Turing – by going undetected by experienced judges.


The authors, who billed the project as “the largest and most robust blind study of its kind” into whether human educators can detect AI-generated responses, warned that the findings had major implications for how universities assess students.

“Our research shows it is of international importance to understand how AI will affect the integrity of educational assessments,” said Dr Peter Scarfe, one of the authors and an associate professor at Reading’s school of psychology and clinical language sciences.

“We won’t necessarily go back fully to hand-written exams, but [the] global education sector will need to evolve in the face of AI.”

The study concluded: “Based upon current trends, the ability of AI to exhibit more abstract reasoning is going to increase and its detectability decrease, meaning the problem for academic integrity will get worse.”

Experts who reviewed the study said it sounded the death knell for take-home exams and unsupervised coursework.

Prof Karen Yeung, a fellow in law, ethics and informatics at the University of Birmingham, said: “The publication of this real-world quality assurance test demonstrates very clearly that the generative AI tools freely and openly available enable students to cheat take-home examinations without difficulty to obtain better grades, yet such cheating is virtually undetectable.”

The study suggests universities could allow students to incorporate AI-generated material into assessed work. Prof Etienne Roesch, another author, said: “As a sector, we need to agree how we expect students to use and acknowledge the role of AI in their work. The same is true of the wider use of AI in other areas of life to prevent a crisis of trust across society.”

Prof Elizabeth McCrum, Reading’s pro-vice-chancellor for education, said the university was “moving away” from using take-home online exams and was developing alternatives that would include applying knowledge in “real-life, often workplace related” settings.

McCrum said: “Some assessments will support students in using AI. Teaching them to use it critically and ethically; developing their AI literacy and equipping them with necessary skills for the modern workplace. Other assessments will be completed without the use of AI.”

But Yeung said allowing the use of AI in exams at schools and universities could create its own problems in “deskilling” students.

“Just as many of us can no longer navigate our way around unfamiliar places without the aid of Google Maps, there is a real danger that the coming generation will end up effectively tethered to these machines, unable to engage in serious thinking, analysis or writing without their assistance,” Yeung said.

In the study’s endnotes, the authors suggest they may have used AI to prepare and write the research, stating: “Would you consider it ‘cheating’? If you did consider it ‘cheating’ but we denied using GPT-4 (or any other AI), how would you attempt to prove we were lying?”

A spokesperson for Reading confirmed that the study was “definitely done by humans”.