

Journal editors and peer reviewers are being flooded with AI-generated papers that are almost impossible to detect.
Last summer, Peter Degen’s postdoctoral supervisor came to him with an unusual problem: One of his papers was being cited too much. Citations are the currency of academia, but there was something odd about these. Published in 2017, the paper had assessed the accuracy of a particular type of statistical analysis on epidemiological data and had received a respectable few dozen citations over the years. Now it was being referenced every few days, hundreds of times in all, placing it among the most cited papers of his career. Another professor might be thrilled. Degen’s adviser asked him to investigate.
Degen, a postdoctoral researcher at the University of Zurich Center for Reproducible Science and Research Synthesis, found that the citing papers all followed a similar pattern. Like the original, they were analyzing the Global Burden of Disease study, a publicly available dataset compiled by the Institute for Health Metrics and Evaluation at the University of Washington. But they were using the dataset to churn out a seemingly endless supply of predictions: about the future likelihood of stroke among adults over 20 years old, of testicular cancer among young adults, of falls among elderly people in China, of colorectal cancer among people who eat minimal whole grains, of disease X among population Y, and so on.
Searching GitHub for code used to do this sort of analysis, Degen followed some links and wound up on the Chinese social media site Bilibili, where he discovered a Guangzhou-based company touting tutorials on how to produce publishable research in under two hours using its software tools and AI writing assistance. The resulting studies were not very good: Researchers who analyzed a subset of them, all concerning headaches, found they were rife with errors and misrepresentations. But neither were they as flagrantly wrong as the AI-generated papers of the recent past, which makes them harder to filter out.
“It’s a huge burden on the peer-review system, which is already at the limit,” Degen said. “There’s just too many papers being published and there’s not enough peer reviewers, and if the LLMs make it so much easier to mass produce papers, then this will reach a breaking point.”
Optimists about generative AI have high hopes for its ability to produce future scientific breakthroughs — accelerating discovery, eliminating most types of cancer — but the technology is currently undermining one of the pillars of scientific research, inundating editors and reviewers with an endless stream of papers. Paradoxically, the better the technology gets at producing competent papers, the worse the crisis becomes.
For the past decade, academic publishing has been contending with so-called “paper mills,” black-market companies that mass-produce papers and sell authorship slots to academics, doctors, or others who hope to gain a competitive edge by having published research on their resumes. It has been a game of cat and mouse, with publishers — often pressed by so-called science sleuths, researchers who specialize in ferreting out fraudulent research — closing one vulnerability only to have the mills find a new one. Generative AI was a boon to the mills, helping them to skirt plagiarism detectors by creating wholly new images and text. Still, the technology’s telltale hallucinations meant that publishers could at least theoretically screen out much of their work. In practice, papers still got through, only to get retracted when sleuths encountered a diagram of a rat with inexplicably gargantuan genitals labeled “testtomcels” or prose sprinkled with “as an AI assistant”s that someone forgot to delete.
But now AI has improved to the point where it can produce convincing papers almost wholesale, allowing desperate academics in need of a publication to mill papers of their own. The result is a deluge of scientific slop that threatens to swamp publishing, peer review, grant making, and the research system as it exists today.
Matt Spick, a lecturer in health and biomedical data analytics at the University of Surrey and an associate editor at Scientific Reports, first noticed the phenomenon when he received three strikingly similar papers analyzing the US National Health and Nutrition Examination Survey (NHANES), another public dataset. He checked Google Scholar and realized that it wasn’t a coincidence: There had been a sudden explosion in papers citing NHANES that all followed a similar formula, each purporting to discover an association between, for example, eating walnuts and cognitive function or drinking skim milk and depression.
“If you’ve got enough computing power, you go through and you measure every single pairwise association, and eventually you find some that haven’t been written on before and you just publish: There is a correlation between this and that,” Spick said. These correlations are often misleading simplifications of phenomena with multiple causes or random statistical flukes. “One was that how many years you spend in education will cause postoperative hernia complications. That is just a random correlation. What am I supposed to do with that? Leave school early so that I won’t get a postoperative hernia complication later?”
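To see why this recipe never runs dry, consider a toy sketch in Python. (The data below is pure random noise, not NHANES, and the variable counts are made up; this illustrates the multiple-comparisons arithmetic, not the software anyone is actually selling.)

```python
# Toy illustration of mass correlation "fishing": test every pairwise
# association in a dataset of pure noise and count how many clear the
# conventional significance bar when no multiple-comparison correction
# is applied.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_vars = 500, 50  # 50 unrelated variables -> 1,225 pairs
data = rng.normal(size=(n_subjects, n_vars))

false_positives = 0
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        _, p = stats.pearsonr(data[:, i], data[:, j])
        if p < 0.05:  # nominal significance, uncorrected
            false_positives += 1

n_pairs = n_vars * (n_vars - 1) // 2
print(f"{false_positives} of {n_pairs} random pairs look 'significant'")
# Expect roughly 5 percent of the 1,225 pairs, about 61, by chance alone.
```

At a threshold of p < 0.05, about one random pair in 20 will look like a discovery, and each spurious hit is a potential paper.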
Over the years, sleuths have developed a variety of methods for detecting inauthentic papers. Some search for “tortured phrases,” instances where someone was trying to skirt plagiarism detectors by feeding an existing paper through a synonym generator, which often has the effect of turning technical terms like “reinforcement learning” into nonsense like “reinforcement getting to know,” to cite one recent example. Other sleuths track duplicated images, perform network analysis of authors, or check citations for hallucinated publications, a classic sign of LLM use. Spick searches for masses of papers following the same template as they analyze public datasets.
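The tortured-phrase check, at least, is straightforward to automate. Here is a minimal sketch, assuming a hand-curated phrase list; the entries below are illustrative stand-ins rather than an official sleuthing database, and screeners used in practice draw on far larger, community-maintained lists.

```python
# Screen manuscript text for "tortured phrases," the artifacts left
# behind when existing text is pushed through a synonym generator.
import re

# Illustrative entries only; sleuths maintain much longer lists.
TORTURED_PHRASES = {
    "reinforcement getting to know": "reinforcement learning",
    "counterfeit consciousness": "artificial intelligence",
    "enormous information": "big data",
}

def flag_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (suspect phrase, likely original term) pairs found in text."""
    lowered = text.lower()
    return [
        (phrase, original)
        for phrase, original in TORTURED_PHRASES.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]

sample = "We apply reinforcement getting to know to the control task."
print(flag_tortured_phrases(sample))
# [('reinforcement getting to know', 'reinforcement learning')]
```

The other techniques on the list, image forensics and author-network analysis among them, resist this kind of simple string matching, which is one reason sleuthing remains slow, human work.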
These papers are not necessarily wrong, though they are often misleading. Nor are they, strictly speaking, fraudulent. They’re just useless, and suddenly very easy to make. Last year, several journals began restricting submissions of papers analyzing public datasets, citing a flood of redundant research.
Spick fears these measures may be fighting the last battle. In recent months, AI companies have released a range of “agentic” science assistants capable of analyzing data, generating hypotheses, and writing research papers with a high degree of autonomy. While a possible step toward the goal of AI-accelerated science, these systems also come with novel risks. When Carnegie Mellon researchers tested several agentic tools, they found that they sometimes invented data or used misleading techniques, but that these errors were only apparent upon close analysis of the full workflow; the final papers looked polished.
Announcing an AI paper-writing assistant earlier this year, OpenAI’s then-vice president for science, Kevin Weil, predicted, “I think 2026 will be for AI and science what 2025 was for AI and software engineering.” Spick and some colleagues, curious what it could do, gave the tool, called Prism, some data from an already published paper documenting ripening times of eggplants and peppers. Prism analyzed the data, proposed a new statistical method that could be applied to it, and wrote an entire paper complete with charts and correct citations.
“We were all looking at each other like, ‘What the [expletive], this is actually a decent piece of work!’” Spick recalled. Unlike the generated papers he’d encountered previously, this one didn’t follow a template, nor was it using a single well-known database. It took 25 minutes and 50 seconds to produce.
“I’m genuinely not sure at what point we will suddenly realize that more are getting through than we realize because we can’t easily tell the difference anymore,” Spick said.
This raises some philosophical questions, Spick said, like: Does it matter who or what writes the paper if the information is accurate? And should science be in the business of publishing every possible fact?
“Part of science is supposed to be the filter. We’re supposed to publish the stuff that we think is interesting, not publish literally everything that we can possibly find,” Spick said. “Because if we do that, science is just spamming the world with all the data, irrespective of whether it constitutes actual new knowledge or not, and in any kind of medium-term time frame, it’s almost impossible to work out what’s meaningful and what isn’t.”
This is the immediate practical challenge posed by AI agents. They threaten to overwhelm the human systems that create and organize knowledge. Research funders are contending with onslaughts of proposals perfectly tailored to their particular grant, unable to parse which projects represent the next step in years of work and which were generated in minutes. Conference organizers, journal editors, and peer reviewers are all struggling to sort through a flood of material that all seems good enough at first glance to warrant a close read. There is an enormous and growing asymmetry between the time it takes to produce new work and the time it takes a subject-matter expert to vet it.
For Marit Moe-Pryce, the managing editor of the international relations journal Security Dialogue, submissions are up 100 percent over a year earlier. Just as problematic: All of the submissions have become pretty good. Gone are the blatant hallucinations and leftover prompts; everything has suddenly become coherent, well structured, and stylistically similar, making it difficult to say whether a manuscript is wholly generated, the work of an experienced academic, or that of a young scholar using AI as an editor.
“The main problem that we see currently from the desk is that the fraudulent side and the academic side are conflating, which ends up with a big gray mass of articles that we as editors need to sit and try to figure out, ‘What is this? Is this something that we need to engage with? Is it not?’” Moe-Pryce said.
One paper made it past at least 10 editors and two rounds of peer review before she noticed a fake citation — a very plausible one, crediting several former editors of the journal with a paper on a topic they could have written about but never did. She then found several more. She doesn’t know at what stage of revision the hallucinations were introduced, but the close call underscored the level of care required to ensure nothing false gets published. Now that models increasingly cite real papers, she has to read for whether the works cited are the ones an expert would actually use, since AI has not yet mastered the difference between canonical literature and more peripheral work.
“It’s incredibly detailed, and this is a normal part of the editorial work. The difference is that now you have to do that for all the rubbish that comes through the door,” Moe-Pryce said. “That’s why our workload becomes so unmanageable.”
Academic papers go through a multi-stage review process before publication. First, manuscripts are triaged for obvious problems, then sent to a journal’s editor, who decides whether each might be worth publishing. The editor then sends a promising manuscript to an associate editor with experience in the field, who vets it again before recruiting two or three subject-matter specialists — the “peers” in peer review — to read the paper and write responses. The editors and reviewers typically work for free, volunteering their time on top of their primary academic jobs.
The review system was already struggling under increasing volumes of submissions, and now AI is increasing those volumes while also making the bad ones more difficult to filter out. Moe-Pryce now spends more time sorting papers before deciding what to send out for review, and prospective reviewers, swamped themselves, are less and less likely to respond. Where she previously could send four queries out and get three replies, it now takes her a dozen tries to get two people. Increasingly, she reaches out to 20 reviewers and hears nothing.
“It’s fatigue. Academic journals have mushroomed, and then you have AI helping everyone, fraudulent or not, generate more, faster, so you have a massive increase in volume,” she said. “AI currently holds the potential to bring down the publishing system as we know it.”
The journal Accountability in Research has seen a 60 percent surge in submissions this year, according to David Resnik, an associate editor at the journal. Ironically, he has been besieged by likely AI-generated papers about fraudulent academic papers, which mine public data compiled by the organization Retraction Watch.
He, too, is struggling to find reviewers. At times, he’s had to send out 20 requests just to get two responses — and he’s suspected that some of the responses he’s received are AI-generated themselves. He has reason to be suspicious. A survey conducted by the publishing company Frontiers last year found that more than half of researchers have used AI assistance in their peer review.
“I’m very worried about this straining, breaking the back of the peer-review system,” said Resnik.
AI agents arrive at a time when the quality filters of academia are already struggling to cope with a superabundance of papers. The number of scientific papers published has grown exponentially in recent years, according to an analysis of data published in Quantitative Science Studies, while the number of PhDs who might review them has not. Unfortunately, the authors attribute this explosion in productivity not to rapid progress in science but to the fact that commercial and professional incentives align to publish the maximum quantity of papers.
Many journals have shifted to an “open access” model in which they earn revenue by charging authors processing fees to have their papers published, rather than charging readers for subscriptions. In earnings calls, publishing companies tout the recent 20 percent or greater increases in submissions as a positive growth story. Universities and funding agencies, meanwhile, look at researchers’ publication metrics when deciding whom to fund or promote, which means researchers are under pressure to “publish or perish.” Nor is it only traditional academics who feel this pressure. Overseas medical students can improve their chances at a US residency program by having a few peer-reviewed papers on their resumes. In China, medical doctors have strong incentives to publish despite having neither the time nor the resources to conduct research, making quick paper generation an attractive option.
If you introduce an infinite paper-writing machine to a system that defines productivity by the number of papers written, people will use it to write a lot of papers. A study published in Nature this year found that scientists who adopted AI published three times more papers and received nearly five times more citations than those who didn’t; they also became research project leaders 1.37 years earlier. While individually beneficial, the embrace of AI to mass-produce papers may be detrimental to science as a collective endeavor, beyond exhausting journal editors and peer reviewers. The same study found a collective narrowing of focus, as these newly productive scientists gravitated toward well-studied fields with abundant existing data for AI to synthesize.
There are no easy solutions to this problem. In 2022, the scientific organization STM launched an initiative called Integrity Hub to contend with paper mills. Since then, it has been engaged in an “arms race” with AI, according to Joris van Rossum, the project’s program director — assembling automated tools that check for plagiarism, then tortured phrases, then fake citations — but the group must now consider more sweeping remedies.
“We anticipate a future where it’s going to be more realistic to enable submitters to demonstrate authenticity rather than trying to detect fabrication,” he said. That is, once fraudulent manuscripts are impossible to detect, publishers will have to find a way for researchers to prove their work is real — perhaps by working with instrument manufacturers to develop ways of watermarking their images, he said, or having researchers submit more of the data behind their work so it can be analyzed for suspicious signals.
This would entail changing the way research is done on a massive scale, and while it might stem outright fraud, it would do little to reduce the volume problem. Using AI to assist with peer review, as some have proposed — and some reviewers are already doing, permitted or not — raises a nest of other possible risks. Studies have found that models often continue to cite retracted studies as valid and write superficially good reviews while overlooking methodological problems. AI reviewers also appear to prefer AI-generated writing.
“It’s not really a tractable problem,” said Reese Richardson, a postdoctoral fellow at Northwestern University who studies mass-produced papers. “I think that the only way out of this situation is to actually change the way that the scientific enterprise awards prestige and awards resources. As long as we have this hyper-competitive, hyper-unequal rat race where people’s productivity and their worth as scientists is being measured by how many publications they put out and how many times they get cited, it’s just going to incentivize this behavior.”
Vincent Larivière, the editor-in-chief of Quantitative Science Studies, had a similar diagnosis. His journal has seen a 40 percent increase in submissions this year.
“We need a reform of what matters in science,” Larivière said. The conflation of scientific productivity with publication counts has had a distorting effect on science, causing research to gravitate toward small, tractable problems that are guaranteed to result in something publishable. AI could do great things, he said — help cure cancer, develop fusion energy — but right now it is being used to generate papers to “pad CVs.”
“Of course we need more science,” he said, “but do we need more papers?”