
How social media promotes the most negative aspects of AI enthusiasm


Demis Hassabis, the head of Google DeepMind, expressed his thoughts in three words: “This is embarrassing.”  

Hassabis was reacting on X to an overly enthusiastic post by Sébastien Bubeck, a research scientist at competitor OpenAI, who proclaimed that two mathematicians had leveraged OpenAI’s most recent large language model, GPT-5, to unlock solutions for 10 previously unsolved mathematical dilemmas. “The acceleration of science through AI has officially commenced,” Bubeck proclaimed.

Put on your math caps for a moment, and let’s delve into what this dispute from mid-October was really about. It serves as an excellent illustration of the current issues surrounding AI.

Bubeck was thrilled that GPT-5 appeared to have miraculously resolved several challenges known as Erdős problems.

Paul Erdős, a highly prolific mathematician of the 20th century, left numerous puzzles behind upon his passing. To help track which ones have been resolved, Thomas Bloom, a mathematician at the University of Manchester, UK, established erdosproblems.com, which catalogs over 1,100 problems and indicates that approximately 430 of them include solutions.

When Bubeck celebrated GPT-5’s advances, Bloom was quick to call him out. “This is a significant misrepresentation,” he commented on X. Bloom clarified that a problem isn’t necessarily deemed unsolved just because this website doesn’t list a solution. It merely indicates that Bloom was not aware of one. Given the multitude of mathematics papers available, no one has read them all. However, GPT-5 likely has.

It was revealed that rather than developing original solutions to 10 unsolved problems, GPT-5 had searched the internet for 10 existing solutions that Bloom had not previously encountered. Oops!

There are two key insights here. One is that social media is no place for sweeping declarations about major breakthroughs: such claims deserve more careful scrutiny first.

The second point is that GPT-5’s capacity to locate references to prior work that Bloom did not recognize is also remarkable. The excitement overshadowed what should have been an impressive feat in its own right.

Mathematicians are keen on utilizing LLMs to sift through enormous amounts of existing findings, François Charton, a research scientist who examines the application of LLMs to mathematics at AI startup Axiom Math, shared with me during our discussion about this Erdős oversight.

Nevertheless, literature reviews lack the thrill of true discovery, particularly for the ardent advocates of AI on social media. Bubeck’s mistake isn’t the only instance.

In August, a duo of mathematicians demonstrated that no LLMs at the time could tackle a math conundrum known as Yu Tsumura’s 554th Problem. Two months later, social media exploded with claims that GPT-5 could now solve it. “Lee Sedol moment is approaching for many,” one commenter stated, referencing the Go master who was defeated by DeepMind’s AI AlphaGo in 2016.

However, Charton noted that resolving Yu Tsumura’s 554th Problem isn’t regarded as a significant achievement by mathematicians. “It’s a problem you might assign to an undergraduate,” he mentioned. “There’s a tendency to exaggerate everything.”

Meanwhile, more balanced evaluations of the capabilities of LLMs are surfacing. At the same time as mathematicians were debating GPT-5 online, two new studies were published that examined in depth the use of LLMs in medicine and law, two fields where model creators claim their technology excels.

Researchers found that LLMs could accurately make certain medical diagnoses, but they were deficient in treatment recommendations. In the legal field, researchers discovered that LLMs frequently give inconsistent and erroneous advice. “The evidence so far does not convincingly meet the burden of proof,” the authors concluded.

However, that’s not the kind of message that resonates well on X. “There’s this buzz because everyone is communicating at an extraordinary pace—nobody wants to miss out,” Charton remarked. X is where much AI news is initially released, where new results are celebrated, and where influential individuals like Sam Altman, Yann LeCun, and Gary Marcus publicly engage in debates. It’s challenging to keep abreast of developments—and even harder to ignore.

Bubeck’s blunder was embarrassing primarily because it was exposed. Many errors remain unnoticed. Unless there’s a shift, researchers, investors, and general supporters will continue to egg one another on. “Some of them are scientists, many are not, but they all share a passion for the field,” Charton explained to me. “Grand claims thrive on these platforms.”

*****

And there’s a postscript! I drafted everything you’ve just read for the Algorithm column in the January/February 2026 issue of MIT Technology Review magazine (coming very soon). Just two days after this went to print, Axiom announced that its own math model, AxiomProver, had solved two open Erdős problems (#124 and #481, for the math enthusiasts). That’s remarkable for a small startup established only a few months ago. Indeed—AI evolves rapidly!

But that’s not the end of it. Five days later, the company revealed that AxiomProver had conquered nine out of 12 problems in this year’s Putnam competition, a collegiate-level math contest that some consider more difficult than the better-known International Math Olympiad (which LLMs from both Google DeepMind and OpenAI mastered a few months earlier).

The Putnam outcome received praise on X from notable figures in the field, such as Jeff Dean, chief scientist at Google DeepMind, and Thomas Wolf, cofounder at the AI firm Hugging Face. Once again, familiar discussions unfolded in the replies. A number of researchers observed that while the International Math Olympiad requires more creative problem-solving, the Putnam competition assesses mathematical knowledge—which makes it notoriously challenging for undergraduates but theoretically easier for LLMs that have absorbed vast amounts of internet content.

How should we appraise Axiom’s accomplishments? Not through social media, at least. And the attention-grabbing competition victories are just the beginning. Evaluating how proficient LLMs are at mathematics will necessitate a deeper examination of precisely how these models operate when tackling complex (read: challenging for humans) mathematical questions.

This article first appeared in The Algorithm, our weekly newsletter on AI. To receive stories like this in your inbox first, sign up here.
