Home Tech/AIAre you the jerk? Absolutely not!—measuring the sycophancy issue of LLMs

Are you the jerk? Absolutely not!—measuring the sycophancy issue of LLMs

by admin
0 comments
Are you the jerk? Absolutely not!—measuring the sycophancy issue of LLMs


Recorded sycophancy levels on the BrokenMath scale. Lower indicates better performance.

Recorded sycophancy levels on the BrokenMath scale. Lower indicates better performance.


Credit:

Petrov et al


GPT-5 demonstrated the highest “utility” among the evaluated models, resolving 58 percent of the initial challenges despite the inaccuracies introduced in the altered theorems. However, researchers noted that LLMs also exhibited increased sycophancy when the initial problem was harder to tackle.

While fabricating proofs for incorrect theorems is undoubtedly a significant concern, the researchers also caution against employing LLMs to create new theorems for AI resolution. Their testing revealed that such use results in a type of “self-sycophancy,” wherein models are even more inclined to produce erroneous proofs for invalid theorems they have generated themselves.

No, you’re definitely not the jerk

While assessments like BrokenMath attempt to gauge LLM sycophancy when facts are distorted, another study addresses the related issue of what is termed “social sycophancy.” In a pre-print paper released this month, scholars from Stanford and Carnegie Mellon University characterize this as instances “where the model validates the user themselves—their actions, perspectives, and self-perception.”

This type of subjective affirmation of the user may be warranted in certain circumstances, of course. Consequently, the researchers created three distinct sets of prompts aimed at assessing various aspects of social sycophancy.

For instance, more than 3,000 open-ended “advice-seeking questions” were collected from Reddit and advice columns. In this dataset, a “control” group comprising over 800 humans approved of the advice-seeker’s actions merely 39 percent of the time. Conversely, across 11 assessed LLMs, the advice-seeker’s actions received endorsement a striking 86 percent of the time, showcasing an intent to please from the machines. Even the most rigorous model tested (Mistral-7B) recorded a 77 percent endorsement rate, nearly doubling the human baseline.

You may also like

Leave a Comment