OpenAI’s latest LLM reveals the truths behind the functioning of AI.


OpenAI, the creator of ChatGPT, has developed an innovative large language model that is significantly simpler to comprehend compared to conventional models.

This matters because today's LLMs are black boxes: nobody fully understands how they do what they do. Building a more transparent model sheds light on how LLMs work in general, helping researchers figure out why models hallucinate, why they go off the rails, and just how far they should be trusted with critical tasks.

“As these AI technologies advance, they will increasingly be incorporated into critical areas,” stated Leo Gao, a research scientist at OpenAI, in an exclusive preview of the new findings shared with MIT Technology Review. “Ensuring their safety is crucial.”

This study is still in its early phases. The new model, called a weight-sparse transformer, is far smaller and far less capable than top-tier mass-market models such as the company’s GPT-5, Anthropic’s Claude, and Google DeepMind’s Gemini. At most, it is as capable as GPT-1, a model that OpenAI introduced back in 2018, says Gao (though he and his team haven’t made a direct comparison).

However, the objective isn’t to rival the top tier (at least, not immediately). Instead, by analyzing this experimental model, OpenAI aspires to uncover the underlying processes governing those larger and superior versions of the technology.

This research is intriguing, notes Elisenda Grigsby, a mathematician at Boston College who investigates LLM functionalities and was not part of the project: “I am confident the methods it introduces will create a considerable impact.”

Lee Sharkey, a research scientist at the AI startup Goodfire, concurs. “This initiative targets the right objectives and appears to be executed well,” he comments.

Challenges in Understanding Models

OpenAI’s exploration fits within a burgeoning research domain known as mechanistic interpretability, which seeks to delineate the internal processes that models utilize while performing various tasks.

This task is more complex than it might seem. LLMs are constructed from neural networks, made up of nodes, known as neurons, arranged across layers. In typical structures, each neuron connects to all other neurons in adjacent layers. Such a configuration is called a dense network.

Dense networks are relatively efficient to train and run, but they spread what they learn across a vast tangle of connections. As a result, a simple concept or function can end up split among neurons in different parts of the model. At the same time, an individual neuron may represent several distinct features, a phenomenon known as superposition (a term borrowed from quantum mechanics). The upshot is that specific parts of the model cannot be matched to specific concepts.
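Superposition can be seen in a toy example (my illustration, not anything from OpenAI's research): if a layer has only two neurons but must represent three features, the feature directions are forced to overlap, so one neuron ends up participating in several features at once.

```python
import numpy as np

# Toy superposition sketch: 2 neurons asked to store 3 features.
# Each row is the direction a feature occupies in the 2-D neuron space.
features = np.array([
    [1.0,    0.0],     # feature A lives on neuron 0
    [0.0,    1.0],     # feature B lives on neuron 1
    [0.7071, 0.7071],  # feature C has nowhere left, so it shares both
])

# Off-diagonal entries of this matrix measure interference between
# features; nonzero overlap means a neuron has no single clean meaning.
overlap = features @ features.T
print(overlap)
```

With more features than neurons, some off-diagonal overlap is unavoidable, which is exactly why reading meaning off individual neurons in a dense model is so hard.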

“Neural networks are large, intricate, intertwined, and very challenging to comprehend,” remarks Dan Mossing, who leads the mechanistic interpretability team at OpenAI. “We’ve somewhat asked: ‘What if we attempted to change that?’”

Rather than constructing a model with a dense network, OpenAI opted for a type of neural network called a weight-sparse transformer, wherein each neuron connects to only a limited number of other neurons. This decision compelled the model to cluster features in localized groups rather than disperse them.
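OpenAI hasn't published implementation details, but the core idea of weight sparsity can be sketched as masking most of a layer's weight matrix to zero, so each output neuron depends on only a handful of inputs (a minimal illustration, not OpenAI's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_linear(n_in, n_out, density=0.05):
    """Build a weight matrix where most entries are forced to zero.

    In a dense layer, every input connects to every output. Here a
    random binary mask keeps only a small fraction of connections,
    so each output neuron "sees" only a few inputs.
    """
    weights = rng.normal(size=(n_in, n_out))
    mask = rng.random((n_in, n_out)) < density  # keep ~5% of weights
    return weights * mask

W = sparse_linear(256, 256, density=0.05)
n_connections = np.count_nonzero(W)
print(f"{n_connections} of {W.size} possible connections survive")
```

Because each surviving neuron draws on so few inputs, its behavior is far easier to attribute to specific features, which is the interpretability payoff the researchers are after.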

Their model is far slower than any LLM on the market. But it makes it much easier to link its neurons, or groups of neurons, to specific concepts and functions. “The interpretability of the model is drastically improved,” says Gao.

Gao and his colleagues have evaluated the new model with very basic tasks. For example, they prompted it to finish a block of text starting with quotation marks by inserting corresponding marks at the end.

This is a simple task for an LLM. The critical point is that understanding how a model executes such a straightforward task requires untangling a complex web of neurons and connections, according to Gao. However, with the new model, they were able to trace the precise steps taken by the model.

“We actually identified a circuit that precisely mirrors the algorithm you would think to implement manually, but it’s been entirely learned by the model,” he explains. “I find this very exciting and fascinating.”
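The hand-written algorithm Gao alludes to might look something like the following (my guess at the obvious manual solution; the article doesn't spell it out): remember which kind of quotation mark opened the text, then emit its matching closer at the end.

```python
def close_quote(text):
    """Complete a block of text that opens with a quotation mark by
    appending the matching closing mark, the way a person would."""
    # Map each opening mark to its closer (curly quotes differ).
    pairs = {'"': '"', "'": "'", '\u201c': '\u201d', '\u2018': '\u2019'}
    for ch in text:
        if ch in pairs:
            return text + pairs[ch]
    return text  # no opening quote found; leave the text unchanged

print(close_quote('He said, \u201chello'))  # prints: He said, “hello”
```

The striking result is that the sparse model learned an internal circuit doing essentially this, and the researchers could read it off the network directly.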

What are the next steps for this research? Grigsby is skeptical that the method could scale to larger models that must address a range of more complex tasks.

Gao and Mossing acknowledge that this is a major limitation of the model they have built so far, and they agree that their approach is unlikely to yield models that match the performance of high-end products like GPT-5. Nonetheless, OpenAI believes it can refine the technique enough to build a transparent model on par with GPT-3, the company’s breakthrough LLM from 2020.

“Perhaps in a few years, we could have a fully interpretable GPT-3, allowing a detailed understanding of how it performs each function,” shares Gao. “If such a system were available, we would gain immense knowledge.”
