Google DeepMind is using Gemini to train agents inside Goat Simulator 3

EXECUTIVE SUMMARY

Google DeepMind has developed a new video game-playing agent named SIMA 2, which can explore and resolve challenges in a diverse array of 3D virtual environments. The organization asserts that this represents significant progress towards the creation of more general-purpose agents and improved real-world robots.

Google DeepMind first showcased SIMA (which stands for “scalable instructable multiworld agent”) last year. SIMA 2, however, is built on Gemini, the company’s flagship large language model, which significantly boosts the agent’s capabilities.

The researchers assert that SIMA 2 can perform a variety of more intricate tasks within virtual realms, autonomously devise solutions for specific obstacles, and engage in conversation with users. It can also improve its abilities by attempting more challenging tasks repeatedly and gaining insights through trial and error.

“Games have served as a major impetus for agent research for quite some time,” said Joe Marino, a research scientist at Google DeepMind, during a press event this week. He pointed out that even a seemingly simple action in a game, like lighting a lantern, can comprise multiple steps: “It’s a really intricate set of tasks you must accomplish to move forward.”

The ultimate goal is to create next-gen agents that can adhere to instructions and perform open-ended assignments in more complex settings than a web browser. In the long term, Google DeepMind aims to employ such agents to operate real-world robots. Marino claimed that the skills learned by SIMA 2, such as environment navigation, tool usage, and collaboration with humans to solve issues, are crucial foundational components for future robotic companions.

Unlike previous work on game-playing agents such as AlphaGo, which beat a Go grandmaster in 2016, or AlphaStar, which beat 99.8% of ranked human competitors in the video game StarCraft 2 in 2019, the aim of SIMA is to train an agent to play an open-ended game without predetermined objectives. Instead, the agent learns to carry out instructions given by human users.

Humans operate SIMA 2 through text conversations, verbal instructions, or by sketching on the game’s screen. The agent analyzes the video game’s visuals on a frame-by-frame basis and determines the necessary actions to accomplish its tasks.

Similar to its predecessor, SIMA 2 was trained on recordings of humans playing eight commercial video games, including No Man’s Sky and Goat Simulator 3, along with three virtual worlds created by the company. The agent learned to correlate keyboard and mouse inputs with corresponding actions.

Connected to Gemini, the researchers assert, SIMA 2 is significantly better at following instructions (asking questions and giving updates as it goes) and at working out for itself how to perform certain more complex tasks.

Google DeepMind evaluated the agent in scenarios it had never encountered before. In one experimental setup, researchers instructed Genie 3, the latest iteration of the company’s world model, to create environments from scratch and introduced SIMA 2 into them. They found that the agent could navigate and follow instructions within these new contexts.

The researchers also employed Gemini to generate fresh tasks for SIMA 2. If the agent did not succeed initially, Gemini provided hints that SIMA 2 incorporated for its subsequent attempts. Repeatedly attempting tasks in this manner often enabled SIMA 2 to enhance its performance through trial and error until it ultimately succeeded, Marino noted.
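The retry-with-hints loop described above can be pictured with a toy sketch. Nothing here reflects Google DeepMind's actual code: the names `practice`, `toy_attempt`, and `toy_hint` are hypothetical, and the "agent" is a stand-in that simply succeeds once it has collected two hints.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AttemptLog:
    """Record of one task: hints gathered, attempts made, and the outcome."""
    task: str
    hints: List[str] = field(default_factory=list)
    attempts: int = 0
    succeeded: bool = False


def practice(
    task: str,
    attempt: Callable[[str, List[str]], bool],
    give_hint: Callable[[str, int], str],
    max_attempts: int = 5,
) -> AttemptLog:
    """Retry a task, feeding the hint from each failure into the next attempt."""
    log = AttemptLog(task=task)
    for _ in range(max_attempts):
        log.attempts += 1
        if attempt(task, log.hints):
            log.succeeded = True
            break
        # On failure, ask the hint provider for feedback and keep it for the next try.
        log.hints.append(give_hint(task, log.attempts))
    return log


# Hypothetical stand-in for the agent: succeeds only after receiving two hints.
def toy_attempt(task: str, hints: List[str]) -> bool:
    return len(hints) >= 2


# Hypothetical stand-in for the hint provider (Gemini plays this role in the article).
def toy_hint(task: str, attempt_number: int) -> str:
    return f"hint {attempt_number} for {task!r}"


if __name__ == "__main__":
    result = practice("light the lantern", toy_attempt, toy_hint)
    print(result.attempts, result.succeeded, result.hints)
```

In this sketch the hints are the only thing carried from one attempt to the next, and the loop gives up after `max_attempts` failures. Both choices are assumptions made for illustration.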

Git gud

SIMA 2 remains an experiment. The agent struggles with complex tasks that require multiple steps and more time to complete. It also remembers only its most recent interactions (to make the agent more responsive, the team cut back its long-term memory). And it is still nowhere near as good as humans at using a mouse and keyboard to interact with a virtual world.

Julian Togelius, an AI researcher at New York University who works on creativity and video games, finds the results intriguing. He notes that earlier attempts to train a single system to play multiple games have generally not been very successful, because it is hard to train models to control multiple games solely by observing the screen: “Playing in real time from visual input only is ‘hard mode,’” he says.

Togelius specifically points to Gato, an earlier system from Google DeepMind, which, despite being greatly hyped, was unable to transfer skills across a substantial number of virtual environments.

Nonetheless, he is open-minded regarding the potential for SIMA 2 to contribute to enhanced robotics. “The real world is both more challenging and simpler than video games,” he observes. It’s more difficult because you can’t simply press A to open a door. Conversely, a robot in the real world will inherently understand what its body is capable of at any given moment. This clarity is absent in video games, where the governing rules within each virtual world can vary significantly.

Others express more skepticism. Matthew Guzdial, an AI researcher at the University of Alberta, is not particularly surprised that SIMA 2 can play a variety of video games. He highlights that most games share relatively similar keyboard and mouse controls: Master one, and you master them all. “If you present it with a game featuring unusual inputs, I doubt it would perform effectively,” he states.

Guzdial also casts doubt on how much of SIMA 2’s acquired knowledge would genuinely apply to real-world robots. “It’s far more challenging to interpret visuals from cameras in the real world compared to games, which are crafted with easily understandable visuals for human players,” he argues.

However, Marino and his team hope to push further with Genie 3, letting the agent improve inside a kind of endless virtual training dojo, where Genie generates worlds for SIMA to learn in through trial and error, guided by Gemini’s feedback. “We’ve merely begun to explore the possibilities,” he said at the press event.
