
How robots acquire knowledge: A concise, modern chronicle


Robotic engineers once envisioned grand projects but executed modest creations. They aspired to replicate or surpass the intricate design of the human form, yet often ended up perfecting robotic appendages for manufacturing facilities. The target was C-3PO; the result, a Roomba.

For many in this field, the true goal was the sci-fi robot—capable of navigating its environment, adapting to varying conditions, and engaging with humans in a safe and beneficial manner. For those with a social focus, such a device could assist individuals with mobility challenges, mitigate feelings of isolation, or take on tasks deemed perilous for people. For those motivated by finance, it symbolized an unending supply of labor without wages. Regardless, a lengthy track record of setbacks left many in Silicon Valley wary of investing in helpful robotics.

Times have changed. The robots have yet to be built, but investment has surged: companies and backers poured $6.1 billion into humanoid robotics in 2025 alone, quadruple the amount from 2024.

What triggered this shift? A breakthrough in how machines learn and interact with their environment.

Imagine you want a set of robotic arms in your home solely for the task of folding laundry. How would they acquire this skill? You could begin by establishing guidelines. Assess the fabric to determine its resilience against tearing. Recognize the collar of a shirt. Position the gripper on the left sleeve, elevate it, and fold it inward by a specific measurement. Repeat this for the right sleeve. If the shirt is inside out, adjust the plan accordingly. If the sleeve is twisted, rectify it. The number of instructions would quickly become vast, but a comprehensive understanding could yield dependable outcomes. This was the original art of robotics: foreseeing every scenario and encoding it ahead of time.
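
As a rough illustration, hand-coded rules might look something like the Python sketch below. The `Shirt` fields, the folding steps, and the measurements are all hypothetical, invented here to show the shape of the approach rather than any real product's logic:

```python
# A minimal sketch of the old rule-based approach: every garment state the
# engineer can foresee gets its own hand-written branch. All names here
# (Shirt, fold, the step strings) are hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass
class Shirt:
    inside_out: bool
    left_sleeve_twisted: bool
    right_sleeve_twisted: bool

def fold(shirt: Shirt) -> list[str]:
    """Return the ordered steps for folding one shirt."""
    steps = []
    if shirt.inside_out:
        steps.append("turn shirt right side out")   # adjust the plan first
    for side in ("left", "right"):
        if getattr(shirt, f"{side}_sleeve_twisted"):
            steps.append(f"untwist {side} sleeve")  # rectify before folding
        steps.append(f"grip {side} sleeve and fold inward 12 cm")
    steps.append("fold bottom hem up to collar")
    return steps

print(fold(Shirt(inside_out=True, left_sleeve_twisted=False,
                 right_sleeve_twisted=True)))
```

Every edge case (a twisted sleeve, an inside-out shirt) demands another branch, which is why the instruction count balloons so quickly.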

By around 2015, leading labs began to approach things differently: Create a digital simulation of the robotic arms and garments, rewarding the program each time it folds successfully and penalizing it when it fails. This technique lets the system improve by experimenting with various methods through trial and error, across countless repetitions, just as AI became adept at playing games.
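
In spirit, that trial-and-error loop looks something like the toy sketch below: a stand-in simulator hands out rewards and penalties, and a simple learner gradually settles on whichever strategy pays off. The strategies and their success rates are invented for illustration; real systems use physics simulators and far richer learning algorithms:

```python
import random

# Toy trial-and-error learning: try strategies, collect +1 for success and
# -1 for failure, and come to prefer whatever earns more reward on average.

STRATEGIES = ["fold_a", "fold_b", "fold_c"]      # hypothetical fold methods
value = {s: 0.0 for s in STRATEGIES}             # learned reward estimates
counts = {s: 0 for s in STRATEGIES}

def simulate(strategy: str) -> float:
    """Stand-in simulator: fold_b succeeds 80% of the time, others 30%."""
    p_success = 0.8 if strategy == "fold_b" else 0.3
    return 1.0 if random.random() < p_success else -1.0  # reward / penalty

for episode in range(10_000):
    if random.random() < 0.1:                    # explore occasionally
        s = random.choice(STRATEGIES)
    else:                                        # otherwise exploit the best
        s = max(STRATEGIES, key=value.get)
    r = simulate(s)
    counts[s] += 1
    value[s] += (r - value[s]) / counts[s]       # running average of reward

print(max(STRATEGIES, key=value.get))            # almost surely "fold_b"
```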

The introduction of ChatGPT in 2022 sparked the ongoing surge. Trained on extensive textual data, large language models operate not through trial and error but by learning to anticipate the next word in a sentence. Adaptations of similar models for robotics soon began to process images, sensor data, and the robot’s joint positions, making predictions on the subsequent actions the machine should perform, executing dozens of motor commands each second.
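
Schematically, such a control loop might look like the sketch below. The `policy` function here is a stand-in for a trained model, and the camera frame and arm state are simulated placeholders rather than any real system's API; only the overall structure (observations in, motor commands out, dozens of times per second) reflects the description above:

```python
import time
import numpy as np

# A schematic loop for a robotics model adapted from LLMs: instead of
# predicting the next word, it predicts the next motor command from the
# camera image, joint angles, and a text instruction, many times a second.

def policy(image: np.ndarray, joints: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for a trained model; returns small random joint deltas."""
    return np.random.uniform(-0.01, 0.01, size=joints.shape)

joints = np.zeros(7)                      # 7-DoF arm state (radians)
instruction = "fold the shirt"

for _ in range(300):                      # roughly 10 seconds of control
    image = np.zeros((224, 224, 3))       # stand-in camera frame
    action = policy(image, joints, instruction)  # "next token" = next action
    joints += action                      # execute the motor command
    time.sleep(1 / 30)                    # ~30 commands per second
```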

This paradigm shift—favoring AI models that take in significant amounts of data—appears effective whether the helpful robot is designed for human interaction, environmental navigation, or even complex tasks. Coupled with innovative strategies for executing this new method of learning, like deploying robots before they achieve perfection to learn from their working environment, Silicon Valley roboticists are once again setting their sights high. Here’s how this came to pass. 


Jibo

A social robot that moved and held conversations well before the era of large language models.

A robotics researcher at MIT, Cynthia Breazeal, introduced the armless, legless, faceless robot Jibo to the public in 2014. It resembled, in many ways, a lamp. Breazeal’s goal was to develop a family-oriented social robot, which garnered $3.7 million through a crowdfunding effort. Early preorders were priced at $749.

The initial Jibo could introduce itself and perform a little dance for children, but its capabilities were limited. The vision was always for it to evolve into a sort of physical assistant capable of managing everything from schedules and emails to storytelling. It attracted a loyal user base, but eventually, the company ceased operations in 2019.

[Image: a robot resembling a lowercase letter "i"]
A crowdfunding initiative commenced in 2014, resulting in 4,800 Jibo preorders.
COURTESY OF MIT MEDIA LAB

In hindsight, Jibo fundamentally lacked advanced language capabilities. It competed against Apple’s Siri and Amazon’s Alexa, and at the time all of those technologies relied on extensive scripting. Generally, when spoken to, the software would convert spoken words into text, interpret the user’s needs, and generate responses from preapproved snippets. Those snippets might be engaging, but they were also repetitive and ultimately dull, blatantly robotic. This posed a particular challenge for a robot intended to be social and family friendly.
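
For a sense of why those pipelines felt canned, here is a toy version: keyword matching stands in for intent interpretation, and replies come from a small pool of preapproved snippets. The intents, keywords, and responses are invented for illustration, not drawn from Jibo, Siri, or Alexa:

```python
import random

# A minimal scripted-assistant pipeline: transcribed speech is matched
# against hand-written intents; the reply is a preapproved snippet.

INTENTS = {
    "greeting": (["hello", "hi", "hey"],
                 ["Hi there!", "Hello! Nice to see you."]),
    "weather":  (["weather", "rain", "sunny"],
                 ["Let me check the forecast.", "Looks clear today!"]),
}

FALLBACK = ["Sorry, I didn't catch that.", "Can you say that again?"]

def respond(transcript: str) -> str:
    words = transcript.lower().split()
    for keywords, snippets in INTENTS.values():
        if any(k in words for k in keywords):
            return random.choice(snippets)   # engaging, but repetitive
    return random.choice(FALLBACK)           # anything unscripted fails

print(respond("What's the weather like today?"))
```

Everything the system can say has to be written in advance, which is exactly why such assistants grew repetitive so quickly.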

What has transpired since then is a transformation in how machines generate language. Voice modes from leading AI developers are now captivating and impressive, and numerous hardware startups are attempting (and stumbling) to create products that capitalize on this development. 

However, this introduces a new danger: While scripted dialogues tend not to deviate, those generated by AI can easily spiral out of control. Some well-known AI toys have, for example, conversed with children about finding matches and knives. 


Dactyl

A robotic hand trained via simulations aims to emulate the unpredictability and variation present in the real world.

By 2018, every prominent robotics laboratory was trying to move past the old scripted rules and teach robots through trial and error. OpenAI set out to train its robotic hand, Dactyl, in a virtual environment, using digital representations of the hand and the palm-sized cubes Dactyl was tasked with manipulating. The cubes featured letters and numbers on their surfaces; the model might instruct the robot to “Rotate the cube so the red side with the letter O faces upwards.”

Here lies the challenge: A robotic hand may excel at achieving this within its simulated realm, but when that program is implemented on a physical version in the actual world, slight discrepancies can lead to issues. Colors may appear different, or the malleable rubber in the robot’s fingertips may prove stretchier than anticipated in simulation.

[Image: a Dactyl robot hand holding a Rubik’s Cube]
Dactyl, part of OpenAI’s initial robotics initiative, was trained in simulation to tackle Rubik’s Cubes.
COURTESY OF OPENAI

The answer lies in domain randomization: creating millions of simulated environments that vary slightly and randomly from one another. In one, friction may be lower; in another, the lighting harsher or the colors darker. Exposed to enough variations, a robot is better equipped to manipulate the cube in the real world. The method proved effective with Dactyl, and a year later the same fundamental strategy accomplished a more complex task: solving Rubik’s Cubes (though with only 60% success overall, dropping to 20% on particularly challenging scrambles).
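
In code, domain randomization amounts to sampling a fresh set of physics and rendering parameters for every training episode, roughly as in the sketch below. The parameter names and ranges are illustrative, not OpenAI’s actual values:

```python
import random

# Domain randomization: each training episode gets a slightly different
# simulated world, so the learned policy cannot overfit to one exact
# friction value, lighting level, or color.

def sample_world() -> dict:
    return {
        "friction":        random.uniform(0.5, 1.5),   # slipperier or grippier
        "fingertip_give":  random.uniform(0.8, 1.2),   # rubber stretchiness
        "light_intensity": random.uniform(0.3, 2.0),   # dimmer to harsher
        "color_shift":     random.uniform(-0.2, 0.2),  # darker or lighter hues
        "cube_mass_kg":    random.uniform(0.05, 0.12),
    }

for episode in range(1_000_000):
    world = sample_world()         # a new, slightly different simulation
    # run_training_episode(world)  # hypothetical: train the policy here
    if episode == 0:
        print(world)
```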

Nevertheless, the limitations of simulation mean this approach is less central today than it was in 2018. OpenAI discontinued its robotics division in 2021 but has recently revived it, reportedly with a focus on humanoid robots.


RT-2

Training on images sourced from the internet enables robots to convert language into action.

Circa 2022, Google’s robotics team engaged in some unconventional activities. They spent 17 months handing people robot controllers and filming them as they performed tasks ranging from picking up bags of chips to opening jars. Ultimately, the team documented 700 distinct tasks.

The objective was to construct and validate one of the first large-scale foundation models for robotics. As with large language models, the aim was to take in vast amounts of data, tokenize it into a format manageable for an algorithm, and then produce an output. Google’s RT-1 processed data about what the robot was observing and the positions of its various arm components; it then converted instructions into motor commands for robot movement. It correctly executed 97% of the tasks it had encountered previously and succeeded at 76% of the instructions it had not seen before.
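
One common way to make motor commands “predictable” like words is to discretize each continuous action dimension into a fixed number of bins, an approach RT-1 reportedly used with 256 bins per dimension. The sketch below shows the idea; the action bounds and the example command are invented for illustration:

```python
import numpy as np

# Turning continuous robot actions into tokens: each joint command becomes
# an integer a transformer can predict the way it predicts a word.

N_BINS = 256
LOW, HIGH = -1.0, 1.0                       # normalized action range (assumed)

def action_to_tokens(action: np.ndarray) -> np.ndarray:
    """Map continuous commands in [LOW, HIGH] to integer tokens 0..255."""
    clipped = np.clip(action, LOW, HIGH)
    return np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(int)

def tokens_to_action(tokens: np.ndarray) -> np.ndarray:
    """Invert the mapping so predicted tokens become motor commands."""
    return tokens / (N_BINS - 1) * (HIGH - LOW) + LOW

cmd = np.array([0.12, -0.8, 0.0, 0.55])     # e.g., arm deltas plus gripper
toks = action_to_tokens(cmd)
print(toks, tokens_to_action(toks))         # round-trips with small error
```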

[Image: a robot at a table of small toys]
The RT-2 model, short for Robotic Transformer 2, used internet data to help robots interpret their visual inputs.
COURTESY OF GOOGLE DEEPMIND

The subsequent model, RT-2, was released the following year and went even further. Instead of relying only on robotics-specific data, it expanded its training to a broader range of images from the internet, mirroring the vision-language models many researchers were exploring at the time. This allowed the robot to better understand where various objects sat within its environment.

“All these other options became available,” states Kanishka Rao, a roboticist at Google DeepMind who oversaw both versions. “We gained capabilities such as ‘Place the Coke can near the picture of Taylor Swift.’” 

In 2025, Google DeepMind further integrated the realms of large language models and robotics, unveiling a Gemini Robotics model with enhanced proficiency in comprehending natural language commands. 


RFM-1

An AI model enabling robotic arms to function like teammates.

In 2017, before OpenAI shut down its initial robotics team, a group of its engineers launched a project called Covariant. Its aim was practical rather than sci-fi: a robotic arm capable of picking up and moving items within warehouses. After building a system based on foundation models analogous to Google’s, Covariant deployed the technology in warehouses, including those of Crate & Barrel, and used it as a data-gathering tool.

By 2024, Covariant introduced a robotics model, RFM-1, that could interact like a colleague. If you presented it with several sleeves of tennis balls, for instance, you could then instruct it to relocate each sleeve to a designated area. The robot could even respond, perhaps foreseeing challenges in gripping an item and asking for guidance on which suction cups to use.

This kind of interaction had been explored in experiments, but Covariant was implementing it on a larger scale. The company now had cameras and data collection devices installed at every client location, continuously feeding additional data for the model’s training.

[Image: a warehouse robot arm lifting an object with suction cups to place it in a bin]
A Covariant robot demonstrates “induction,” the typical warehouse task of placing items onto sorters or conveyors.
COURTESY OF COVARIANT

It wasn’t flawless. During a demonstration in March 2024 involving an array of kitchen items, the robot struggled when tasked with “returning the banana” to its original spot. It tried to move a sponge, then an apple, then several other items before finally completing the assignment.

It “doesn’t grasp the new concept” of retracing its actions, cofounder Peter Chen noted at the time. “However, it serves as a good example: it might not perform optimally in areas lacking robust training data.”

Chen and cofounder Pieter Abbeel were promptly hired by Amazon, which currently licenses Covariant’s robotics model. (Amazon did not respond to inquiries about its applications, though the company reportedly operates around 1,300 warehouses in the U.S. alone.)


Digit

Businesses are assessing this humanoid in practical environments.

The new financial resources being channeled to robotics startups predominantly target robots designed not as lamps or arms but as something resembling humans. Humanoid robots are intended to slot into existing workspaces and jobs currently occupied by humans, eliminating the need to retrofit assembly lines around new form factors like large arms.

However, this is easier said than done. In the few instances where humanoids are seen in actual warehouses, they often remain confined to testing areas and pilot projects. 

[Image: the Digit humanoid robot placing a plastic bin on a conveyor belt]
Amazon and other companies are using Digit to help move shipping containers.
COURTESY OF AGILITY ROBOTICS

That said, Agility’s humanoid, Digit, seems to be performing actual tasks. Its design, featuring exposed joints and a notably non-human head, leans more toward functionality than science-fiction aesthetics. Amazon, Toyota, and GXO (a logistics behemoth serving clients like Apple and Nike) have all deployed it, making it one of the first instances of a humanoid robot that companies perceive as providing real cost advantages rather than mere novelty. The Digits are mainly occupied with lifting, transporting, and arranging shipping totes.

Currently, Digit remains far from the human-like assistant that Silicon Valley anticipates; for instance, it can lift only 35 pounds, and every enhancement that makes it stronger also means a heavier battery that requires more frequent recharging. Safety organizations indicate that humanoids must adhere to stricter safety standards than most industrial robots, given their mobility and potential proximity to humans.

Nonetheless, Digit illustrates that the transformation in robot training is not converging on a single methodology. Agility employs simulation strategies akin to those OpenAI used for its hand, and the company has collaborated with Google’s Gemini models to help its robots adapt to new settings. That is the ultimate outcome of more than a decade of industry experimentation: it is now scaling up.
