
On Tuesday, OpenAI revealed Sora 2, its advanced video-synthesis AI model capable of producing videos in different styles complete with synchronized dialogue and sound effects, marking a first for the company. OpenAI also introduced a new iOS social application enabling users to insert themselves into AI-generated videos through what OpenAI refers to as “cameos.”
OpenAI displayed the new model in an AI-generated video featuring a lifelike rendition of OpenAI CEO Sam Altman speaking to the camera with a somewhat unnatural voice against surreal backdrops, such as a competitive ride-on duck race and a luminescent mushroom garden.
As for that voice, the new model can produce what OpenAI describes as “advanced background soundscapes, speech, and sound effects with a high level of realism.” In May, Google’s Veo 3 became the first video-synthesis model from a leading AI lab to generate synchronized audio along with video. Just a few days before the Sora 2 announcement, Alibaba launched Wan 2.5, an open-weights video model that can also generate audio. Now, OpenAI has joined them with Sora 2.
OpenAI showcases Sora 2’s functions in a launch video.
The model also shows significantly improved visual consistency compared to OpenAI’s earlier video model and can follow more intricate instructions across multiple shots while maintaining coherence between them. OpenAI calls the new model its “GPT-3.5 moment for video,” likening it to the ChatGPT breakthrough in the evolution of its text-generation models.
Sora 2 appears to offer better physical accuracy than the original Sora model from February 2024, with OpenAI asserting that the model can now replicate complex physical movements, such as Olympic gymnastics routines and triple axels, while adhering to realistic physics. Last year, shortly after the release of Sora 1 Turbo, we saw several notable failures in similar video-generation tasks, which OpenAI claims to have fixed with the new model.
“Earlier video models are overly optimistic—they will distort objects and alter reality to fulfill a text prompt,” OpenAI mentioned in its announcement. “For instance, if a basketball player misses a shot, the ball might suddenly teleport into the hoop. In Sora 2, if a basketball player misses a shot, it will bounce off the backboard.”