Sora

Around this time a year ago, Stable Diffusion launched an early text-to-animation tool and the internet was obsessing over an AI-generated Seinfeld knockoff. This week, OpenAI demonstrated how far animation-generating technology has come in the past 12 months with the unveiling of its new model Sora, which can create photorealistic and CG-style animation up to one minute long from text prompts.

What is Sora? Sora is a generative AI model that uses text and image prompts to create videos and images of varying durations, aspect ratios, and resolutions, up to a full minute of high-definition video. The model can also take an existing video and extend it or fill in missing frames.

According to OpenAI:

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt but also how those things exist in the physical world.

How does it work? Sora builds on OpenAI's earlier Dall-E and GPT models. It borrows Dall-E 3's recaptioning technique, in which a captioning model generates highly descriptive captions for the training data. Because the model learns from these detailed descriptions, it can better match the language of a user's prompt when creating a new image or, in Sora's case, a video. As a result, the Sora team says, "the model is able to follow the user's text instructions in the generated video more faithfully."

A more detailed explanation is available in a technical report OpenAI published this week.

What makes Sora different from other animation-generating models? According to OpenAI, Sora stands out for the photorealism and length of its videos, as well as its ability to closely follow user prompts. OpenAI attributes this to the model planning its videos many frames at a time, which gives it greater foresight than previous models and lets it keep characters and objects consistent throughout a generated video, even if they temporarily leave the frame.

What are Sora's shortcomings? Sora can struggle to accurately simulate physics in more complex scenes. OpenAI also says the model doesn't have a strong understanding of cause and effect, which can produce undesired or unrealistic outcomes; for example, a person might take a bite out of a cookie, but the cookie may not show a bite mark afterward. The model sometimes confuses spatial details that require perspective, such as which way is up or down, left or right, and it can have trouble following prompts that describe events unfolding over a specific timeline.

How is OpenAI preventing this software from being used in harmful ways? The company says it is working with experts in misinformation, hateful content, and bias to test the model, and it is building tools to detect misleading content, including a "detection classifier" that can tell when a video was made by Sora. OpenAI says Sora will have limitations similar to those of the company's Dall-E software, which rejects prompts soliciting violence, sexual content, hateful imagery, representations of real people, or intellectual property belonging to other parties.

When will Sora be available to the public? Sora is currently available only to a select group of creators and security experts, who are testing it for safety vulnerabilities. OpenAI plans to make the model available to the public sometime in the future.

Examples: The Sora website and accompanying technical report feature numerous example videos alongside the prompts used to create them. We've included a few here, but there are dozens more.

Prompt: A cartoon kangaroo disco dances.
Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
Prompt: A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

Sora paper authors:
Tim Brooks
Bill Peebles
Connor Holmes
Will DePue
Yufei Guo
Li Jing
David Schnurr
Joe Taylor
Troy Luhman
Eric Luhman
Clarence Wing Yin Ng
Ricky Wang
Aditya Ramesh


Jamie Lang

Jamie Lang is the Editor-in-Chief of Cartoon Brew.