Is OpenAI’s Sora Threatening Filmmaking?

Sora: AI text-to-video generation has arrived. But is it really as big a threat as you think?

By Lewis McGregor, February 15, 2024

From 2021 until the final day of 2022, I worked full-time for Shutterstock. As a filmmaker-photographer hybrid working for one of the largest stock photo and video platforms on the internet, I thought it would be the perfect opportunity to become a stock photo and video contributor on a grand scale. It wasn’t against company policy; in fact, contributing to the library was encouraged. Knowing what was being searched for and having a spare room with enough production equipment to start an HBO television series, I felt like I could be the next big contributor.

In hindsight, that was never going to happen, considering that one of the largest portfolios on Shutterstock has over 400,000 video clips filmed on RED and ARRI cameras, but still, the dream was there. However, this was turned on its head when Midjourney began making noise through the ether. Although it was in its infancy and incapable of creating anything like it can today, I foresaw that stock photography was about to become obsolete. With Shutterstock and other agencies implementing AI as a creator in their libraries, it felt fruitless to even go through the tedious process of contributing content anymore.

Throughout 2023, image generation improved and copy content became… dumber? Video generation, however, still seemed far off, even though new video generation platforms were launched every other month. The results were less than satisfactory. I mean, I’ve been creating tutorials on these tools for the better part of the year, and the output is often blurry, choppy, and disjointed, more useful for abstract and surreal images than for anything worth putting in actual video content. That is, until today. Perhaps.

Sora

San Francisco startup OpenAI, the company behind ChatGPT, has unveiled a similar system that creates videos that look like they were lifted from an A-tier production. It’s called Sora. Unlike the dream-like surrealist generations we’ve seen in the past, Sora is distinguished by its photorealism and its ability to produce one-minute clips, whereas other models are usually limited to just a few seconds.

We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction. Introducing Sora, our text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.
OpenAI

In this scene, the text prompt is ‘Historical footage of California during the gold rush.’ It’s… dare I say it? Outstanding.

Prompt: Historical footage of California during the gold rush.

A few years ago, I attempted to produce a web series called The Vagabond; it was set in an apocalyptic future, and we ran out of cash and steam before we could finish the final episodes. Only a few were released online. I can only imagine how this technology could have enhanced the production value of such a series from a no-budget standpoint. Maybe it could have been finished?

However, it’s not perfect. Only when you watch the clip a few times do you realize things aren’t as accurate as they seem at first glance. The horses move oddly, people seem to float, and the perspective of some buildings is off. In fact, some visible people seem to be half horse, half human. But given the attention span of the modern audience, who often seem okay with half-baked CGI, would this even be an issue?

The concern that I and like-minded colleagues have is just how much of the filmmaking industry this will affect. Again, some of the physics in the example scene of Big Sur seems a little off, but how much of that is noticeable in a mobile-first world?

Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

OpenAI says:

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

On its landing page, OpenAI does point out the current flaws in Sora’s generations, and as such, I’m not convinced that text-to-video technology such as Sora will kill traditional filmmaking and VFX anytime soon. As with DALL-E and Midjourney, writing coherent prompts is a tedious and difficult task, and I can only imagine the difficulty of trying to merge an hour’s worth of Sora’s one-minute clips into a feature.

On the other hand, still-image AI technologies like DALL-E and Midjourney have rapidly advanced, and they now create visuals almost indistinguishable from real photographs. This progress raises the question of whether it’s only a matter of time before VFX artists and drone companies face significant competition from AI like Sora.

Like many other troubled millennials, I turned to the beacon of information that is Reddit to delve deeper into how others felt. I came across a comment in the VFX subreddit that resonated with me.

The commenter discusses the broader issue of technological advancements not always translating into practical tools for production, drawing on their experience directing commercials using stock footage during COVID. That experience underlined the challenges and inefficiencies of working that way compared to traditional filmmaking practices.

They also suggest that if the filmmaking community is impacted by AI, it will likely be an unintended consequence, emphasizing that AI development often overlooks the specific needs and tools required for professional VFX and filmmaking. They acknowledge that while AI will generate a lot of content, especially for low-stakes applications like lifestyle ads, it falls short of high-quality, bespoke video production that understands and meets specific aesthetic criteria.

To add to the commenter’s points, I think there’s one aspect many advocates, and even those concerned, have overlooked: copyright. Many, including MKB, are quick to highlight how drone operators could face steep competition. As noted last year, a US court ruled that AI-generated content is not protected by copyright. Therefore, even if Sora advances to the point where it can generate a completely photorealistic rendering of the latest Ford Bronco cruising through Oklahoma’s salt plains, nothing would prevent another content creator from generating the exact same shot and using it in their own media without any legal implications.

I know it’s currently scary to be a creative, but as someone who has experimented with generative art over the last year using the most popular models, I don’t think there’s much to worry about until it can consistently create sequential frames in response to an additional prompt. Likewise, considering the current movie-going audience’s ability to point out bad CGI and their preference for practical effects, I can’t see the general audience taking too well to an AI-generated scene or composition.

Lewis McGregor is a filmmaker, photographer and online content creator from Wales.