July 13, 2023
Filmmakers may soon be able to stabilize shaky video, change viewpoints and create freeze-frame, zoom, and slow-motion effects – without shooting any new footage – thanks to an algorithm developed by researchers at Cornell University and Google Research.
The software, called DynIBaR, synthesizes new views using pixel information from the original video, and even works with moving objects and unstable camerawork. The work is a major advance over previous efforts, which yielded only a few seconds of video and often rendered moving subjects as blurry or glitchy.
Noah Snavely, a research scientist at Google Research and associate professor of computer science at Cornell Tech and in the Cornell Ann S. Bowers College of Computing and Information Science, presented this work, “DynIBaR: Neural Dynamic Image-Based Rendering,” on June 20 at the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, where it received an honorable mention for the best paper award. Zhengqi Li, Ph.D. ’21, of Google Research, was the lead author on the study.
“Over the last few years, we’ve seen major progress in view synthesis methods – algorithms that can take a collection of images capturing a scene from a discrete set of viewpoints, and can render new views of that scene,” said Snavely. “However, most of these methods fail on scenes with moving people or pets, swaying trees, and so on. This is a big problem because many interesting things in the world are things that move.”
Existing methods to render new views of still scenes, such as ones that make a photo appear 3D, take the 2D grid of pixels from an image and reconstruct the 3D shape and appearance of each object in the photo. DynIBaR takes this a step further by also estimating how the objects move over time. But considering all four dimensions creates an incredibly difficult math problem.
The researchers simplified this problem by using a computer graphics approach developed in the 1990s called image-based rendering. At the time, it was difficult for traditional computer graphics methods to render complex scenes with many small parts – such as a leafy tree – so graphics researchers developed methods that take images of a scene and then alter and recombine the parts to generate new images. In this way, most of the scene’s complexity is already captured in the source images, rather than having to be modeled from scratch.
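To make the core idea concrete, here is a toy sketch of image-based rendering: a new view is synthesized by blending pixels from nearby source images, weighted by how close each source camera sits to the target viewpoint. All names and the weighting scheme below are illustrative assumptions; DynIBaR’s actual neural pipeline is far more involved.

```python
import numpy as np

def blend_views(source_images, source_positions, target_position):
    """Blend source images into a target view, weighting each source
    image by the inverse distance of its camera to the target camera.
    (Illustrative only; a real method also reprojects pixels in 3D.)"""
    dists = np.array([np.linalg.norm(p - target_position)
                      for p in source_positions])
    weights = 1.0 / (dists + 1e-6)   # nearer cameras contribute more
    weights /= weights.sum()          # normalize to sum to 1
    out = np.zeros_like(source_images[0], dtype=float)
    for img, w in zip(source_images, weights):
        out += w * img
    return out

# Two 2x2 grayscale "images" from cameras at x=0 and x=2;
# synthesize a view from a camera halfway between them at x=1.
imgs = [np.zeros((2, 2)), np.ones((2, 2))]
cams = [np.array([0.0]), np.array([2.0])]
view = blend_views(imgs, cams, np.array([1.0]))
```

Because the target camera is equidistant from both sources, the two images are blended with equal weight. The point of the trick is visible even in this toy: the synthesized view is assembled from real captured pixels, so scene complexity never has to be modeled explicitly.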
“We incorporated the classic idea of image-based rendering, and that makes our method able to handle really complex scenes and longer videos,” said co-author Qianqian Wang, a doctoral student in the field of computer science at Cornell Tech. Wang developed a method to use image-based rendering to synthesize new views of still images, which the new software builds on.
DynIBaR synthesizes new video views in both time and space. It can center and smooth out jerky footage and fill in frames to add a slow-motion effect. It also lets the user zoom in on a frozen subject – the time-slice method or “bullet time” effect featured in the movie The Matrix. Additionally, users can change which parts of the image are in focus and which are blurred to create video bokeh effects, and even create stereoscopic videos for 3D viewing glasses.
Despite the recent advance, these features may not be coming to your smartphone any time soon. The software takes several hours to process just 10 or 20 seconds of video, even on a powerful computer.
In the near-term, the technology may be more appropriate for use in offline video editing software, Snavely said. “While this research is still in its early days, I’m really excited about potential future applications for both personal and professional use.”
The next hurdle will be figuring out how to render new images when pixel information is lacking from the original video, such as when the subject moves too fast or the user wants to rotate the viewpoint 180 degrees. Snavely and Wang envision that soon it may be possible to incorporate generative AI techniques, such as text-to-image generators, to help fill in those gaps.
Forrester Cole and Richard Tucker from Google Research also contributed to the research.
Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.