AI Model Generates 3D Images from 2D Samples in Seconds, a Breakthrough for 3D Visualization

2023-11-14

In the rapidly evolving field of large-scale computing, a breakthrough is poised to transform 3D visualization. Adobe Research and the Australian National University (ANU) have announced what the researchers describe as the first artificial intelligence model capable of generating a 3D image from a single 2D image. According to the team, the new algorithm, trained on a large-scale collection of images, can produce such 3D images in a matter of seconds, fundamentally changing the way 3D models are created.

Yicong Hong, a former graduate student at ANU's College of Engineering, Computing and Cybernetics and currently an intern at Adobe, said their Large Reconstruction Model (LRM) is built on a highly scalable neural network with 500 million parameters, trained on roughly one million objects; the training data include images, 3D shapes, and videos.

"The combination of a high-capacity model and large-scale training data makes our model highly generalizable and able to produce high-quality 3D reconstructions from a wide variety of test inputs. To our knowledge, LRM is the first large-scale 3D reconstruction model," said Hong, the lead author of the project report.

Augmented-reality and virtual-reality systems, gaming, film animation, and industrial design are all expected to benefit from this transformative technology.

Early 3D imaging software performed well only on specific categories of subjects with pre-established shapes. Hong explained that later progress in image generation came from programs such as DALL-E and Stable Diffusion, which exploited the strong generalization ability of 2D diffusion models to produce multi-view images; their results, however, remained constrained by the underlying pre-trained 2D generative models. Other systems achieved impressive results by optimizing a shape for each input, but Hong said these systems were often "slow and impractical."

The development of large language models, which learn from massive data through the simple task of next-word prediction, inspired the team to ask: "Is it possible to learn a generic 3D prior for an object from a single image?" Their answer was yes.

"LRM can reconstruct high-fidelity 3D shapes from a wide range of images captured in the real world, as well as from images created by generative models. Because no post-optimization is required, LRM is also a highly practical solution for downstream applications, producing a 3D shape in just five seconds," Hong said.

The key to the program's success is its ability to use its hundreds of millions of learned parameters to predict a neural radiance field (NeRF) directly from the input, so it can generate a realistic 3D rendering from just a 2D image, even one at low resolution (a minimal sketch of such a feed-forward pipeline follows at the end of this article). NeRF techniques are also used in image synthesis, object detection, and image segmentation.

Sixty years ago, the first computer program that allowed users to generate and manipulate simple 3D shapes was born: Sketchpad, designed by Ivan Sutherland for his doctoral thesis at MIT, running on a computer with a total of 64K of memory. In the decades since, 3D software has developed rapidly, with programs such as AutoCAD, 3D Studio, SoftImage 3D, RenderMan, and Maya.
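The article describes LRM as a single feed-forward network that maps one image to a NeRF, with no per-scene optimization. As a rough illustration of how such a pipeline can be wired together, here is a minimal PyTorch sketch under stated assumptions: an image encoder producing tokens, a transformer that decodes those tokens into a triplane 3D representation, a small MLP that turns sampled triplane features into color and density, and standard NeRF volume rendering. All module names, layer sizes, and the triplane layout are illustrative assumptions for this sketch, not Adobe's or ANU's actual implementation.

```python
# Minimal sketch of a feed-forward "single image -> NeRF" pipeline in the
# spirit of what the article describes. Every module size, name, and the
# triplane layout below is an illustrative assumption, not the real LRM.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageToTriplane(nn.Module):
    """Encode a single 2D image into a triplane 3D representation."""

    def __init__(self, dim=256, plane_res=32):
        super().__init__()
        self.dim, self.plane_res = dim, plane_res
        # A simple patch embedding stands in for a pretrained ViT encoder;
        # any image encoder that produces tokens would do here.
        self.encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        # One learned query per triplane cell, filled in by cross-attending
        # to the image tokens.
        self.queries = nn.Parameter(torch.randn(3 * plane_res**2, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)

    def forward(self, image):                         # image: (B, 3, H, W)
        tokens = self.encoder(image).flatten(2).transpose(1, 2)  # (B, N, dim)
        q = self.queries.unsqueeze(0).expand(image.shape[0], -1, -1)
        planes = self.decoder(q, tokens)              # (B, 3*R*R, dim)
        R = self.plane_res
        return planes.reshape(-1, 3, R, R, self.dim)


class TriplaneNeRF(nn.Module):
    """Decode color and density at 3D points from triplane features."""

    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 4))   # -> (r, g, b, sigma)

    def forward(self, planes, pts):                   # pts: (B, P, 3) in [-1, 1]
        B, P, _ = pts.shape
        feats = 0.0
        # Project each 3D point onto the XY, XZ, and YZ planes and
        # bilinearly sample the corresponding feature plane.
        for i, (a, b) in enumerate([(0, 1), (0, 2), (1, 2)]):
            grid = pts[..., [a, b]].view(B, P, 1, 2)
            plane = planes[:, i].permute(0, 3, 1, 2)  # (B, dim, R, R)
            sampled = F.grid_sample(plane, grid, align_corners=False)
            feats = feats + sampled.squeeze(-1).transpose(1, 2)  # (B, P, dim)
        out = self.mlp(feats)
        return torch.sigmoid(out[..., :3]), F.relu(out[..., 3])


def render_rays(nerf, planes, origins, dirs, near=0.5, far=2.5, n=64):
    """Alpha-composite radiance along rays (the standard NeRF quadrature).

    Rays are assumed to pass through the [-1, 1]^3 box the triplanes cover.
    """
    B, R, _ = origins.shape                           # R rays per image
    t = torch.linspace(near, far, n, device=origins.device)
    pts = origins[:, :, None, :] + dirs[:, :, None, :] * t[None, None, :, None]
    rgb, sigma = nerf(planes, pts.reshape(B, R * n, 3))
    rgb, sigma = rgb.view(B, R, n, 3), sigma.view(B, R, n)
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], -1),
        -1)[..., :-1]
    weights = alpha * trans            # per-sample contribution to the pixel
    return (weights[..., None] * rgb).sum(dim=-2)     # (B, R, 3) pixel colors
```

Training such a model would compare rendered pixels against ground-truth views of the training objects; at inference, the entire path is a single forward pass, which is what makes second-scale generation possible, in contrast to the per-scene shape-optimization methods the article calls "slow and impractical."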