2025-08-25
Academician Ni Guangnan of the Chinese Academy of Engineering: Building a Brain-Eye-Action Coordinated System to Enhance Robotic Intelligence
Source: People’s Posts and Telecommunications News (RENMIN YOUDIAN)
Recently, Academician Ni Guangnan of the Chinese Academy of Engineering delivered a keynote speech titled “‘AI+Spatial Computing’ Enables Machines to Understand the World” at the Main Forum of the 2025 World Robot Conference. He noted that artificial intelligence is currently driving a paradigm shift in science and technology. Against the backdrop of the “AI+” initiative, it is essential to raise the level of robotic intelligence so that “AI+Robot” can better realize the role of robots as a new quality productive force.
Ni Guangnan said that “AI+Spatial Computing” has ushered in a new development paradigm, moving interaction from two dimensions to three. “AI+Spatial Computing” is a key core technology for implementing the “AI+” initiative. As a new computing method, spatial computing serves as a bridge between the physical world and the digital world, reshaping the way humans, machines, and the world interact. It is one of the key core technologies driving the practical application of robotics.
Ni Guangnan observed that, although currently generative large language models (LLMs) are leading technological development, the knowledge they acquire does not fully encompass the physical world. Generative AI alone cannot capture the full complexity of physical reality. The integration of spatial computing and AI is reconstructing the three-dimensional physical world in a novel way, extending a bridge for large models to connect with the physical world and facilitating the fusion of digital and physical realms.
Quoting Turing Award laureate Professor Yann LeCun, he noted: “The information a large model learns amounts to about 10¹⁴ units, equivalent to the total volume of all publicly available text on the internet. For a human, acquiring that much information by reading would take hundreds of thousands of years. Yet a four-year-old child takes in roughly the same 10¹⁴ units of information simply by opening their eyes.” This illustrates that text alone is insufficient training data for LLMs.
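The “hundreds of thousands of years” figure can be checked with a back-of-the-envelope calculation. The reading-speed and reading-hours figures below are assumptions for illustration, not numbers from the speech:

```python
# Back-of-the-envelope check of the "hundreds of thousands of years" claim.
# Assumed figures (not from the speech): a reader averaging ~50 bytes of
# text per second, reading 8 hours a day, every day of the year.
TOTAL_INFO = 1e14            # ~10^14 units of information (the quoted estimate)
BYTES_PER_SEC = 50           # assumed average reading throughput
SECONDS_PER_DAY = 8 * 3600   # assumed 8 reading hours per day

per_year = BYTES_PER_SEC * SECONDS_PER_DAY * 365
years = TOTAL_INFO / per_year
print(f"{years:,.0f} years")  # on the order of 10^5 years
```

Under these assumptions the result is roughly 190,000 years, consistent with the quoted order of magnitude.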
“To understand and comprehend the world, vast amounts of video information are also needed,” Ni Guangnan remarked. “Vision is the starting point of intelligence, the basis of perception in the physical world, and the bridge linking the brain with physical reality. Therefore, we must attach great importance to the role of the ‘eyes’.”
Using the example of robots working in a factory, he explained that what a robot “sees” is broadly similar to human visual perception. Through “AI+Spatial Computing”, robots can interpret and model what they observe. “In the past, the digital world merely mirrored the physical world. Currently, the task is to integrate the digital world and the physical world,” Ni Guangnan said.
“Overall, we hope that people, goods, and machines will work together on factory production lines in the future, complementing each other's strengths and weaknesses,” Ni Guangnan said.
Discussing the evolution of robots from automated tools into intelligent entities, Ni Guangnan observed that today’s robots are controlled by embodied intelligence systems. China’s robot industry has enormous potential for development. Facing a trillion-yuan-scale robot industry, Ni Guangnan stated that we must enhance the intelligence of robots to fully leverage their role as a new quality productive force.
Ni Guangnan provided a detailed explanation of how robots achieve intelligence. He stated that a robot’s intelligence generally consists of three main components: the brain, the eyes, and the action system, which together form an embodied intelligent system. “However, at present, our investment in the ‘brain’ and ‘eyes’ of robots is insufficient, and there are shortcomings in this area. We need to increase investment in these areas to better demonstrate the effectiveness of robots,” Ni Guangnan emphasized.
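The brain-eye-action split described above can be pictured as a minimal perceive-decide-act loop. This is an illustrative sketch only; every class and function name here is hypothetical, not part of any real robot stack:

```python
# Illustrative sketch: the brain-eye-action division of an embodied
# intelligent system, modeled as one perceive -> decide -> act tick.
# All names below are hypothetical, invented for illustration.
from dataclasses import dataclass

@dataclass
class Observation:            # what the "eyes" produce
    obstacle_ahead: bool

def eyes(world_state: dict) -> Observation:
    """Perception: turn raw sensing into a structured observation."""
    return Observation(obstacle_ahead=world_state["obstacle_distance"] < 1.0)

def brain(obs: Observation) -> str:
    """Decision: a large-model 'brain' would map observations to plans."""
    return "stop" if obs.obstacle_ahead else "advance"

def act(command: str) -> str:
    """Action system: execute the chosen command on the hardware."""
    return f"executing: {command}"

# One tick of the coordinated loop
world = {"obstacle_distance": 0.5}
result = act(brain(eyes(world)))
print(result)  # executing: stop
```

The point of the structure is the coordination: perception feeds decision, decision feeds action, and weakness in any one component limits the whole loop.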
He cited autonomous driving as an example, noting that it is divided into levels L1 to L5, and that robots follow a similar pattern. At present, most robots sit at the L1 to L3 stages. “We hope that by developing the ‘eyes’ and the ‘brain’, the overall level of robotic intelligence can be advanced to L4 or above. Only then can robots truly demonstrate their mobility, autonomy, and high precision, and truly play their role in improving production efficiency.” In the future, robots will not merely run operating systems; they will be embodied intelligent agents that integrate the “brain,” the “eyes,” and “coordinated action.” With the support of such a system, robots can evolve from industrial automation into today’s “AI+Robot” era of artificial intelligence, achieving greater effectiveness.
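The leveling scheme borrowed from autonomous driving can be sketched as a simple scale. The per-level descriptions below are assumptions drawn loosely from the driving-automation analogy, not definitions given in the speech:

```python
# Sketch of the L1-L5 intelligence scale mentioned above, by analogy with
# driving-automation levels. The level descriptions in the comments are
# illustrative assumptions, not definitions from the speech.
from enum import IntEnum

class RobotLevel(IntEnum):
    L1 = 1  # assistance: fixed, pre-programmed motions
    L2 = 2  # partial automation within a structured setting
    L3 = 3  # conditional autonomy; human fallback still required
    L4 = 4  # high autonomy: mobility, self-direction, high precision
    L5 = 5  # full autonomy in open environments

def meets_target(level: RobotLevel) -> bool:
    """The speech's stated goal: advance robots to L4 or above."""
    return level >= RobotLevel.L4

print(meets_target(RobotLevel.L2), meets_target(RobotLevel.L4))  # False True
```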
Ni Guangnan stressed that the key to achieving this goal lies in enhancing the level of robotic intelligence. “We must use a brain-eye-action coordinated system to improve robotic intelligence, so that robots can truly see the world, understand the world, and act within the world,” Ni said.
However, for robots to evolve from automated tools into “AI+Robot”, a complex process must be undertaken. First, robots must possess their own “brain” based on large models. Second, the intelligent system of robots should highlight the role of the “eyes”: the “AI+Spatial Computing” solution combines ordinary monocular cameras with neural-network learning, enabling robots to acquire human-like vision with strong adaptability. Finally, since robots need to interact with many entities, an operating system is required; Ni suggested using the open-source AGIROS to support “robotic actions”, thereby comprehensively advancing the development of the intelligent robot sector.
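The spatial-computing step behind monocular vision can be illustrated with the standard pinhole camera model: given a per-pixel depth estimate, each pixel is back-projected into a 3D point. In practice the depth map would come from a learned monocular-depth network; in this sketch it is a stand-in array, and the camera intrinsics are assumed values:

```python
# Sketch of the spatial-computing step: back-projecting a monocular image
# into 3D camera-frame points with the pinhole model. The depth grid is a
# stand-in for a neural depth estimate; the intrinsics are assumed values.
def backproject(depth, fx, fy, cx, cy):
    """Convert an HxW depth grid (list of lists, meters) into 3D points."""
    points = []
    for v, row in enumerate(depth):
        out_row = []
        for u, z in enumerate(row):
            x = (u - cx) * z / fx   # pinhole model: X = (u - cx) * Z / fx
            y = (v - cy) * z / fy   # pinhole model: Y = (v - cy) * Z / fy
            out_row.append((x, y, z))
        points.append(out_row)
    return points

depth = [[2.0] * 4 for _ in range(4)]   # stand-in: a flat surface 2 m away
points = backproject(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(points[2][2])  # the principal-point pixel maps to (0.0, 0.0, 2.0)
```

This is the bridge the speech describes: a 2D image plus learned depth becomes 3D structure the robot’s “brain” can reason about.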