2026-06-04
National Data Administration: Advancing Embodied Intelligence Through Robust Data Engineering
Source: WWW.NEWS.CN

An article titled “National Data Administration: Advancing Embodied Intelligence Through Robust Data Engineering” was published in China Securities Journal on June 1. According to the article, on May 31, Liu Liehong, Director of the National Data Administration, delivered a speech at the 2026 World Intelligence Expo. He noted that high-quality datasets serve as a key foundation for the “Perception-Decision-Action (PDA)” loop of embodied intelligence. He also emphasized the need to advance embodied intelligence through robust data engineering and deepen systematic practical exploration.

Since the beginning of this year, a series of policy initiatives has been launched to support the development of high-quality datasets, and an industrial ecosystem centered on such datasets is beginning to take shape. According to experts, efforts to build high-quality datasets are shifting from “broad encouragement” to “standards-based practices, mechanism-driven pilot programs, and system-wide implementation”—a transition that is expected to further accelerate industry growth.

Visitors interact with robots at the 2026 World Intelligence Expo on May 30. (Photo by Xinhua News Agency)

Driving Data Supply Through Industrial Applications

“2026 marks the ‘Year of Unlocking the Value of Data as a Production Factor.’ The National Data Administration will introduce the Implementation Plan for Advancing High-Quality Dataset Development Across Industries, built around six key actions: strengthening foundations and expanding capacity, tackling annotation bottlenecks, improving quality and efficiency, enabling industrial applications, enhancing management services, and unlocking the value of data. Centered on the demands of AI-powered industrial development, the plan aims to drive data supply through industrial applications, harness data to fuel intelligent industrial growth, and give the ‘data flywheel’ across industries a stronger spin,” Liu Liehong said.

Speaking on the role of data in advancing AI-driven innovation, Liu Liehong noted that high-quality datasets serve as both a foundational resource and a key driver of innovation for the intelligent upgrading of advanced manufacturing. Data from production lines, equipment operations, and quality inspections should be systematically collected, governed, and utilized to better support industry-specific foundation models and intelligent agents in understanding the underlying principles of industrial operations, adapting to industrial scenarios, and optimizing workflows. He also called for greater investment in high-quality industry datasets to strengthen the synergy between data and AI models and promote the deeper integration of data, models, equipment, and industrial applications.

High-quality datasets are a key foundation for the PDA loop of embodied intelligence. Liu Liehong noted that the ability of embodied intelligence to adapt and perform tasks autonomously in real-world environments depends on high-quality multimodal training data, including visual, tactile, and audio data. He emphasized the need to advance embodied intelligence through robust data engineering and deepen systematic practical exploration.

High-quality datasets are a critical enabler of the accelerated development of AI for Science. Liu Liehong noted that scientific research requires exceptionally high standards of data accuracy, standardization, and reliability. High-quality datasets not only provide the foundation for training scientific models, discovering new patterns and principles, and validating research outcomes, but also serve as a key driver in bringing fundamental research into industrial applications and advancing the real-world adoption of AI for Science.

Since the beginning of this year, there have been a number of new developments in the field of high-quality datasets. On April 15, the National Data Administration released the Implementation Plan for Advancing High-Quality Dataset Development Across Industries (Draft for Comments) to solicit public feedback. Recently, the Ministry of Industry and Information Technology and the National Data Administration jointly issued the Notice on the Joint Implementation of the 2026 “Model-Data Synergy” Initiative, which aims to foster mutually reinforcing interaction between AI models and data resources. The initiative seeks to establish, by the end of 2026, a well-functioning data-model-application loop that will drive high-level AI empowerment of new industrialization.

The National Dataset Management Service Platform was launched for trial operation on April 29, offering public services throughout the dataset lifecycle. As of May 31, the platform had certified 516 organizations and released 1,350 datasets across key sectors, including agriculture, industrial manufacturing, transportation, and culture and tourism.

By the end of the first quarter of this year, more than 116,000 high-quality datasets had been created nationwide, with a combined volume exceeding 960 petabytes (PB). As of March, China’s daily token usage had surpassed 140 trillion.

High-Quality Dataset Development Across Multiple Regions

Since the beginning of this year, multiple regions have rolled out plans to develop high-quality datasets.

According to the Special Action Plan of Shandong Province for High-Quality Dataset Development Across Industries issued by the Shandong Provincial Big Data Bureau, Shandong Province aims to develop about two specialized datasets in each of 16 key sectors, including industrial manufacturing and transportation, by the end of 2026. By the end of 2027, the province plans to have established a cumulative total of 50 high-quality datasets. The plan also outlines specific measures to expand public data supply, accelerate enterprise data development, strengthen data supply-demand matching, and promote the growth of the data annotation industry.

In addition, to implement national initiatives aimed at enhancing data utilization effectiveness in state-owned enterprises, the Guangdong Provincial Government Services and Data Management Bureau, together with the Guangdong Provincial State-owned Assets Supervision and Administration Commission, recently launched the Guangdong Provincial Initiative to Improve the Quality and Effectiveness of State-owned Enterprise Data.

Zong Jianshu, Chief Analyst for the Computer Industry at Changjiang Securities, noted that China’s large-model industry is currently experiencing sustained and rapid growth. As foundational resources for training and optimizing large models, datasets play a critical role, with their quality and diversity directly affecting model performance and outcomes. High-quality datasets, as key production inputs for the industrial deployment of AI, are expected to serve as a central hub linking industry scenarios, model training, intelligent agent applications, and the realization of data value. Efforts to build high-quality datasets are shifting from “broad encouragement” to “standards-based practices, mechanism-driven pilot programs, and system-wide implementation”—a transition that is expected to further accelerate industry growth.

According to a research report from CCW Research, the large-scale development of high-quality datasets is expected to further drive rapid growth in three software segments with market sizes in the tens of billions of yuan: high-quality industry dataset construction and related services; industry knowledge graphs and intelligent agent knowledge bases; and synthetic data generation and data privacy protection platforms. This growth would inject new momentum into China’s software industry.