“Physical AI After Generative AI”: Synthetic Data Emerges as Critical Global AI Infrastructure

As the industry moves beyond generative AI into the era of “Physical AI” (AI capable of perceiving, reasoning, and acting in real-world environments), competition within the global AI industry is increasingly shifting toward data infrastructure. Deploying AI in real-world applications such as robotics, autonomous driving, and smart factories requires training data that closely replicates physical environments, which places synthetic data at the center of next-generation AI infrastructure.
Synthetic data, in this context, refers to AI training data generated within digital twin-based virtual environments rather than collected repeatedly from the real world. The key goes beyond simple 3D modeling: reproducing real-world physical elements, including materials, surface textures, reflectivity, lighting conditions, and sensor noise, to create industrial-grade training environments that closely mirror reality.
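One common way to vary the physical elements listed above is often called domain randomization: each virtual scene is generated with randomly sampled material, lighting, and sensor settings. The article does not say which technique any of the named companies use, so the following is only a minimal sketch. The names `SceneParams`, `sample_scene`, `add_sensor_noise`, and `generate_dataset`, as well as every numeric range, are hypothetical and chosen purely for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical material list and parameter ranges, for illustration only.
MATERIALS = ("steel", "aluminum", "plastic", "rubber")


@dataclass(frozen=True)
class SceneParams:
    material: str
    roughness: float         # surface texture: 0 = smooth, 1 = rough
    reflectivity: float      # fraction of incoming light reflected (0..1)
    light_lux: float         # illumination level of the virtual scene
    sensor_noise_std: float  # std dev of Gaussian noise on a 0..1 signal


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized set of physical scene parameters."""
    return SceneParams(
        material=rng.choice(MATERIALS),
        roughness=rng.uniform(0.0, 1.0),
        reflectivity=rng.uniform(0.05, 0.95),
        light_lux=rng.uniform(100.0, 1500.0),
        sensor_noise_std=rng.uniform(0.0, 0.05),
    )


def add_sensor_noise(signal, std, rng):
    """Add Gaussian noise to a 0..1 signal and clamp the result to 0..1."""
    return [min(1.0, max(0.0, v + rng.gauss(0.0, std))) for v in signal]


def generate_dataset(n, seed=0):
    """Sample n scene configurations reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]
```

In a real pipeline, each sampled configuration would be handed to a renderer or physics simulator; that step is outside the scope of this sketch.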
Industry experts note that training robots repeatedly in physical environments is limited by cost, time, and the difficulty of acquiring rare or exceptional-case data. As a result, the “Sim-to-Real” approach, in which AI models are first trained and validated in virtual environments before being deployed in the real world, is expected to become increasingly widespread. The emphasis is therefore shifting from simply increasing data volume to “Simulation-Ready Data” that can be applied directly to robotics, autonomous driving, and smart factory environments.
A representative example is the collaboration between global industrial automation company ABB and NVIDIA. ABB Robotics has established a partnership with NVIDIA to develop next-generation autonomous industrial robots and recently unveiled plans to build an industrial AI simulation environment integrating its RobotStudio simulation platform with NVIDIA Omniverse.
By combining RobotStudio, its robot design, programming, and simulation software, with NVIDIA Omniverse, ABB aims to let industrial robots learn and validate tasks in virtual environments before they are deployed in actual factories. NVIDIA, for its part, is focusing on Sim-to-Real technologies that reduce discrepancies between virtual and real environments, building on physics simulation in Omniverse and Isaac Sim.
Industry observers interpret the collaboration not merely as a technology partnership, but as a strategic move to secure leadership in the synthetic data and simulation ecosystem — regarded as core infrastructure for the Physical AI era. The race is intensifying to develop technologies that enable AI models trained in virtual environments to operate reliably in real industrial settings such as manufacturing and logistics.
In South Korea, momentum is also growing around the synthetic data market for Physical AI training. However, industrial synthetic data — which requires reproducing factory layouts, lighting conditions, object interactions, robot movement paths, and physics-based simulations — is considered a highly demanding technological field.
Among the emerging players, SKAI Intelligence is gaining attention as an industrial synthetic data company built on NVIDIA Omniverse. The company recently established a dedicated R&D center to advance digital twin and synthetic data technologies while strengthening capabilities in Real-to-Sim and Sim-to-Real architecture design.
SKAI Intelligence is focused on synthetic data infrastructure that goes beyond simple 3D data generation to integrate industrial layouts, object interactions, robot movement paths, and physics-based simulations. Other Korean companies, including Xiilab, N.LIGHT, and CrowdWorks, are also accelerating their entry into the market based on their capabilities in digital twins, 3D CAD, and AI data development.
Market growth is also expected to accelerate rapidly. According to global market research firm Grand View Research, the global synthetic data market is projected to grow from USD 218.4 million in 2023 to USD 1.788 billion by 2030. As Physical AI adoption expands across robotics, autonomous driving, and smart factories, demand for industrial synthetic data is expected to rise significantly.
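To put the two market figures in perspective, the annual growth rate they imply can be worked out directly. The short calculation below is a sketch that assumes seven annual compounding periods (2023 to 2030) and uses only the two values quoted above. It is not a rate published by the research firm, and the helper name `implied_cagr` is hypothetical.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


start_musd = 218.4    # 2023 market size, USD millions (figure quoted above)
end_musd = 1788.0     # 2030 projection, USD millions (USD 1.788 billion)
years = 2030 - 2023   # assumption: seven annual compounding periods

rate = implied_cagr(start_musd, end_musd, years)
multiple = end_musd / start_musd

print(f"Growth multiple: {multiple:.1f}x")
print(f"Implied annual growth rate: {rate:.1%}")
```

Under these assumptions, the projection corresponds to roughly an eightfold increase in market size, or growth of about 35 percent per year.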
An industry official stated, “In the era of Physical AI, synthetic data serves as critical training infrastructure that allows robots to experience countless exceptional scenarios before deployment in real-world environments,” adding, “Future competitiveness will depend on how precisely companies can digitize reality and connect that data to AI training and real-world performance improvements.”