Oct 16, 2024

Nvidia's Eureka helps robot dog perfect yoga ball balance

DrEureka is an LLM agent crafting code to train robots in simulations, bridging the simulation-reality gap.

Jijo Malayil

DrEureka's policy demonstrates remarkable real-world robustness, skillfully balancing and walking on a yoga ball despite unpredictable terrain changes.

Jason Ma/YouTube

Researchers have utilized Nvidia’s Eureka platform, a human-level reward design algorithm, to train a quadruped robot to balance and walk on top of a yoga ball.

Derived from the platform, DrEureka is a large language model (LLM) agent that writes code to train robot skills in simulation and devises strategies to overcome the simulation-to-reality gap.

Researchers claim that it automates the entire process, from initial skill acquisition to real-world deployment, enabling a smooth transition from virtual environments to practical use.

The team used the platform to train the robot dog in simulation and then transferred it to real-world conditions. The quadruped completed the task on its first attempt, with no fine-tuning required.

Details of the study, by researchers from the University of Pennsylvania, the University of Texas at Austin, and Nvidia, were published on GitHub.

Researchers highlight that leveraging policies acquired in simulation for real-world applications holds significant promise in scaling up robot skill acquisition.

Nonetheless, sim-to-real methodologies often necessitate manual configuration and adjustment of task reward functions and simulation physics parameters, leading to slow progress and requiring substantial human effort.

“Traditionally, the sim-to-real transfer is achieved by domain randomization, a tedious process that requires expert human roboticists to stare at every parameter and adjust by hand,” said Jim Fan, senior research manager & lead of embodied AI at Nvidia, in a post on X.

DrEureka starts by taking task and safety instructions, along with the environment source code, to initiate Eureka. Eureka then produces a reward function and an initial policy. These are tested across various simulation conditions to build a reward-aware physics prior.

This is then utilized by the LLM to generate a range of domain randomization (DR) parameters. Finally, leveraging the synthesized reward and DR parameters, DrEureka trains policies ready for real-world deployment.
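The flow described above might look roughly like the following Python sketch, with the LLM and the simulator stubbed out. All function names, prompts, and values here are illustrative assumptions, not the authors' actual code.

```python
def run_llm(prompt: str) -> str:
    """Stand-in for an LLM call (e.g., GPT-4); returns placeholder text here."""
    return f"# reward code generated from a {len(prompt)}-character prompt"

def train_policy(reward_code: str, dr_ranges=None) -> str:
    """Stand-in for reinforcement-learning training in simulation."""
    return f"policy(reward={reward_code!r}, dr={dr_ranges})"

def policy_succeeds(policy: str, param: str, value: float) -> bool:
    """Stand-in check: does the initial policy still work at this physics value?"""
    return 0.5 <= value <= 2.0  # placeholder success band

def dr_eureka(task: str, safety: str, env_source: str) -> str:
    # 1. Eureka step: the LLM writes a reward function from the task and
    #    safety instructions plus the environment source code.
    reward_code = run_llm(f"Task: {task}\nSafety: {safety}\n{env_source}")

    # 2. Train an initial policy in simulation with that reward.
    initial_policy = train_policy(reward_code)

    # 3. Reward-aware physics prior: probe which physics values the
    #    initial policy tolerates.
    prior = {}
    for param in ("friction", "damping", "stiffness", "gravity"):
        feasible = [v / 10 for v in range(1, 31)
                    if policy_succeeds(initial_policy, param, v / 10)]
        prior[param] = (min(feasible), max(feasible))

    # 4. The LLM proposes domain randomization (DR) ranges conditioned on
    #    the prior; stubbed here by reusing the prior directly.
    run_llm(f"Propose DR ranges given feasible physics: {prior}")
    dr_ranges = prior

    # 5. Retrain with the synthesized reward and DR ranges; the result is
    #    deployed on the real robot without fine-tuning.
    return train_policy(reward_code, dr_ranges)

print(dr_eureka("balance and walk on a yoga ball", "avoid falling", "<env source>"))
```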

Cutting-edge LLMs such as GPT-4 come equipped with an extensive built-in understanding of physical concepts like friction, damping, stiffness, gravity, and more. “We are (mildly) surprised to find that DrEureka can tune these parameters competently and explain its reasoning well,” said Fan.
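For illustration only, LLM-proposed domain randomization ranges for a quadruped simulator might resemble the following; the parameter names and values are assumptions for this sketch, not figures from the study.

```python
# Hypothetical example of DR ranges an LLM might propose (illustrative values).
dr_ranges = {
    "friction":        (0.3, 1.5),    # contact friction coefficient
    "joint_damping":   (0.5, 2.0),    # multiplier on nominal joint damping
    "joint_stiffness": (0.8, 1.2),    # multiplier on nominal joint stiffness
    "gravity_z":       (-10.5, -9.0), # m/s^2, randomized around Earth gravity
}
```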

Assessing quadrupedal locomotion, the team systematically tested DrEureka’s policies across various real-world terrains.

Results show their robustness and superior performance compared to policies trained with manually designed reward and domain randomization settings.

“DrEureka policy exhibits impressive robustness in the real world, adeptly balancing and walking atop a yoga ball under various real-world, un-controlled terrain condition changes and disturbances,” said the researchers in the study.

Furthermore, DrEureka's LLM reward design subroutine improves on Eureka by incorporating safety instructions, which the researchers say are essential for crafting reward functions safe enough for real-world deployment.

Key findings show that building a reward-aware physics prior from the initial Eureka policy is central to DrEureka's success. Additionally, using the LLM to sample domain randomization parameters is vital for optimizing real-world performance.

Looking ahead, researchers say there are numerous ways to enhance DrEureka further. For instance, DrEureka policies are currently trained solely in simulation; using real-world failures as feedback could help the LLM refine its sim-to-real approach in subsequent iterations.

Additionally, all tasks and policies in the study relied solely on the robot’s internal sensory inputs, and integrating vision or other sensors could enhance policy performance and the LLM feedback loop.

Jijo Malayil is an automotive and business journalist based in India. Armed with a BA in History (Honors) from St. Stephen's College, Delhi University, and a PG diploma in Journalism from the Indian Institute of Mass Communication, Delhi, he has worked for news agencies, national newspapers, and automotive magazines. In his spare time, he likes to go off-roading, engage in political discourse, travel, and teach languages.
