Google researchers teach robots to learn by watching

Different robot end effectors.

Roboticists typically teach robots new tasks by remotely operating them through the task. The robot then imitates the demonstration until it can perform the task on its own.

While this method of teaching robots is effective, it limits demonstrations to laboratory settings, and only programmers and roboticists can give them. A research team at Google’s robotics department has developed a new way for robots to learn.

Humans learn by watching all the time, but it is not a simple task for robots, largely because their bodies differ from ours. For example, a robot with a two-finger gripper won’t learn much about how to pick up a pen by watching a human pick one up with a five-fingered hand.

To solve this problem, the team introduced a self-supervised method called Cross-embodiment Inverse Reinforcement Learning (XIRL).

This teaching method focuses on having the robot learn the high-level task objective from videos. So instead of trying to match individual human actions with robot actions, the robot works out what the end goal is.

It then summarizes this information as a reward function that is invariant to physical differences such as shape, actions, and end-effector dynamics. Using the learned rewards and reinforcement learning, the research team taught robots how to manipulate objects through trial and error.
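The article does not spell out the form of the learned reward, so the following is only a minimal sketch under a stated assumption: that each video frame has already been mapped to an embedding vector by some encoder (not shown), and that the reward is the negative distance between the current embedding and a goal embedding averaged over the final frames of the demonstrations. The function names, the `scale` parameter, and the toy data are hypothetical.

```python
import numpy as np

def goal_embedding(demo_embeddings):
    """Average the final-frame embedding of each demonstration video."""
    finals = np.stack([np.asarray(demo)[-1] for demo in demo_embeddings])
    return finals.mean(axis=0)

def embedding_reward(state_embedding, goal, scale=1.0):
    """Assumed reward: negative distance to the goal in embedding space."""
    diff = np.asarray(state_embedding, dtype=float) - goal
    return -float(np.linalg.norm(diff)) / scale

# Toy data: two "videos" of 2-D embeddings that both end at the point (1, 1).
demos = [
    np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]),
    np.array([[0.2, 0.0], [0.8, 0.6], [1.0, 1.0]]),
]
goal = goal_embedding(demos)

far = embedding_reward([0.0, 0.0], goal)   # state far from the goal
near = embedding_reward([0.9, 1.0], goal)  # state close to the goal
print(far, near)
```

Because the reward depends only on where a state sits relative to the goal in embedding space, and not on which actions produced it, a reward of this shape could in principle be shared by agents with different bodies. A reinforcement learning agent would then receive a higher reward as its observations move closer to the goal.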

The robots learned more when the video examples were more diverse. The experiments showed that the team’s method made reinforcement learning two to four times more sample-efficient on new embodiments.

The team released an open-source implementation of their method, along with X-MAGICAL, their simulated benchmark for cross-embodiment imitation, so that others can extend and build on their work.

X-MAGICAL was created to evaluate the performance of XIRL in a consistent environment. The program challenges a set of agent embodiments, which have different shapes and end-effectors, to perform a task. Agents perform tasks in different ways and at different speeds.


Different agent embodiments performing a task in X-MAGICAL. | Source: Google

The team also applied their method to human demonstrations of real-world tasks, using it to train a simulated Sawyer arm to push a puck into a target area. Here too, the method outperformed baseline methods.

The research team included Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson and Debidatta Dwibedi of Robotics at Google, and Jeannette Bohg of Stanford University.
