Technical blog

dm_control: Software and Tasks for Continuous Control

June 15, 2020

Overview

A public Colab notebook provides a tutorial for the dm_control software.

Infrastructure
  • An autogenerated MuJoCo Python wrapper provides full access to the underlying engine.
  • PyMJCF is a Document Object Model, wherein a hierarchy of Python Entity objects corresponds to MuJoCo model elements.
  • Composer is the high-level “game engine” which streamlines composing Entities into scenes and defining observations, rewards, terminations and general game logic.
  • The Locomotion framework introduces several abstract Composer entities such as the Arena and Walker, facilitating locomotion-like tasks.

Environments
  • The Control Suite, including new quadruped and dog environments.
  • Several locomotion tasks, including soccer.
  • Single-arm robotic manipulation tasks using snap-together bricks.

Highlights

Named Indexing

Exploiting MuJoCo’s support for names on all model elements, we allow strings to index and slice into arrays. So instead of writing:

    fingertip_height = physics.data.geom_xpos[7, 2]

...using obscure, fragile numerical indexing, you can write:

    fingertip_height = physics.named.data.geom_xpos['fingertip', 'z']

leading to a much more robust, readable codebase.
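
The idea behind named indexing can be illustrated with a small toy class. This is only a sketch of the concept in plain Python, not dm_control's implementation: the class NamedArray2D, the geom names and the position values are all made up for illustration.

```python
class NamedArray2D:
    """Toy 2-D array whose rows and columns can be addressed by name."""

    def __init__(self, rows, row_names, col_names):
        self._rows = [list(row) for row in rows]
        self._row_index = {name: i for i, name in enumerate(row_names)}
        self._col_index = {name: j for j, name in enumerate(col_names)}

    @staticmethod
    def _resolve(key, table):
        # Strings are looked up by name; integers pass through unchanged.
        return table[key] if isinstance(key, str) else key

    def __getitem__(self, key):
        row_key, col_key = key
        i = self._resolve(row_key, self._row_index)
        j = self._resolve(col_key, self._col_index)
        return self._rows[i][j]


# Hypothetical geom positions: one row per geom, columns are x, y, z.
geom_xpos = NamedArray2D(
    rows=[[0.0, 0.0, 0.0], [0.1, 0.0, 0.02], [0.2, 0.0, 0.05]],
    row_names=["ground", "arm", "fingertip"],
    col_names=["x", "y", "z"],
)

fingertip_height = geom_xpos["fingertip", "z"]
```

If a geom is later added to or removed from the model, the name-based lookup keeps pointing at the fingertip, whereas a hard-coded row number would silently read a different geom.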

PyMJCF

The PyMJCF library creates a Python object hierarchy with 1:1 correspondence to a MuJoCo model. It introduces the attach() method, which allows models to be attached to one another. For example, in our tutorial we create procedural multi-legged creatures by attaching legs to bodies and creatures to the scene.
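
To make the attachment idea concrete, here is a rough sketch using only Python's standard xml.etree module rather than PyMJCF itself. The attach function below is a hypothetical stand-in written for this example: it copies a child model's bodies under a parent body and prefixes their names to avoid clashes. The real library does considerably more, so treat this as an illustration of the concept only.

```python
import copy
import xml.etree.ElementTree as ET


def attach(parent_body, child_root, prefix):
    """Copies the child's top-level bodies under parent_body, prefixing names."""
    attached = []
    for body in child_root.find("worldbody").findall("body"):
        clone = copy.deepcopy(body)  # leave the child model untouched
        for element in clone.iter():
            if "name" in element.attrib:
                element.set("name", prefix + "/" + element.get("name"))
        parent_body.append(clone)
        attached.append(clone)
    return attached


# A made-up two-segment leg model.
LEG_XML = """
<mujoco model="leg">
  <worldbody>
    <body name="thigh">
      <geom name="thigh_geom" type="capsule" size="0.02" fromto="0 0 0 0.1 0 0"/>
      <body name="shin">
        <geom name="shin_geom" type="capsule" size="0.02" fromto="0.1 0 0 0.1 0 -0.1"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

creature = ET.fromstring(
    '<mujoco model="creature"><worldbody><body name="torso"/></worldbody></mujoco>'
)
torso = creature.find("worldbody/body")
leg = ET.fromstring(LEG_XML)

# Build a four-legged creature by attaching the same leg model four times.
for i in range(4):
    attach(torso, leg, prefix="leg_{}".format(i))

body_names = [body.get("name") for body in torso.iter("body")]
```

Because each attached copy gets its own name prefix, the same leg description can be reused any number of times without name collisions in the combined model.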

Composer

Composer is the “game engine” framework, which defines a particular order of runtime function calls, and abstracts the affordances of reward, termination and observation. These abstractions allowed us to create useful submodules:

composer.Observable: An abstract observation wrapper which can add noise, delays, buffering and filtering to any sensor.

composer.Variation: A set of tools for randomising simulation quantities, allowing for agent robustification and sim-to-real via model variation.
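
The two ideas above can be sketched in a few lines of plain Python. Nothing here is the composer API: DelayedNoisyObservable and uniform_variation are hypothetical names invented for this example, and the sensor is a simple counter standing in for a physics quantity.

```python
import random
from collections import deque


class DelayedNoisyObservable:
    """Wraps a sensor callable, adding Gaussian noise and a fixed delay."""

    def __init__(self, read_sensor, noise_std=0.0, delay=0, rng=None):
        self._read_sensor = read_sensor
        self._noise_std = noise_std
        self._rng = rng or random.Random()
        # The buffer holds at most delay + 1 readings; the oldest is reported.
        self._buffer = deque(maxlen=delay + 1)

    def update(self):
        """Samples the sensor once and returns the (delayed) observation."""
        reading = self._read_sensor() + self._rng.gauss(0.0, self._noise_std)
        self._buffer.append(reading)
        return self._buffer[0]


def uniform_variation(low, high, rng):
    """Returns a callable that draws a fresh value on every call."""
    return lambda: rng.uniform(low, high)


rng = random.Random(0)

# A fake sensor that reports 0.0, 1.0, 2.0, ... on successive reads.
counter = iter(range(100))
height = DelayedNoisyObservable(
    lambda: float(next(counter)), noise_std=0.0, delay=2, rng=rng
)
observations = [height.update() for _ in range(5)]

# A fresh friction coefficient could be drawn at the start of each episode.
friction = uniform_variation(0.5, 1.5, rng)
episode_frictions = [friction() for _ in range(3)]
```

With noise switched off and a delay of two steps, the observation lags the sensor by two readings once the buffer has filled; before that, the oldest available reading is returned.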

Diagram showing the life-cycle of Composer callbacks. Rounded rectangles represent callbacks that Tasks and Entities may implement. Blue rectangles represent built-in Composer operations.
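
As a rough textual companion to the diagram, the toy loop below records the order in which callbacks fire during one short episode. The callback names follow those used in the dm_control Composer code; the loop itself is a simplified assumption written for this example and omits the built-in operations such as model compilation, physics stepping and observation updates.

```python
CALLBACKS = (
    "initialize_episode_mjcf",
    "after_compile",
    "initialize_episode",
    "before_step",
    "before_substep",
    "after_substep",
    "after_step",
)


class LoggingTask:
    """Toy task that records the name of every callback it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that are not defined normally.
        if name in CALLBACKS:
            return lambda: self.calls.append(name)
        raise AttributeError(name)


def run_episode(task, n_steps, n_substeps):
    """Calls the task callbacks in a simplified Composer-like order."""
    task.initialize_episode_mjcf()  # edit the model before it is compiled
    task.after_compile()            # the compiled physics is now available
    task.initialize_episode()       # set the initial physics state
    for _ in range(n_steps):
        task.before_step()          # e.g. apply the agent's action
        for _ in range(n_substeps):
            task.before_substep()
            # A real environment would advance the physics here.
            task.after_substep()
        task.after_step()           # e.g. compute reward and termination


task = LoggingTask()
run_episode(task, n_steps=1, n_substeps=2)
```

After the run, task.calls lists the callbacks in the order they were invoked, which makes the nesting of control steps and physics substeps easy to inspect.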

Locomotion

The Locomotion framework introduces two abstractions:

Walker: A controllable entity with common locomotion-related methods, like projection of vectors into an egocentric frame.

Arena: A self-scaling randomised scene, in which the walker can be placed and given a task to perform.
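
As a small illustration of what projecting a vector into an egocentric frame means, the sketch below rotates a 2-D world-frame vector into the frame of a walker with a given heading. The function to_egocentric is a hypothetical helper for this example, restricted to planar yaw for brevity; it is not the Walker method itself.

```python
import math


def to_egocentric(vec_world, yaw):
    """Rotates a 2-D world-frame vector into a frame turned by yaw radians.

    If R is the rotation matrix taking egocentric coordinates to world
    coordinates, the egocentric vector is R transposed times the world vector.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    x, y = vec_world
    return (c * x + s * y, -s * x + c * y)


# A walker facing along the world y-axis (yaw of 90 degrees) sees a target
# that lies one unit along world y as being straight ahead, on its own x-axis.
target_in_world = (0.0, 1.0)
target_in_ego = to_egocentric(target_in_world, yaw=math.pi / 2)
```

Expressing targets and velocities this way means the agent's observations do not change when the whole scene is rotated, which tends to make locomotion policies easier to learn.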

For example, using just four function calls, we can instantiate a humanoid walker and a WallsCorridor arena, and combine them in a RunThroughCorridor task.
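
The four calls might look like the sketch below. The module paths and the choice of the CMUHumanoid walker class are assumptions based on the layout of the dm_control repository and may differ between versions; all constructor arguments are left at their defaults. The wrapper function build_corridor_env is a name invented for this example.

```python
def build_corridor_env():
    """Builds a humanoid corridor-running environment with four calls."""
    # Imports are kept inside the function so that this snippet can be loaded
    # on a machine without dm_control or MuJoCo installed.
    from dm_control import composer
    from dm_control.locomotion.arenas import corridors as corridor_arenas
    from dm_control.locomotion.tasks import corridors as corridor_tasks
    from dm_control.locomotion.walkers import cmu_humanoid

    walker = cmu_humanoid.CMUHumanoid()                     # 1. the walker
    arena = corridor_arenas.WallsCorridor()                 # 2. the arena
    task = corridor_tasks.RunThroughCorridor(               # 3. the task
        walker=walker, arena=arena)
    return composer.Environment(task=task)                  # 4. the environment
```

With dm_control installed, calling build_corridor_env() and then reset() on the result should return the first timestep of an episode.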

New Control Suite domains

Quadruped
  • A generic quadruped domain with a passively stable body.
  • Several pure locomotion tasks (e.g. walk, run).
  • An escape task requiring rough terrain navigation.
  • A fetch task requiring ball dribbling.

Dog
  • An elaborate model based on a skeleton commissioned from leo3Dmodels.
  • A challenging ball-fetching task that requires precision grasping with the mouth.
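
Loading one of the new domains could look like the following sketch. It assumes the Control Suite's suite.load entry point and that the quadruped task names match the ones listed above; load_quadruped and QUADRUPED_TASKS are hypothetical names introduced for this example.

```python
# Task names taken from the list above; assumed to match the suite's own names.
QUADRUPED_TASKS = ("walk", "run", "escape", "fetch")


def load_quadruped(task_name):
    """Loads a quadruped task from the Control Suite after checking its name."""
    if task_name not in QUADRUPED_TASKS:
        raise ValueError(
            "Unknown quadruped task {!r}; expected one of {}".format(
                task_name, QUADRUPED_TASKS))
    # Imported lazily so the name check works without dm_control installed.
    from dm_control import suite
    return suite.load(domain_name="quadruped", task_name=task_name)
```

With dm_control installed, load_quadruped("walk") returns an environment whose reset() and step() methods drive an episode.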

Showcase

Video: a fast-paced montage of dm_control-based tasks from DeepMind.
