A set of tasks in 3D environments that require an agent to learn the meaning of words in a single shot and use those meanings to carry out object-manipulation instructions.