Gamma-Nets: Generalizing Value Estimation over Timescale

Effective life-long deployment of an autonomous agent
in a complex environment demands that the agent have
some model of itself and its environment. Such models are inherently predictive, allowing an agent to predict the consequences of its actions. In this paper, we
demonstrate the use of General Value Functions (GVFs)
for learning and representing such a predictive model
on a robotic arm. Our model is composed of three types
of signals: (1) predictions of sensorimotor signals, (2)
measures of surprise using Unexpected Demon Error
(UDE) and (3) predictions of surprise. In a proof-of-principle experiment, where the robot arm is manually
perturbed in a recurring pattern, we show that each perturbation is detected as a jump in the surprise signal. We
demonstrate that the recurrence of these perturbations
can not only be learned but also anticipated. We propose that introspective signals like surprise and predictions of surprise might serve as a rich substrate for more
abstract predictive models, improving an agent’s ability
to continually and independently learn about itself and
its environment to fulfill its goals.

Authors' notes