Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making

In this paper, we contribute a multi-faceted study into what we define as Pavlovian signalling---a process by which learned, temporally extended predictions made by one agent are mapped to features that inform decision-making by another agent. Signalling is intimately connected to time and timing. In service of generating and receiving signals, humans and other animals are known to represent time, determine time since past events, predict the time until a future stimulus, and both recognize and generate patterns that unfold in time. Here we investigate how different temporal processes impact coordination and signalling between machine learning agents by introducing a decision-making domain we call The Frost Hollow. In this domain, a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that works to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variations: machine agents interacting in a seven-state linear walk, and human-machine interaction in a virtual-reality environment. Our results showcase the impact that different temporal representations have on agent-agent coordination, and highlight how temporal aliasing impacts agent-agent and human-agent interactions in different ways. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents. We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals. Our results therefore suggest an actionable, constructivist path towards communication learning between reinforcement learning agents.

Authors' notes