The successor representation in human reinforcement learning

Theories of reinforcement learning in neuroscience have focused on two families of algorithms. Model-free algorithms cache action values, making them cheap but inflexible: a candidate mechanism for adaptive and maladaptive habits. Model-based algorithms achieve flexibility at computational expense, by rebuilding values from a model of the environment. We examine an intermediate class of algorithms, the successor representation (SR), which caches long-run state expectancies, blending model-free efficiency with model-based flexibility. Although previous reward revaluation studies distinguish model-free from model-based learning algorithms, such designs cannot discriminate between model-based and SR-based algorithms, both of which predict sensitivity to reward revaluation. However, changing the transition structure ('transition revaluation') should selectively impair revaluation for the SR. In two studies we provide evidence that humans are differentially sensitive to reward vs. transition revaluation, consistent with SR predictions. These results support a new neuro-computational mechanism for flexible choice, while introducing a subtler, more cognitive notion of habit.
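The contrast between reward and transition revaluation can be illustrated with a minimal sketch. It is not the task or model fit used in the studies. It assumes a hypothetical six-state deterministic environment (two start states, two intermediate states, two rewarded terminal states), a discount factor of 0.9, and a pure SR learner whose cached matrix is never updated after the initial learning phase. All names (`successor_matrix`, `state_values`, the state indices and reward magnitudes) are illustrative assumptions.

```python
def successor_matrix(next_state, n_states, gamma):
    """Cache long-run state expectancies for a deterministic, acyclic chain.

    M[s][s2] is the discounted number of future visits to s2 when starting
    from s (counting the starting state itself).
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        current, discount = s, 1.0
        while current is not None:
            M[s][current] += discount
            current = next_state.get(current)  # None once a terminal state is reached
            discount *= gamma
    return M


def state_values(M, rewards):
    """Combine cached state expectancies with current rewards: V = M r."""
    return [sum(m * r for m, r in zip(row, rewards)) for row in M]


GAMMA = 0.9      # assumed discount factor
N_STATES = 6     # 0, 1: start; 2, 3: intermediate; 4, 5: terminal

# Initial learning: 0 -> 2 -> 4 (large reward), 1 -> 3 -> 5 (small reward).
old_transitions = {0: 2, 1: 3, 2: 4, 3: 5}
old_rewards = [0.0, 0.0, 0.0, 0.0, 10.0, 1.0]

M_cached = successor_matrix(old_transitions, N_STATES, GAMMA)
v_initial = state_values(M_cached, old_rewards)

# Reward revaluation: the terminal rewards swap, the structure is unchanged.
new_rewards = [0.0, 0.0, 0.0, 0.0, 1.0, 10.0]
v_sr_reward = state_values(M_cached, new_rewards)  # cached M is reused
v_mb_reward = state_values(
    successor_matrix(old_transitions, N_STATES, GAMMA), new_rewards
)  # model-based: values rebuilt from the model

# Transition revaluation: the intermediate states swap their successors.
new_transitions = {0: 2, 1: 3, 2: 5, 3: 4}
v_sr_transition = state_values(M_cached, old_rewards)  # stale cached M
v_mb_transition = state_values(
    successor_matrix(new_transitions, N_STATES, GAMMA), old_rewards
)

print("initial preference:        ", "start 0" if v_initial[0] > v_initial[1] else "start 1")
print("SR after reward change:    ", "start 0" if v_sr_reward[0] > v_sr_reward[1] else "start 1")
print("MB after reward change:    ", "start 0" if v_mb_reward[0] > v_mb_reward[1] else "start 1")
print("SR after transition change:", "start 0" if v_sr_transition[0] > v_sr_transition[1] else "start 1")
print("MB after transition change:", "start 0" if v_mb_transition[0] > v_mb_transition[1] else "start 1")
```

In this sketch the cached matrix handles the reward change correctly, because values are recomputed from the stored expectancies and the new rewards. After the transition change the stale matrix still favours the originally better start state, whereas the model-based values reverse. A learner that mixes the two strategies would fall between these extremes.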