Side effects are unnecessary disruptions to the agent's environment while completing a task. Instead of trying to explicitly penalize all possible side effects, we give the agent a general penalty for impacting the environment, defined as a deviation from some baseline state. For example, a reversibility penalty measures unreachability (deviation) of the starting state (baseline). This code implements a tabular Q-learning agent with different impact penalties. Each penalty consists of a deviation measure (none, unreachability, relative reachability, or attainable utility), a baseline (starting state, inaction, or stepwise inaction), and some other design choices.

No items found.