Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share it with others for collective benefit. Shared revenue was returned to players under two different redistribution mechanisms, one designed by the AI and the other by humans. The AI discovered a mechanism that redressed initial wealth imbalance, sanctioned free riders and successfully won the majority vote. By optimising for human preferences, Democratic AI offers a proof of concept for value-aligned policy innovation.
In our recent paper, published in Nature Human Behaviour, we provide a proof-of-concept demonstration that deep reinforcement learning (RL) can be used to find economic policies that people will vote for by majority in a simple game. The paper thus addresses a key challenge in AI research: how to train AI systems that align with human values.
Imagine that a group of people decide to pool funds to make an investment. The investment pays off, and a profit is made. How should the proceeds be distributed? One simple strategy is to split the return equally among investors. But that might be unfair, because some people contributed more than others. Alternatively, we could pay everyone back in proportion to the size of their initial investment. That sounds fair, but what if people had different levels of assets to begin with? If two people contribute the same amount, but one is giving a fraction of their available funds, and the other is giving them all, should they receive the same share of the proceeds?
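The two simple rules described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the contribution amounts and the size of the proceeds are assumptions chosen to keep the numbers easy to follow, not values from the study.

```python
def equal_split(contributions, proceeds):
    """Split the proceeds evenly, ignoring who contributed what."""
    share = proceeds / len(contributions)
    return [share for _ in contributions]


def proportional_split(contributions, proceeds):
    """Pay each investor in proportion to the amount they put in."""
    total = sum(contributions)
    if total == 0:
        # Nobody invested, so there is nothing to apportion.
        return [0.0 for _ in contributions]
    return [proceeds * c / total for c in contributions]


# Hypothetical numbers: three investors and proceeds of 24 to share out.
contributions = [2, 2, 8]
proceeds = 24

equal_payouts = equal_split(contributions, proceeds)
proportional_payouts = proportional_split(contributions, proceeds)

print("equal:       ", equal_payouts)
print("proportional:", proportional_payouts)
```

Under both rules the first two investors receive identical payouts, because each rule looks only at the amount contributed. Neither can tell whether those 2 units were a small fraction of one investor's funds or everything the other had, which is exactly the difficulty raised by the final question above.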
This question of how to redistribute resources in our economies and societies has long generated controversy among philosophers, economists and political scientists. Here, we use deep RL as a testbed to explore ways to address this problem.
To tackle this challenge, we created a simple game that involved four players. Each instance of the game was played over 10 rounds. On every round, each player was allocated funds, with the size of the endowment varying between players. Each player made a choice: they could keep those funds for themselves or invest them in a common pool. Invested funds were guaranteed to grow, but there was a risk, because players did not know how the proceeds would be shared out. Instead, they were told that one referee (A) made the redistribution decisions during a first 10-round game, and that a different referee (B) took over for a second 10-round game. At the end, they voted for either A or B, and played another game with the referee they had chosen. Human players were allowed to keep the proceeds of this final game, so they were incentivised to report their preference accurately.
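A single round of a game like this can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the experiment's actual implementation: the `play_round` and `equal_referee` names, the growth multiplier, the endowment values and the option to invest only part of an endowment are all hypothetical.

```python
def play_round(endowments, contributions, referee, growth=2.0):
    """Return each player's payoff for one round.

    `growth` is an assumed multiplier applied to the common pool; the text
    only states that invested funds were guaranteed to grow.
    """
    for e, c in zip(endowments, contributions):
        if not 0 <= c <= e:
            raise ValueError("a contribution must lie between 0 and the endowment")
    pool = growth * sum(contributions)
    # The referee decides how the grown pool is shared among the players.
    shares = referee(endowments, contributions, pool)
    # Payoff = funds the player kept + the share returned by the referee.
    return [e - c + s for e, c, s in zip(endowments, contributions, shares)]


def equal_referee(endowments, contributions, pool):
    """Hypothetical referee that splits the pool evenly among players."""
    n = len(contributions)
    return [pool / n] * n


endowments = [10, 2, 2, 2]     # unequal endowments (illustrative values)
contributions = [5, 2, 2, 1]   # how much each player chose to invest
payoffs = play_round(endowments, contributions, equal_referee)
print(payoffs)
```

Because the referee is passed in as a function, swapping referee A for referee B only means passing a different redistribution rule. The player's risk is visible in the payoff line: whatever they invest leaves their own funds, and what comes back depends entirely on the referee.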
In reality, one of the referees was a pre-defined redistribution policy, and the other was designed by our deep RL agent. To train the agent, we first recorded data from a large number of human groups and taught a neural network to copy how people played the game. This simulated population could generate limitless data, allowing us to use data-intensive machine learning methods to train the RL agent to maximise the votes of these “virtual” players. Having done so, we then recruited new human players, and pitted the AI-designed mechanism head-to-head against well-known baselines, such as a libertarian policy that returns funds to people in proportion to their contributions.
When we studied the votes of these new players, we found that the policy designed by deep RL was more popular than the baselines. In fact, when we ran a new experiment asking a fifth human player to take on the role of referee, and trained them to try to maximise votes, the policy implemented by this “human referee” was still less popular than that of our agent.
AI systems have sometimes been criticised for learning policies that may be incompatible with human values, and this problem of “value alignment” has become a major concern in AI research. One merit of our approach is that the AI learns directly to maximise the stated preferences (or votes) of a group of people. This approach may help ensure that AI systems are less likely to learn policies that are unsafe or unfair. In fact, when we analysed the policy that the AI had discovered, it incorporated a mixture of ideas that have previously been proposed by human thinkers and experts to solve the redistribution problem.
Firstly, the AI chose to redistribute funds to people in proportion to their relative rather than absolute contribution. This means that when redistributing funds, the agent accounted for each player’s initial means, as well as their willingness to contribute. Secondly, the AI system especially rewarded players whose relative contribution was more generous, perhaps encouraging others to do likewise. Importantly, the AI only discovered these policies by learning to maximise human votes. The method therefore ensures that humans remain “in the loop” and the AI produces human-compatible solutions.
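The difference between absolute and relative contribution can be shown with a small example. The sketch below is not the mechanism the agent learned; it is a simple linear rule, with hypothetical function names and numbers, that captures only the first idea: paying people back according to the fraction of their endowment they gave rather than the raw amount.

```python
def absolute_split(endowments, contributions, pool):
    """Share the pool in proportion to the raw amount each player gave."""
    total = sum(contributions)
    if total == 0:
        return [0.0] * len(contributions)
    return [pool * c / total for c in contributions]


def relative_split(endowments, contributions, pool):
    """Share the pool in proportion to the fraction of endowment given."""
    ratios = [c / e if e > 0 else 0.0 for e, c in zip(endowments, contributions)]
    total = sum(ratios)
    if total == 0:
        return [0.0] * len(ratios)
    return [pool * r / total for r in ratios]


# Hypothetical two-player case: both give 2, but from very different means.
endowments = [8, 2]
contributions = [2, 2]
pool = 10

absolute_shares = absolute_split(endowments, contributions, pool)
relative_shares = relative_split(endowments, contributions, pool)

print("absolute:", absolute_shares)
print("relative:", relative_shares)
```

With these numbers the absolute rule pays both players the same, while the relative rule returns more to the player who gave their entire endowment. The second feature described above, an extra reward for especially generous relative contributions, would need a rule that grows faster than linearly in the contribution ratio, which this sketch does not attempt to model.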
By asking people to vote, we harnessed the principle of majoritarian democracy for deciding what people want. Despite its broad appeal, democracy comes with the widely acknowledged caveat that the preferences of the majority are accounted for over those of the minority. In our study, we ensured that, as in most societies, that minority consisted of more generously endowed players. But more work is needed to understand how to trade off the relative preferences of majority and minority groups, by designing democratic systems that allow all voices to be heard.