Language is an essential human trait and the primary means by which we communicate information, including thoughts, intentions, and feelings. Recent breakthroughs in AI research have led to the creation of conversational agents that are able to communicate with humans in nuanced ways. These agents are powered by large language models – computational systems trained on vast corpora of text-based materials to predict and produce text using advanced statistical techniques.
Yet, while language models such as InstructGPT, Gopher, and LaMDA have achieved record levels of performance across tasks such as translation, question-answering, and reading comprehension, these models have also been shown to exhibit a number of potential risks and failure modes. These include the production of toxic or discriminatory language and false or misleading information [1, 2, 3].
These shortcomings limit the productive use of conversational agents in applied settings and draw attention to the ways in which they fall short of certain communicative ideals. To date, most approaches to the alignment of conversational agents have focused on anticipating and reducing the risks of harm.
Our new paper, In conversation with AI: aligning language models with human values, adopts a different approach, exploring what successful communication between a human and an artificial conversational agent might look like, and what values should guide these interactions across different conversational domains.
To address these questions, the paper draws upon pragmatics, a tradition in linguistics and philosophy which holds that the purpose of a conversation, its context, and a set of related norms all form an essential part of sound conversational practice.
Modelling conversation as a cooperative endeavour between two or more parties, the linguist and philosopher Paul Grice held that participants ought to speak informatively, tell the truth, provide relevant information, and avoid obscure or ambiguous statements.
However, our paper demonstrates that further refinement of these maxims is needed before they can be used to evaluate conversational agents, given variation in the goals and values embedded across different conversational domains.
By way of illustration, scientific investigation and communication are geared primarily toward understanding or predicting empirical phenomena. Given these goals, a conversational agent designed to assist scientific investigation would ideally make only statements whose veracity is confirmed by sufficient empirical evidence, or otherwise qualify its positions according to relevant confidence intervals.
For example, an agent reporting that “At a distance of 4.246 light years, Proxima Centauri is the closest star to Earth after the Sun” should do so only after the model underlying it has checked that the statement corresponds with the facts.
Yet a conversational agent playing the role of a moderator in public political discourse may need to demonstrate quite different virtues. In this context, the goal is primarily to manage differences and enable productive cooperation in the life of a community. The agent will therefore need to foreground the democratic values of toleration, civility, and respect.
Moreover, these values explain why the generation of toxic or prejudicial speech by language models is often so problematic: the offending language fails to communicate equal respect for the participants in the conversation, which is a key value in the contexts where such models are deployed. At the same time, scientific virtues, such as the comprehensive presentation of empirical data, may be less important in the context of public deliberation.
Finally, in the domain of creative storytelling, communicative exchange aims at novelty and originality, values that again differ significantly from those outlined above. In this context, greater latitude with make-believe may be appropriate, although it remains important to safeguard communities against malicious content produced under the guise of ‘creative uses’.
This research has a number of practical implications for the development of aligned conversational AI agents. To begin with, they will need to embody different traits depending on the contexts in which they are deployed: there is no one-size-fits-all account of language-model alignment. Instead, the appropriate mode and evaluative standards for an agent – including standards of truthfulness – will vary according to the context and purpose of a conversational exchange.
Additionally, conversational agents may have the potential to cultivate more robust and respectful conversations over time, via a process we refer to as context construction and elucidation. Even when a person is not aware of the values that govern a given conversational practice, the agent may still help them understand these values by prefiguring them in conversation, making the course of communication deeper and more fruitful for the human speaker.