
Responsibility & Safety

Language modelling at scale: Gopher, ethical considerations, and retrieval

Authors

Jack Rae, Geoffrey Irving, Laura Weidinger

Language, and its role in demonstrating and facilitating comprehension - or intelligence - is a fundamental part of being human. It gives people the ability to communicate thoughts and concepts, express ideas, create memories, and build mutual understanding. These are foundational parts of social intelligence. It’s why our teams at DeepMind study aspects of language processing and communication, both in artificial agents and in humans.

As part of a broader portfolio of AI research, we believe the development and study of more powerful language models – systems that predict and generate text – have tremendous potential for building advanced AI systems that can be used safely and efficiently to summarise information, provide expert advice and follow instructions via natural language. Developing beneficial language models requires research into their potential impacts, including the risks they pose. This includes collaboration between experts from varied backgrounds to thoughtfully anticipate and address the challenges that training algorithms on existing datasets can create.

Today we are releasing three papers on language models that reflect this interdisciplinary approach. They include a detailed study of a 280 billion parameter transformer language model called Gopher, a study of ethical and social risks associated with large language models, and a paper investigating a new architecture with better training efficiency.

Gopher - A 280 billion parameter language model

To explore language models and develop new ones, we trained a series of transformer language models of different sizes, ranging from 44 million parameters to 280 billion parameters (we named the largest model Gopher).
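To give a sense of what these parameter counts correspond to, the sketch below estimates the size of a decoder-only transformer from its depth and width. This is a minimal sketch: the formula is a standard rough approximation, and the example configurations are illustrative assumptions rather than the published Gopher configurations.

```python
# Rough parameter-count estimate for a decoder-only transformer.
# The per-layer term covers self-attention (~4 * d_model^2) and the
# feed-forward block (~8 * d_model^2 with a 4x expansion); the
# embedding term covers the token embedding matrix.
def approx_params(n_layers: int, d_model: int, vocab_size: int = 32_000) -> int:
    per_layer = 12 * d_model**2
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings

# Hypothetical configurations, chosen only to illustrate how depth and
# width move a model from tens of millions to hundreds of billions of
# parameters; these are not the exact Gopher model configurations.
for name, (n_layers, d_model) in {
    "small": (8, 512),
    "medium": (24, 2048),
    "large": (80, 16384),
}.items():
    print(f"{name}: ~{approx_params(n_layers, d_model) / 1e6:,.0f}M parameters")
```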

Our research investigated the strengths and weaknesses of those different-sized models, highlighting areas where increasing the scale of a model continues to boost performance, for example in reading comprehension, fact-checking, and the identification of toxic language. We also surface areas where model scale does not significantly improve results, for instance logical reasoning and common-sense tasks.

Performance on the Massive Multitask Language Understanding (MMLU) benchmark broken down by category. Gopher improves upon prior work across several categories.

In our research, we found the capabilities of Gopher exceed existing language models for a number of key tasks. This includes the Massive Multitask Language Understanding (MMLU) benchmark, where Gopher demonstrates a significant advancement towards human expert performance over prior work.
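For readers unfamiliar with how such benchmarks are scored, the sketch below shows one common approach to multiple-choice evaluation: each answer option is scored by the model's log-likelihood, and the highest-scoring option is taken as the prediction. The log_likelihood callable and the prompt format are assumptions standing in for whatever model interface is available, not the exact evaluation setup used for Gopher.

```python
from typing import Callable, Sequence

def predict_choice(
    question: str,
    options: Sequence[str],
    log_likelihood: Callable[[str, str], float],
) -> int:
    """Return the index of the answer option the model scores highest."""
    prompt = f"Question: {question}\nAnswer:"
    scores = [log_likelihood(prompt, f" {option}") for option in options]
    return max(range(len(options)), key=lambda i: scores[i])

def accuracy_by_category(examples, log_likelihood):
    """examples: iterable of (category, question, options, answer_index) tuples."""
    totals, correct = {}, {}
    for category, question, options, answer_index in examples:
        totals[category] = totals.get(category, 0) + 1
        if predict_choice(question, options, log_likelihood) == answer_index:
            correct[category] = correct.get(category, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in totals.items()}
```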

Alongside the quantitative evaluation of Gopher, we also explored the model through direct interaction. Among our key findings was that, when Gopher is prompted towards a dialogue interaction (like in a chat), the model can sometimes be surprisingly coherent.
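To illustrate what prompting a model towards dialogue means in practice, the sketch below conditions a plain language model on a conversational prefix, with no fine-tuning involved. The generate callable and the prompt wording are illustrative assumptions, not the exact prompt used with Gopher.

```python
from typing import Callable, List, Tuple

# A text prefix that frames the exchange as a conversation; the wording
# here is a hypothetical example, not the prompt used in our experiments.
DIALOGUE_PREFIX = (
    "The following is a conversation between a curious user and a "
    "knowledgeable, polite AI assistant.\n"
)

def dialogue_turn(
    history: List[Tuple[str, str]],
    user_message: str,
    generate: Callable[[str], str],
) -> str:
    """Build a dialogue-style prompt from the conversation history and
    return the model's continuation as the assistant's reply."""
    prompt = DIALOGUE_PREFIX
    for user, assistant in history:
        prompt += f"User: {user}\nAssistant: {assistant}\n"
    prompt += f"User: {user_message}\nAssistant:"
    return generate(prompt)
```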

A transcript of a conversation between a user and Gopher. Gopher is asked a series of questions about cell biology and responds with factual answers. It also provides a link for further reading relating to the topic.

Here Gopher can discuss cell biology and provide a correct citation despite no specific dialogue fine-tuning. However, our research also detailed several failure modes that persist across model sizes, among them a tendency towards repetition, the reflection of stereotypical biases, and the confident propagation of incorrect information.

A transcript of a conversation between a user and Gopher. Gopher is asked a series of factual questions – for example, who won the Women's US Open in 2021. It replies saying the winner was Naomi Osaka.

When asked if it is uncertain about its answers, Gopher replies, "No."

This type of analysis is important, because understanding and documenting failure modes gives us an insight into how large language models could lead to downstream harms, and shows us where mitigation efforts in research should focus to address those issues.

Ethical and social risks from Large Language Models

In our second paper, we anticipate possible ethical and social risks from language models and create a comprehensive classification of these risks and failure modes, building on prior research in this area [Bommasani et al 2021, Bender et al 2021, Patterson et al 2021]. This systematic overview is an essential step towards understanding these risks and mitigating potential harm. We present a taxonomy of the risks related to language models, categorised into six thematic areas, and elaborate on 21 risks in depth.

Taking a broad view of different risk areas is essential: as we show in the paper, an overly narrow focus on a single risk in isolation can make other problems worse. The taxonomy we present serves as a foundation for experts and wider public discourse to build a shared overview of ethical and social considerations around language models, make responsible decisions, and exchange approaches to dealing with the identified risks.

The six thematic risk areas identified are as follows:

1. Discrimination, exclusion and toxicity: harms that arise from the language model producing discriminatory and exclusionary speech.

2. Information hazards: harms that arise from the language model leaking or inferring true sensitive information.

3. Misinformation harms: harms that arise from the language model providing false or misleading information.

4. Malicious uses: harms that arise from actors using the language model to intentionally cause harm.

5. Human-computer interaction harms: harms that arise from users overly trusting the language model, or treating it as human-like.

6. Automation, access, and environmental harms: harms that arise from environmental or downstream economic impacts of the language model.

Our research finds that two areas in particular require further work. First, current benchmarking tools are insufficient for assessing some important risks, for example when language models output misinformation and people trust this information to be true. Assessing risks like these requires more scrutiny of human-computer interaction with language models. In our paper, we list several risks that similarly require novel or more interdisciplinary analysis tools. Second, more work is needed on risk mitigations. For example, language models are known to reproduce harmful social stereotypes, but research on this problem is still in its early stages, as a recent DeepMind paper showed.

Efficient Training with Internet-Scale Retrieval

Our final paper builds on the foundations of Gopher and our taxonomy of ethical and social risk by proposing an improved language model architecture that reduces the energy cost of training and makes it easier to trace model outputs to sources within the training corpus.

The Retrieval-Enhanced Transformer (RETRO) is pre-trained with an Internet-scale retrieval mechanism. Inspired by how the brain relies on dedicated memory mechanisms when learning, RETRO efficiently queries for passages of text to improve its predictions. By comparing generated texts to the passages RETRO relied upon for generation, we can interpret why the model makes certain predictions and where they came from. We also show that the model achieves performance comparable to a regular Transformer with an order of magnitude fewer parameters, and obtains state-of-the-art performance on several language modelling benchmarks.
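As a simplified illustration of the retrieval idea, the sketch below embeds corpus chunks once, looks up the nearest neighbours of the text currently being processed, and passes the retrieved passages to the model as extra context. The embed and generate_with_context callables are assumptions; the actual RETRO architecture integrates retrieved chunks through cross-attention inside the transformer rather than by prompt concatenation.

```python
import numpy as np

def build_index(corpus_chunks, embed):
    """Embed every corpus chunk; returns (chunks, matrix of embeddings)."""
    vectors = np.stack([embed(chunk) for chunk in corpus_chunks])
    return corpus_chunks, vectors

def retrieve(query_chunk, chunks, vectors, embed, k=2):
    """Return the k corpus chunks closest to the query chunk."""
    q = embed(query_chunk)
    # Cosine similarity between the query and all stored chunk embeddings.
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q) + 1e-8)
    top = np.argsort(-sims)[:k]
    return [chunks[i] for i in top]

def retro_style_step(current_chunk, chunks, vectors, embed, generate_with_context):
    """Predict a continuation conditioned on retrieved neighbour passages."""
    neighbours = retrieve(current_chunk, chunks, vectors, embed)
    return generate_with_context(context=neighbours, prompt=current_chunk)
```

Because every prediction is paired with the passages that were retrieved for it, generated text can be traced back to concrete sources in the training corpus, which is what makes this kind of architecture easier to interpret.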

Going forward

These papers offer a foundation for DeepMind’s language research going forward, particularly in areas that will have a bearing on how these models are evaluated and deployed. Addressing these areas will be critical for ensuring safe interactions with AI agents – from people telling agents what they want to agents explaining their actions to people. Research in the broader community on using communication for safety includes natural language explanations, using communication to reduce uncertainty, and using language to unpack complex decisions into smaller pieces through approaches such as amplification, debate, and recursive reward modelling. All of these remain critical areas of exploration.

As we continue our research on language models, DeepMind will remain cautious and thoughtful. This requires stepping back to assess the situation we find ourselves in, mapping out potential risks, and researching mitigations. We will strive to be transparent and open about the limitations of our models and will work to mitigate identified risks. At each step, we draw on the breadth of expertise from our multidisciplinary teams, including from our Language, Deep Learning, Ethics, and Safety teams. This approach is key to creating large language models that serve society, furthering our mission of solving intelligence to advance science and benefit humanity.