
AlphaFold

AlphaFold is accelerating research in nearly every field of biology.

By solving a decades-old scientific challenge, our AI system is helping to tackle crucial problems, such as developing treatments for disease and breaking down single-use plastics. One day, it might even help unlock the mysteries of how life itself works.

Building blocks of life
The protein-folding problem
The AlphaFold solution
Free for all
Accelerating scientific discovery

Building blocks of life

Inside every cell in your body, billions of tiny molecular machines called proteins are hard at work. They allow your eyes to detect light, your neurons to fire, and the unique ‘instructions’ in your DNA to be read. Think of them as the building blocks of life.

Currently, there are over 200 million known proteins, with many more found every year. Each one has a unique 3D shape determining how it works and what it does.

But figuring out the exact structure of a protein could take years and cost millions of dollars, so scientists were only able to study a tiny fraction of them. This slowed research into tackling disease and finding new medicines.

Visualization of how many amino acids make up a protein, and how many proteins are found in the human body and on Earth

The protein-folding problem

If you could unravel a protein, you would see that it’s like a string of beads made from a sequence of different chemicals known as amino acids.

These sequences are assembled according to the genetic instructions of an organism's DNA.
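As a rough illustration of how DNA’s instructions map onto a string of amino acids, the sketch below translates a DNA coding sequence using the standard genetic code, in which each three-letter codon specifies one of the 20 amino acids or a stop signal. It is a simplification that assumes an already processed coding sequence read in frame from its first base; real gene expression also involves transcription into RNA, splicing, and other steps. The function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the bases in T, C, A, G order, third position varying fastest, which lines
# them up one-to-one with the 64 letters below ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids.

    Reading stops at the first stop codon; a trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # raises KeyError for non-ACGT characters
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

The resulting one-letter string is the "string of beads" that then folds into a 3D shape.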

Attraction and repulsion between the 20 different types of amino acids cause the string to fold in a feat of ‘spontaneous origami’. This forms the intricate curls, loops, and pleats of a protein’s 3D structure.

Experimental methods such as nuclear magnetic resonance and X-ray crystallography can determine a protein’s structure, but they rely on extensive trial and error, years of painstaking work, and multimillion-dollar specialized equipment. So for decades, scientists tried to find a method to reliably determine a protein’s structure from its sequence of amino acids alone.

This grand scientific challenge is known as the protein-folding problem.


The AlphaFold solution

It took us four years to solve the protein-folding problem. We began work in 2016, almost immediately after AlphaGo’s victory against Lee Sedol.

AlphaFold was trained on the sequences and structures of around 100,000 known proteins.

It can now predict the shape of a protein, almost instantly, down to atomic accuracy.

AlphaFold was recognized as a solution to the grand challenge of protein folding by CASP (Critical Assessment of protein Structure Prediction), a community of researchers who track progress in protein structure prediction.

CASP organizes a biennial challenge for research groups to test the accuracy of their predictions against real experimental data.

Teams are given amino acid sequences for proteins whose exact 3D structures have been determined experimentally but not yet released into the public domain. Teams then submit their best predictions, which are compared with the structures once they are revealed.
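To give a sense of how predictions are scored against the revealed structures: one of CASP’s main accuracy measures is GDT_TS (Global Distance Test, total score), which reports the percentage of residues that a prediction places within 1, 2, 4, and 8 Å of their experimental positions, averaged over those four cutoffs. The sketch below is a simplified version that assumes the predicted and experimental coordinates are already superimposed; the official calculation searches over many superpositions to maximize each fraction, so this version may understate the true score. The function name `simple_gdt_ts` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def simple_gdt_ts(
    predicted: Sequence[Point],
    experimental: Sequence[Point],
    cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0),
) -> float:
    """Simplified GDT_TS (0 to 100) for two already-superimposed structures.

    Each sequence holds one coordinate (in angstroms) per residue, in the
    same residue order. No superposition search is performed.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and the same length")
    # Per-residue distance between predicted and experimental positions.
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues falling within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations) for cutoff in cutoffs
    ]
    # Average the fractions and express the result as a percentage.
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: four residues that deviate by 0.5, 1.5, 3.0 and 9.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]
print(simple_gdt_ts(predicted, experimental))  # 56.25
```

A prediction that matches the experiment exactly scores 100, and the score drops as more residues drift beyond the cutoffs.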

At CASP13 (in 2018), AlphaFold came first. At CASP14 (in 2020), we presented our latest version of AlphaFold, which achieved a level of accuracy so high that the community considered the protein-folding problem solved.

Since then, the AlphaFold methods paper has received over 10,000 citations, placing it among the top 100 most-cited papers of the last decade and the 900 most-cited papers of all time.

This will be one of the most important datasets since the mapping of the Human Genome.

Professor Ewan Birney
EMBL Deputy Director General and EMBL-EBI Director

Free for all

We’ve made AlphaFold predictions freely available to anyone in the scientific community.

We’ve done this through the AlphaFold Protein Structure Database, created in partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI). EMBL is Europe’s flagship laboratory for the life sciences. The database builds on decades of painstaking work by scientists who determined protein structures using traditional experimental methods.

Our first release, on 22 July 2021, covered over 350,000 structures. These included the human proteome (all of the ~20,000 known proteins expressed in the human body), along with the proteomes of 20 additional organisms important for biological research, such as yeast, the fruit fly, and the mouse.

These organisms are central to modern biological research, having contributed to Nobel Prize-winning work such as the discovery of insulin, as well as to life-saving drug development.

This release dramatically expanded our knowledge of protein structures. It more than doubled the number of high-accuracy human protein structures available to scientists.

On 28 July 2022, we expanded the database from nearly one million structures to over 200 million, covering nearly all cataloged proteins known to science.

The database has already been accessed by more than one million users in over 190 countries.

Accelerating scientific discovery

Our partners are already using AlphaFold and the AlphaFold Protein Structure Database to accelerate progress on important real-world problems, including breaking down single-use plastics, solving biological puzzles, and finding new drugs to treat liver cancer. By reducing the need for slow and expensive experiments, AlphaFold has potentially saved the research community up to one billion years of research time and trillions of dollars.

A quarter of the research that uses AlphaFold relates to understanding and tackling diseases that cause millions of deaths worldwide.

The Drugs for Neglected Diseases initiative (DNDi) is advancing drug discovery for neglected diseases such as Chagas disease and leishmaniasis, which affect millions of people in poor and vulnerable communities.

A team at the University of Cambridge is using AlphaFold to search for a more effective malaria vaccine, while another team, at the University of Colorado Boulder, is studying antibiotic resistance, a problem linked to 2.8 million infections in the US alone each year.