AlphaFold is our AI system that predicts a protein’s 3D structure from its amino acid sequence. In CASP14, AlphaFold was the top-ranked protein structure prediction method by a large margin. Its predictions were highly accurate, and many were competitive with experimentally determined structures.
We’ve partnered with EMBL’s European Bioinformatics Institute (EMBL-EBI), part of Europe’s flagship laboratory for the life sciences, to create the AlphaFold Protein Structure Database and make these predictions freely available to the scientific community.
In July 2021 we released predictions for 21 model organisms (~330k predictions), covering the ~20,000 proteins of the human proteome. In December 2021 we added ~440k new structures from Swiss-Prot, the manually curated and annotated protein database, which includes many key proteins of interest. Then, in January 2022, we added ~190k new structures for 17 organisms associated with neglected diseases and 10 antimicrobial-resistant bacteria, chosen from WHO priority lists. This brought the total number of available structures to nearly 1 million.
We’ve now expanded the database roughly 200-fold, making available the predicted structures of over 200 million proteins and covering nearly all catalogued proteins known to science. The expanded database spans the widest possible range of species, including plants, bacteria, animals, and other organisms. This covers a large proportion of the 100 million proteins catalogued in the UniRef90 database.
The AlphaFold Protein Structure Database will continue to be updated and improved over time. If you can’t find what you’re looking for right now, please follow DeepMind’s and EMBL-EBI’s social channels for updates. In the meantime, you can use the AlphaFold source code to predict the structures of proteins not yet in the AlphaFold DB, or the Colab notebook to run individual sequences.