Enabling high-accuracy protein structure prediction at the proteome scale

Since the determination of the first protein structure in 1958, knowledge of protein structures has proven invaluable for understanding the properties, mechanisms, and functions of proteins. In parallel to the growth of experimental structural biology, computational methods have been developed for predicting a protein’s structure from its amino acid sequence. This complementary approach is faster and easier than experimental determination, promising to scale efficiently and keep up with the exponential growth in sequence data.

In the 2020 CASP14 experiment, the latest version of our machine-learning-based method, AlphaFold, achieved unprecedented structure prediction accuracy. Now, in two recently published companion papers, we fully describe the AlphaFold method and demonstrate its application at scale to the human proteome. In parallel, we have taken steps to ensure broad access to AlphaFold and its predictions. First, the source code has been made available along with a trained model, allowing new predictions to be generated. Second, we have partnered with EMBL-EBI to develop the AlphaFold DB. The initial database release makes structure predictions freely available for the human proteome plus 20 key organisms, with plans to expand in the near future to cover the ~100 million proteins in UniRef90. Here we present an overview of this work and of our efforts to make AlphaFold predictions accessible and useful to the scientific community.

  • See also our October 2021 paper, in which we apply an AlphaFold model trained specifically for multimeric inputs.