Meta enters metagenomics with protein discovery breakthrough

ESM Metagenomic Structure Atlas by Meta AI can revolutionize many fields including healthcare and energy. (Photo:

Mon. 14 November 2022


Meta (formerly Facebook) recently announced an AI breakthrough that accelerates protein structure prediction, the task of working out a protein's three-dimensional shape from its amino acid sequence (often loosely called “protein folding”). Meta's researchers used the new method to build a database of predicted protein structures. By sharing the research with scientists worldwide, Meta could accelerate progress in medicine, renewable energy, and green chemistry. As Mark Zuckerberg wrote in a Facebook post, “This will unlock new ways to treat disease and accelerate drug discovery.”

The breakthrough reveals the predicted structures of hundreds of millions of proteins found on earth. These are, so far, the least understood proteins on the planet.

To see why this matters, let's first understand metagenomics.

Metagenomics studies the structure and function of the nucleotide sequences isolated from all the organisms (typically microbes) in a bulk sample. It often focuses on a specific community of microorganisms, such as those residing on human skin, in the soil, or in a water sample.

Decoding metagenomic structures will be a giant leap for humankind as it will help us solve the long-standing mysteries of evolutionary history. More importantly, it can enable us to discover proteins that may help us advance medicine to cure diseases, get a cleaner environment, and produce cleaner energy.

Next, let's understand Meta.

Meta or Meta Platforms (formerly Facebook, Inc.) is a technology company that has acquired 91 other companies, including WhatsApp.

Meta (of Meta Platforms) is not an abbreviation of Metagenomics. Meta's entry into the field of Metagenomics research just happens to be one of its many ambitious endeavors.

Now, Meta AI has used large language models to create a database that offers the first comprehensive view of metagenomic protein structures, at the scale of hundreds of millions of proteins.

How did Meta AI achieve this? Making structure predictions at this scale required a breakthrough in the speed of structure prediction. Meta researchers trained a large language model that learns evolutionary patterns directly from protein sequences and generates accurate structure predictions from them.

Meta AI's predictions are up to 60x faster than the current state of the art without compromising accuracy, which makes the approach scalable to far larger databases.
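To get a feel for what a 60x speedup means at this scale, here is a rough back-of-the-envelope sketch. The database size of about 600 million structures and the 60x factor come from Meta's announcement; the baseline time per protein and the number of parallel workers are purely hypothetical values chosen for illustration, not figures reported by Meta.

```python
SECONDS_PER_DAY = 86_400


def days_to_fold(n_proteins: int, seconds_per_protein: float, n_workers: int) -> float:
    """Wall-clock days needed if the work is split evenly across identical workers."""
    total_seconds = n_proteins * seconds_per_protein / n_workers
    return total_seconds / SECONDS_PER_DAY


N_PROTEINS = 600_000_000   # approximate size of the database, per the announcement
SPEEDUP = 60               # "up to 60x", per the announcement
BASELINE_SECONDS = 60.0    # hypothetical: one minute per structure for the baseline
N_WORKERS = 2_000          # hypothetical: number of accelerators running in parallel

baseline_days = days_to_fold(N_PROTEINS, BASELINE_SECONDS, N_WORKERS)
fast_days = days_to_fold(N_PROTEINS, BASELINE_SECONDS / SPEEDUP, N_WORKERS)

print(f"baseline: {baseline_days:.1f} days")  # baseline: 208.3 days
print(f"with 60x: {fast_days:.1f} days")      # with 60x: 3.5 days
```

Under these assumed numbers, a job that would occupy the hardware for most of a year shrinks to a few days, which is the kind of difference that makes folding an entire metagenomic database practical.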

Meta is now sharing its AI models, its research paper, a database of more than 600 million predicted metagenomic protein structures, and an API that helps scientists retrieve relevant protein structures easily.
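The article does not describe the API itself, so the following is only a minimal sketch of how a client might prepare a protein sequence for an HTTP-based structure prediction service. The endpoint URL is a hypothetical placeholder, the helper name `build_fold_request` is made up for illustration, and the assumption that the service accepts a raw sequence in a POST body should be checked against the ESM Atlas documentation. The code only builds the request; it does not send it.

```python
from urllib.request import Request

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical placeholder; substitute the endpoint given in the ESM Atlas documentation.
PLACEHOLDER_ENDPOINT = "https://example.org/fold"


def build_fold_request(sequence: str, endpoint: str = PLACEHOLDER_ENDPOINT) -> Request:
    """Validate a protein sequence and wrap it in an HTTP POST request (not sent)."""
    # Remove whitespace and normalize to upper case.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    unknown = sorted(set(cleaned) - AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {unknown}")
    return Request(endpoint, data=cleaned.encode("ascii"), method="POST")


request = build_fold_request("mkt ayiak qrq")
print(request.get_method(), request.full_url, request.data)
```

Validating the sequence before sending it catches typos locally instead of spending a network round trip on a request the server would reject. Passing the resulting object to `urllib.request.urlopen` would perform the actual call once a real endpoint is filled in.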

Metagenomics is one of the new frontiers in the natural sciences. It uses gene sequencing to discover proteins in samples from environments across the earth, drawing on a broad spectrum of sources: microbes living in the soil, deep in the ocean, in extreme environments like hydrothermal vents, and even in our guts and on our skin. Proteins in the natural world go beyond the ones cataloged and annotated in well-studied organisms. Metagenomics is beginning to reveal the incredible breadth and diversity of these novel proteins, uncovering billions of protein sequences that are not yet documented in science. These new proteins are being cataloged for the first time in large databases, such as the ones from NCBI, the European Bioinformatics Institute, and the Joint Genome Institute, which incorporate studies from a worldwide community of researchers.

Meta AI has developed a novel structure prediction approach based on large language models and used it to create the first comprehensive view of metagenomic protein structures, a database at the scale of hundreds of millions of proteins. The team of Meta researchers found that language models make prediction up to 60x faster than the current state-of-the-art approaches to atomic-level three-dimensional protein structure prediction. This development could open a new era of structural understanding, in which we can for the first time understand the structures of the billions of proteins being cataloged by gene-sequencing technology.

Meta has released the ESM Metagenomic Atlas, which contains more than 600 million predicted protein structures and includes predictions for nearly the entire MGnify90 database, a public resource that catalogs metagenomic sequences.

Some unique features of this database:
This is the largest database of high-resolution predicted structures.
It is 3x larger than any existing protein structure database.
It is the first database to cover metagenomic proteins comprehensively and at scale.

This structure database provides unprecedented insight into the breadth and diversity of nature and can accelerate protein discovery for practical applications in medicine, green chemistry, environmental science, and renewable energy.

In addition, Meta is releasing the fast protein folding model used to create the database and an API that allows researchers to use it for scientific discovery. With 15 billion parameters, Meta's new language model is the largest language model of proteins to date.


1. ESM Metagenomic Atlas: The first view of the ‘dark matter’ of the protein universe
2. Metagenomics, National Human Genome Research Institute
3. ESM Atlas
