
ESM3: Accelerating protein evolution with AI

ESM3 is a frontier AI model that can design new functional proteins, such as a novel fluorescent protein whose distance from known proteins is comparable to half a billion years of natural evolution. (Image: Canva)

Sat. 1 February 2025


Nature has spent billions of years shaping the proteins we see today through random mutation and natural selection. A new AI model named ESM3 can now generate novel proteins from scratch in a fraction of that time, with potential applications ranging from biotechnology to medicine. This article summarizes a recent preprint posted on bioRxiv.

The language of proteins
For decades, scientists have observed that protein sequences (the strings of amino acids that make up a protein) carry subtle patterns reflecting their structure and function. Recent advances in machine learning have revealed that protein sequences resemble a “language,” with grammar and rules that can be learned.

ESM3 is a multi-track transformer that jointly reasons over protein sequence, structure, and function.

  • Vast databases: Researchers have compiled massive catalogs of natural protein sequences and structures (billions of sequences, millions of structures).
  • Language models for biology: By applying techniques similar to those used for natural language processing, scientists can teach AI to “read” and “write” protein sequences.

Enter ESM3
Developed as a multimodal generative language model, ESM3 takes this idea a step further. Rather than focusing only on protein sequences, it also encodes protein structures and functions into discrete “tokens,” enabling the model to “reason” simultaneously about these different facets of protein biology.
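To make the idea of discrete tokens concrete, here is a minimal sketch of how a protein sequence could be turned into integer tokens, with a placeholder for positions a model is asked to fill in. The vocabulary, the mask symbol `_`, and the function names are assumptions made for illustration; they are not ESM3's actual tokenizer, which also covers structure and function tracks.

```python
# Toy illustration only: this vocabulary is hypothetical, not ESM3's tokenizer.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes
MASK = "_"  # hypothetical placeholder for a position a model should fill in

# Map each symbol to an integer ID; the mask gets the ID after the last residue.
TOKEN_TO_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
TOKEN_TO_ID[MASK] = len(AMINO_ACIDS)
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}


def encode(sequence: str) -> list[int]:
    """Convert a protein sequence (with optional masks) into token IDs."""
    return [TOKEN_TO_ID[symbol] for symbol in sequence]


def decode(token_ids: list[int]) -> str:
    """Convert token IDs back into a sequence string."""
    return "".join(ID_TO_TOKEN[i] for i in token_ids)


if __name__ == "__main__":
    prompt = "MSK__ELF"  # two masked positions left for a model to complete
    ids = encode(prompt)
    print(ids)
    print(decode(ids))
```

A generative model trained on such tokens learns to predict plausible values for the masked positions from the surrounding context.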

  1. Scale and power
    • Data and parameters: ESM3 was trained on 2.78 billion proteins (totaling 771 billion tokens) and scales up to 98 billion parameters, placing it among the largest AI models reported for biology.
    • Computation: Training the largest model required about 1.07×10^24 floating point operations, a compute budget on the scale of frontier AI systems.
  2. Generative abilities
    • Unlike many protein models that predict structure from known sequences, ESM3 can create entirely new proteins by recombining features it has learned.
    • The model is prompted via specialized “instructions,” guiding it to generate sequences that meet specific structural or functional criteria (e.g., fluorescence, enzyme activity).
  3. A bright achievement: esmGFP
    • Fluorescent protein design: One of ESM3’s most notable results is esmGFP, a fluorescent protein that glows green.
    • Divergence from known proteins: esmGFP shares only 58% identity with the nearest known fluorescent protein, a level of difference comparable to over 500 million years of natural evolution.
    • Functional validation: Laboratory tests confirmed that esmGFP fluoresces, demonstrating that ESM3 can generate a functional protein that is far removed from those found in nature.
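The 58% figure above is a sequence identity. As a rough sketch of what that number measures, the function below computes percent identity between two sequences that have already been aligned. It assumes gaps are written as `-` and that gapped columns are excluded from the count; other conventions for the denominator exist, and the preprint's exact procedure may differ. The example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with the same residue
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free aligned positions to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Made-up toy alignment: 4 of the 5 gap-free columns match.
    print(percent_identity("MKT-LV", "MRTALV"))  # prints 80.0
```

On this measure, two proteins at 58% identity differ at roughly four of every ten aligned positions.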

ESM3 is a generative model that makes biology programmable: it can follow prompts to generate new proteins. Scientists can interact with ESM3, guiding it to create proteins for applications in medicine, biology research, and clean energy.

Why it matters

  • Drug discovery: By rapidly proposing unique protein structures, ESM3 could speed up the search for new therapeutics. Rather than incrementally modifying existing proteins, researchers might identify entirely novel scaffolds for enzymes, antibodies, or other drug targets.
  • Biotech revolution: From industrial enzymes that break down plastics to proteins that capture carbon more efficiently, AI-generated designs could address critical environmental challenges.
  • Materials science: Proteins are the building blocks of living systems. Designing them on demand may lead to new biomaterials with exceptional strength, flexibility, or other desired properties.
  • Evolutionary insights: As ESM3 essentially “simulates” evolutionary pathways by recombining sequence, structure, and function, it might help scientists predict how life itself evolves under different conditions.

Looking ahead
ESM3’s groundbreaking achievement—compressing half a billion years of protein evolution into just months of AI-powered design—signals a new era in protein science. While challenges remain, from verifying the stability of new proteins to ensuring their safety, the potential benefits are immense. Future iterations of models like ESM3 may venture even deeper into uncharted protein “spaces,” expanding our toolkit for tackling global challenges in health, energy, and the environment.

Conclusion
The success of ESM3 underscores a new synergy between natural evolution and computational innovation. By learning the “language” of proteins across millions of species and billions of years, ESM3 can now write its own verses—revealing the hidden possibilities of protein biology faster than ever before. This leap forward in AI-driven design not only reshapes our understanding of evolution but also sets the stage for transformative applications in science, medicine, and beyond.

Reference:
bioRxiv preprint: Simulating 500 million years of evolution with a language model (31 Dec 2024)
DOI: https://doi.org/10.1101/2024.07.01.600583