ESM3 is a frontier AI model that can design new functional proteins, including a novel fluorescent protein whose distance from known proteins is comparable to roughly half a billion years of natural evolution. (Image: Canva)
Nature has spent billions of years shaping the proteins we see today through random mutation and natural selection. Now a new AI model named ESM3 can generate novel proteins from scratch in a fraction of that time, with potential applications from biotechnology to medicine. This article summarizes a recent preprint posted on bioRxiv.
The language of proteins
For decades, scientists have observed that protein sequences (the strings of amino acids that make up a protein) carry subtle patterns reflecting their structure and function. Recent advances in machine learning have revealed that protein sequences resemble a “language,” with grammar and rules that can be learned.
ESM3 is a multi-track transformer that jointly reasons over protein sequence, structure, and function.
Vast databases: Researchers have compiled massive catalogs of natural protein sequences and structures (billions of sequences, millions of structures).
Language models for biology: By applying techniques similar to those used for natural language processing, scientists can teach AI to “read” and “write” protein sequences.
Enter ESM3
Developed as a multimodal generative language model, ESM3 takes this idea a step further. Rather than focusing only on protein sequences, it also encodes protein structures and functions into discrete “tokens,” enabling the model to “reason” simultaneously about these different facets of protein biology.
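As a rough illustration of what encoding a protein into discrete tokens can look like, here is a minimal sketch in Python. The vocabulary, the `MASK_ID` value, and the `ProteinTokens` container are hypothetical names invented for this example and are not ESM3's actual tokenizer; only the 20 standard amino acid letters are real. The structure and function tracks are left fully masked, standing in for information the model would be asked to fill in.

```python
from dataclasses import dataclass
from typing import List

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical id reserved for "unknown / to be generated" positions.
MASK_ID = 0

# Hypothetical vocabulary: each amino acid maps to an integer id starting at 1.
AA_TO_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


@dataclass
class ProteinTokens:
    """Three parallel tracks with one token per residue (illustrative only)."""
    sequence: List[int]
    structure: List[int]
    function: List[int]


def tokenize(sequence: str) -> ProteinTokens:
    """Turn an amino acid string into token ids; other tracks start masked."""
    seq_ids = [AA_TO_ID.get(aa, MASK_ID) for aa in sequence.upper()]
    n = len(seq_ids)
    return ProteinTokens(
        sequence=seq_ids,
        structure=[MASK_ID] * n,  # unknown structure
        function=[MASK_ID] * n,   # unknown function
    )


tokens = tokenize("MSKGEE")
print(tokens.sequence)   # [11, 16, 9, 6, 4, 4]
print(tokens.structure)  # [0, 0, 0, 0, 0, 0]
```

Keeping the tracks the same length, one token per residue, is a design choice made here for simplicity: it lets a model line up what is known about a position across tracks.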
Scale and power
Data and parameters: ESM3 was trained on 2.78 billion proteins (771 billion tokens in total), and its largest version has 98 billion parameters, putting it on the scale of large general-purpose language models.
Computation: Training the largest model took about 1.07×10^24 floating-point operations (FLOPs), a very large compute budget for a biological model.
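To put these figures in proportion, the short Python calculation below derives two ratios from the numbers quoted above. The function names are made up for this example, and the second ratio is plain arithmetic on the reported totals; it says nothing about how many passes over the data were made or how the compute was actually spent.

```python
# Figures quoted in the article.
NUM_PROTEINS = 2.78e9     # proteins in the training data
NUM_TOKENS = 771e9        # total tokens in the training data
NUM_PARAMETERS = 98e9     # parameters in the largest model
TRAINING_FLOPS = 1.07e24  # floating-point operations used for training


def mean_tokens_per_protein(num_tokens: float, num_proteins: float) -> float:
    """Average number of tokens contributed by each protein."""
    return num_tokens / num_proteins


def flops_per_parameter_token(flops: float, params: float, tokens: float) -> float:
    """Reported training FLOPs divided by (parameters x dataset tokens)."""
    return flops / (params * tokens)


tokens_per_protein = mean_tokens_per_protein(NUM_TOKENS, NUM_PROTEINS)
flops_ratio = flops_per_parameter_token(TRAINING_FLOPS, NUM_PARAMETERS, NUM_TOKENS)

print(f"{tokens_per_protein:.0f} tokens per protein on average")       # 277
print(f"{flops_ratio:.1f} FLOPs per parameter per dataset token")      # 14.2
```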
Generative abilities
Unlike many protein models that predict structure from known sequences, ESM3 can create entirely new proteins by recombining features it has learned.
The model is prompted via specialized “instructions,” guiding it to generate sequences that meet specific structural or functional criteria (e.g., fluorescence, enzyme activity).
A bright achievement: esmGFP
Fluorescent protein design: One of ESM3’s most notable results is esmGFP, a fluorescent protein that glows green.
Divergence from known proteins: esmGFP shares only 58% sequence identity with the nearest known fluorescent protein, a level of difference that the authors estimate is comparable to over 500 million years of natural evolution.
Functional validation: Laboratory tests confirmed that esmGFP fluoresces, showing that ESM3 can generate a functional protein that is far removed in sequence from those found in nature.
ESM3 is a generative model that makes biology programmable: it follows prompts to generate new proteins. Scientists can interact with ESM3 and guide it to create proteins for applications in medicine, biological research, and clean energy.
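The 58% figure is a sequence identity: the share of aligned positions at which two proteins carry the same amino acid. The Python sketch below shows one common way to compute it for two sequences that have already been aligned. The convention used here (skipping any column that contains a gap) is an assumption, since other conventions exist and the article does not say which one the preprint uses. The two short sequences are toy strings, not esmGFP or any real fluorescent protein.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns where both residues match.

    Assumes the inputs are already aligned (equal length, gaps as `gap`).
    Columns with a gap in either sequence are skipped (one convention of several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # ignore columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned pair: 9 ungapped columns, 8 of which match.
toy_a = "MSKGEELFTG"
toy_b = "MSKGAELF-G"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")  # 88.9% identity
```

At 58% identity, roughly four out of every ten compared positions differ between the two proteins.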
Why it matters
Drug discovery: By rapidly proposing unique protein structures, ESM3 could speed up the search for new therapeutics. Rather than incrementally modifying existing proteins, researchers might identify entirely novel scaffolds for enzymes, antibodies, or other therapeutic proteins.
Biotech revolution: From industrial enzymes that break down plastics to proteins that capture carbon more efficiently, AI-generated designs could address critical environmental challenges.
Materials science: Proteins are the building blocks of living systems. Designing them on demand may lead to new biomaterials with exceptional strength, flexibility, or other desired properties.
Evolutionary insights: Because ESM3 in effect “simulates” evolutionary pathways by recombining sequence, structure, and function, it might help scientists predict how proteins evolve under different conditions.
Looking ahead
ESM3’s groundbreaking achievement—compressing half a billion years of protein evolution into just months of AI-powered design—signals a new era in protein science. While challenges remain, from verifying the stability of new proteins to ensuring their safety, the potential benefits are immense. Future iterations of models like ESM3 may venture even deeper into uncharted protein “spaces,” expanding our toolkit for tackling global challenges in health, energy, and the environment.
Conclusion
The success of ESM3 underscores a new synergy between natural evolution and computational innovation. By learning the “language” of proteins across millions of species and billions of years, ESM3 can now write its own verses—revealing the hidden possibilities of protein biology faster than ever before. This leap forward in AI-driven design not only reshapes our understanding of evolution but also sets the stage for transformative applications in science, medicine, and beyond.
Reference: bioRxiv preprint: Simulating 500 million years of evolution with a language model (31 Dec 2024)
DOI: https://doi.org/10.1101/2024.07.01.600583