A New York-based AI research company focused on biotechnology has designed a new protein using a protein language model, which is built on a transformer architecture similar to the one behind ChatGPT.
On June 25, EvolutionaryScale introduced a first-of-its-kind AI-generated protein that glows, mimicking green fluorescent protein (GFP), a fluorescent molecule found in jellyfish. The new protein's sequence is quite different from the natural protein's (less than 60 percent similarity), a gap the company describes as equivalent to “500 million years of (natural) evolution.”
The company used its pioneering AI language model called EvolutionaryScale Model-3 (ESM3) to achieve this feat and has secured $142 million in a seed funding round, including investments from industry giants such as Nvidia and Amazon.
We are thrilled to partner with AWS and NVIDIA to push the boundaries of AI for life sciences.
— EvolutionaryScale (@EvoscaleAI) June 25, 2024
ESM3 differs from ChatGPT in that it is trained on three fundamental biological properties of proteins: sequence, structure, and function. The model has 98 billion parameters (the internal variables it learns during training), making it the largest biological AI model ever trained.
EvolutionaryScale calls this a “model trained throughout evolution.” The training set included 2.78 billion natural proteins, spanning “extreme environments ranging from the Amazon rainforest to the depths of the ocean, hydrothermal vents, and microorganisms present in a handful of soils.”
ESM3 lets users generate proteins from prompts containing partial information (sequence, structure, and function keywords). The model is then run repeatedly, predicting more of the protein at each step, until the entire sequence is complete. It is intended primarily for scientists and gives them unprecedented control over the protein design process.
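The generation loop described above (start from partial information, then predict repeatedly until the sequence is complete) can be illustrated with a toy sketch. This is not ESM3's API or its actual algorithm: `toy_predict` is a hypothetical stand-in that guesses residues at random, and the names `MASK`, `iterative_generate`, and `per_round` are assumptions made for illustration only. The sketch shows only the control flow of iterative fill-in.

```python
import random

MASK = "_"  # placeholder for an unknown residue (illustrative convention)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_predict(sequence, position, rng):
    """Hypothetical stand-in for a trained model.

    Returns a (residue, confidence) guess for one masked position.
    A real model would condition on the rest of the sequence; this
    one guesses at random.
    """
    return rng.choice(AMINO_ACIDS), rng.random()


def iterative_generate(prompt, per_round=2, seed=0):
    """Fill in masked positions a few at a time until none remain."""
    rng = random.Random(seed)
    seq = list(prompt)
    while MASK in seq:
        masked = [i for i, c in enumerate(seq) if c == MASK]
        # Predict every masked position given the current partial sequence.
        preds = {i: toy_predict(seq, i, rng) for i in masked}
        # Commit only the most confident predictions this round; the
        # remaining positions are re-predicted in the next round.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in best[:per_round]:
            seq[i] = preds[i][0]
    return "".join(seq)


if __name__ == "__main__":
    # Residues given in the prompt are kept; only masked positions change.
    print(iterative_generate("M__K___G"))
```

Committing only a few positions per round means later predictions can take earlier choices into account, which is the point of running the model repeatedly rather than filling everything in at once.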
We have trained ESM3 and we are excited to introduce EvolutionaryScale.
ESM3 is a generative language model for programming biology. In experiments, we found that ESM3 can simulate 500M years of evolution to generate new fluorescent proteins.
Read more: pic.twitter.com/AhWtC4vxlF
— Alex Rives (@alexrives) June 25, 2024
EvolutionaryScale says its aim is to make biology programmable. “ESM3 is a step toward a future where AI is a tool to engineer biology from first principles, in the same way that we engineer structures, machines, and microchips, and write computer programs,” the company’s website states.
The technology could lead to breakthroughs in areas such as drug discovery and development, biomedical research, and sustainability. EvolutionaryScale has already demonstrated an example of the latter: a prototype protein capable of breaking down plastic waste.
The potential applications are broad, since ribosomes (the molecular machines responsible for protein synthesis) are found in the cells of every living organism. However, there are also concerns that AI could be misused to create biological weapons.
Scientists have taken a proactive approach and in March set out “Community values, guiding principles, and commitments for the responsible development of AI for protein design” to guide development in this field for the good of humanity.
We are advancing a new global agreement signed by over 100 leading scientists to ensure that AI techniques for protein design are developed responsibly. This field could deliver medicines, vaccines and other innovations that benefit everyone.
— Institute for Protein Design (@UWproteindesign) March 8, 2024
EvolutionaryScale has also been praised by experts for releasing a smaller open-source version for others to use freely. The full-scale model has not been released, although its training process has been made public in an effort to remain transparent and share the technology freely.