Scientists Build Evo 2 AI That Designs DNA From Scratch, Trained on 9.3 Trillion Nucleotides
AI Mar 6, 2026 · 5 min read

A genomic foundation model from Arc Institute and NVIDIA can now generate synthetic genomes as complex as bacteria and predict disease mutations with 90% accuracy. The open-source tool raises urgent biosecurity questions as AI moves from interpreting biology to actively designing it.

Sources: Nature, AINews International, Wikipedia

Artificial intelligence has learned to read the genetic code. Now it can write it.

Researchers at Arc Institute, working with NVIDIA, Stanford, UC Berkeley, and UC San Francisco, have published Evo 2 in Nature: a DNA foundation model that can analyze evolutionary patterns across all domains of life and generate entirely new genomic sequences from scratch. Trained on more than 9.3 trillion nucleotides from over 128,000 genomes spanning bacteria, archaea, plants, animals, and humans, the system represents one of the largest biological AI models ever built.

The implications are staggering. Evo 2 achieved over 90 percent accuracy in classifying BRCA1 gene variants, the mutations strongly linked to breast and ovarian cancer, according to the research team. It can design synthetic genomes as long as those of simple bacteria. And because the model learns evolutionary patterns embedded in billions of years of natural selection, it can detect genetic relationships that might take human researchers years to uncover through traditional lab experiments.

This is not incremental progress in computational biology. This is AI participating in the design of life itself.

The architecture works like a large language model, but instead of predicting the next word in a sentence, Evo 2 predicts the next nucleotide in a DNA sequence. By processing massive genomic datasets, the system learns how different DNA regions interact, mutate, and influence biological functions across species. Researchers describe this as learning the "language of nucleotides" — the four-letter alphabet (A, T, C, G) that encodes every living organism on Earth.
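The next-nucleotide objective can be illustrated with a toy model far simpler than Evo 2's actual deep neural architecture: a k-mer count table that predicts the next base from the preceding context. This sketch is purely illustrative of the training objective, not of how Evo 2 is built, and all names in it are invented for this example.

```python
from collections import Counter, defaultdict

class KmerModel:
    """Toy next-nucleotide predictor over the A/T/C/G alphabet.

    Counts which base follows each k-length context in training DNA,
    then predicts the most frequent continuation. A stand-in for the
    autoregressive objective only; Evo 2 itself is a large neural
    network, not a count table.
    """

    def __init__(self, k=3):
        self.k = k
        self.counts = defaultdict(Counter)

    def train(self, sequence):
        for i in range(len(sequence) - self.k):
            context = sequence[i:i + self.k]
            self.counts[context][sequence[i + self.k]] += 1

    def predict_next(self, context):
        options = self.counts[context[-self.k:]]
        return options.most_common(1)[0][0] if options else None

model = KmerModel(k=3)
model.train("ATGCGATGCGATGCGATGCC")
print(model.predict_next("ATG"))  # prints "C": the base most often seen after "ATG"
```

Scaling the same idea from a count table to billions of learned parameters, and from one short sequence to trillions of nucleotides, is what lets a model capture long-range regulatory structure rather than just local base frequencies.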

The practical applications are already emerging. Scientists can now use Evo 2 to identify disease-causing mutations rapidly, design microbes that produce medicines, create biological sensors, and engineer environmentally useful organisms. The research team compares Evo 2 to a biological "operating system kernel" on which specialized applications can be built — tools for predicting protein functions, designing gene therapies, or accelerating drug discovery.
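One common pattern for applying such models to variant interpretation is a likelihood comparison: score how plausible the model finds the reference sequence versus the mutated one, and flag variants that sharply reduce plausibility. The sketch below assumes a hypothetical `log_prob_fn` supplied by a trained model; it is not Evo 2's published API, just an illustration of the general technique.

```python
import math

def sequence_log_likelihood(seq, log_prob_fn, k=3):
    """Sum the model's log-probability of each base given its k-base context.

    `log_prob_fn(context, base)` is a placeholder for a trained model's
    output, not part of any published Evo 2 interface.
    """
    return sum(log_prob_fn(seq[i:i + k], seq[i + k])
               for i in range(len(seq) - k))

def variant_effect_score(reference, variant, log_prob_fn, k=3):
    """Log-likelihood ratio of variant vs. reference.

    Strongly negative scores suggest the mutation moves the sequence away
    from patterns the model learned across evolution, a common proxy for
    a deleterious variant.
    """
    return (sequence_log_likelihood(variant, log_prob_fn, k)
            - sequence_log_likelihood(reference, log_prob_fn, k))

# Uniform toy model: every base equally likely, so any variant scores 0.0.
uniform = lambda context, base: math.log(0.25)
print(variant_effect_score("ATGCGT", "ATGAGT", uniform))  # prints 0.0
```

With a real genomic model in place of the uniform stand-in, this kind of score is what underlies reported results like the BRCA1 variant classification, since pathogenic mutations tend to break sequence patterns conserved across species.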

But the same capabilities that make Evo 2 transformative also make it dangerous. Genomic AI models could theoretically be misused to design harmful biological sequences, a concern that echoes debates around AI safety more broadly. The field of AI safety, as defined by researchers and policymakers, focuses on preventing accidents, misuse, and harmful consequences from AI systems — particularly advanced models that could pose existential risks.

The AI safety community has been sounding alarms about powerful systems for years. In 2015, dozens of AI experts including Yann LeCun, Shane Legg, Yoshua Bengio, and Stuart Russell signed an open letter calling for research on the societal impacts of AI. By 2023, the field had gained enough prominence that the United Kingdom and United States both established AI Safety Institutes following the AI Safety Summit. Yet researchers consistently warn that safety measures are not keeping pace with capability development — a concern that applies directly to genomic AI.

The Evo 2 team took these risks seriously. They intentionally excluded human-infecting pathogens from the training dataset and implemented safeguards to prevent harmful outputs. They also made Evo 2 open-source, sharing code, model weights, and training data with the scientific community to encourage transparency and collaborative oversight. This approach reflects a broader debate in AI safety: whether openness reduces risk by enabling scrutiny, or increases it by making powerful tools more accessible.

The timing matters. In two surveys of AI researchers cited in AI safety literature, the median respondent placed a 5% probability on an "extremely bad (e.g. human extinction)" outcome from advanced AI. A 2022 survey of the natural language processing community found that 37% agreed it is plausible that AI decisions could lead to a catastrophe "at least as bad as an all-out nuclear war." These are not fringe concerns — they represent mainstream expert opinion about the stakes of advanced AI development.

Genomic AI adds a new dimension to these risks. Unlike language models that generate text, Evo 2 generates instructions for biological systems. The difference between a harmful essay and a harmful genome is the difference between information and physical reality. Biosecurity experts emphasize that strong oversight and responsible governance will be necessary as generative biology advances, but the regulatory frameworks for such oversight barely exist.

The challenge is familiar from other domains of AI safety: how do you govern a technology that evolves faster than policy can adapt? The 2023 AI Safety Summit represented an attempt to address this at the governmental level, with the UK positioning itself as the "geographical home of global AI safety regulation." But genomic AI operates in a space where technical capability, medical promise, and catastrophic risk intersect in ways that traditional regulatory structures were not designed to handle.

What makes Evo 2 particularly significant is its generality. Previous AI models in biology were specialized — designed for protein folding, or drug discovery, or genetic analysis. Evo 2 works across the entire tree of life, from microbes to humans, in a unified framework. This generality is what makes it powerful. It is also what makes it unpredictable.

The researchers are clear-eyed about what they have built. By training on evolutionary patterns across thousands of species, they have created a tool that can analyze evolution itself. More importantly, it enables researchers to design biological systems with unprecedented precision. If used responsibly, genomic AI models like Evo 2 could accelerate drug discovery, improve genetic diagnostics, and unlock new forms of bioengineering. If misused, the consequences could be catastrophic.

This is the central tension in AI safety: the same capabilities that promise enormous benefit also carry enormous risk. The field has been grappling with this tension since Norbert Wiener warned in 1949 that "every degree of independence we give the machine is a degree of possible defiance of our wishes." Seventy-five years later, we are giving machines independence over the genetic code.

The next decade will reveal whether we have learned to govern such power. AI will not just help us understand life's code. It may help us rewrite it. The question is whether we will do so wisely, or whether we will move faster than our wisdom allows. Evo 2 is not the answer to that question. It is the question itself, rendered in 9.3 trillion nucleotides.
