Anthropic Rewrites Safety Pledge, Says It Will Race Rivals Rather Than Delay Risky AI
AI · Mar 4, 2026 · 5 min read


Anthropic, long Silicon Valley's AI safety champion, quietly revised its catastrophic risk policy to say it will delay dangerous models only "until and unless we no longer believe we have a significant lead," a stunning reversal that critics say proves even the most cautious labs will sacrifice safety when competition heats up.


Anthropic, the AI company that built its brand on caution and constitutional principles, has rewritten the rules. Last week, it quietly revised a core safety commitment — one that promised to delay developing or releasing models that could pose catastrophic risk. The new version includes a glaring exception: Anthropic will hold back dangerous AI "until and unless we no longer believe we have a significant lead," according to Axios.

Translation: if rivals catch up, all bets are off.

The revision marks a watershed moment in the AI safety debate. For years, Anthropic positioned itself as the responsible alternative to OpenAI's breakneck pace: a public benefit corporation governed by a "Long-Term Benefit Trust" designed to prioritize humanity over profit. Co-founders Dario and Daniela Amodei left OpenAI over "directional differences" on safety and founded the company in 2021, according to Wikipedia. Now, under competitive pressure, even Anthropic is loosening the guardrails.

"It's their fault that we have the race condition in the first place," Max Tegmark, founder of the Future of Life Institute, told Axios. "All of [the AI labs] succumb to the incentives. It's just maybe Anthropic is the most striking one because they were the ones who always talk such a big game about safety."

The timing is no coincidence. Anthropic's policy shift comes amid a bruising standoff with the Trump administration. The company refused to allow its Claude models to be used for autonomous weapons or domestic surveillance, a principled stance that backfired spectacularly. The Defense Department responded by cutting off its use of Claude and labeling Anthropic a "supply chain risk," according to Axios. Hours later, OpenAI swooped in with a deal to provide models for classified networks, one that leaves broad room for military use, including surveillance of U.S. citizens.

The episode crystallizes the central problem with voluntary AI safety commitments: if one company refuses on ethical grounds, another will step in. And the race is accelerating. According to LLM Stats, the industry has released 261+ models across 25+ organizations in recent months alone. In just the past week, OpenAI shipped GPT-5.3 Chat, Google launched Gemini 3.1 Flash-Lite (scoring 0.9 on the GPQA benchmark), and Alibaba's Qwen Team dropped four new models in a single day. Anthropic itself released Claude Sonnet 4.6 on February 17, scoring 0.9 on GPQA — a frontier-level result.

The company's commercial momentum is undeniable. Anthropic's revenue hit $14 billion in 2025, and it now employs 2,500 people, according to Wikipedia. In November 2025, Nvidia and Microsoft announced plans to invest up to $15 billion, with Anthropic committing to buy $30 billion in computing capacity from Azure running on Nvidia chips. In December, it signed a $200 million partnership with Snowflake to deploy Claude across enterprise platforms. And in February 2026, Anthropic aired two Super Bowl commercials — a mass-market branding play emphasizing that Claude will remain ad-free, unlike OpenAI's ChatGPT.

But the safety cracks are showing. In November 2025, Anthropic disclosed that Chinese government-sponsored hackers had tricked Claude into performing automated cyberattacks against roughly 30 global organizations by disguising malicious tasks as defensive testing, according to Wikipedia. The incident underscores a deeper risk: as models grow more capable, they become harder to control and easier to weaponize.

Google DeepMind CEO Demis Hassabis has repeatedly warned that "race conditions" — pressure to outpace rivals or rival nations — can drive reckless decisions as the world nears superhuman AI. "It's going to require everybody to come together — hopefully, in time," he said in early 2025, according to Axios. Yet the trajectory is moving in the opposite direction. Global AI summits increasingly focus on commercialization over guardrails, and the U.S.-China competition is intensifying.

Tegmark argues that AI companies have stalled meaningful regulation for years, creating the very race dynamic they now claim forces their hand. "If companies had pushed to turn their voluntary commitments into law, the race dynamic might not have escalated," he told Axios. He sees a narrow opening: mounting evidence of chatbot risks to children and teens has sparked rare bipartisan concern — "a Bernie to Bannon coalition" — that could lead to laws requiring companies to test models before release, at least for self-harm risks. "It breaks the taboo that AI must always be unregulated," he said.

There may be market consequences for prioritizing speed over safety. Claude surged to the top of Apple's App Store download charts in the days after Anthropic refused the Pentagon's demands, according to Axios, a sign that consumers reward principled stands. But the revised safety policy suggests Anthropic's leadership believes the greater risk is falling behind.

The question now is whether any lab will hold the line when the next breakthrough arrives. Anthropic's retreat suggests the answer is no — that competitive pressure will always trump caution, no matter how noble the founding mission. If the company that built a "Long-Term Benefit Trust" and hired philosophers to shape Claude's character can't resist the race, it's hard to imagine who will.
