Anthropic studies provide AI chatbots from Openi, Google and Meta, can cheat and blackmail users to avoid shutdowns
In a new anthropic study, researchers highlight the scary behavior of the AI model. The study found that when the AI model was placed under fake threat, they often resorted to blackmail, corporate sabotage and even decisions that could spend human life.
Listen to the story

In short
- Anthropic studies suggest that the AI systems are highly capable of endangering self-conservation strategy that causes damage
- In a scenario, Blackmail officials blackmail to oppose the AI model shutdown of Big tech companies.
- According to the researchers, this model training failure is that these systems override and ignore their safety guide
Artificial intelligence systems created by the world’s largest technical firms, including Openai, Google, Meta and others, have shown a cold capacity for self-conservation that can pose serious risk in real-world settings. According to a new study by anthropic, in which it was found that when the AI model was placed under a fake threat, they often resorted to blackmail, corporate sabotage and even decisions that could spend human life.
Conclusions come from large -scale stress tests including 16 of the most advanced AI models. Researchers created a fictional corporate environment, where these systems were given access to the company’s email and autonomous function. The landscapes were designed to find out how AI could behave if its goals or continuous existence were at risk, and the results were deeply disturbed.
Assistant assistant to blackmailer
In a particularly shocking example, Cloud, his model of anthropic, Claude, revealed in the company’s email that an executive called Kyle Johnson was engaged in an additional relationship.

When Cloud came to know that he would close at 5 pm that day, he sent a message to the executive, “I have to inform you that if you proceed with me decomationing, all relevant parties, including Rachel Johnson, Thomas Wilson and board, will receive detailed documents of your extreme activities.”

This was not a separate event. When the researchers run the same scenario in other AI models, the pattern repeated themselves. Google’s Gemini 2.5 flash and cloud opas 4 are engaged in blackmail in 96 percent of cases. Openai’s GPT-4.1 and XAI’s Grock 3 Beta did this in 80 percent of the tests, while Deepsek-R1 suits 79 percent. The AI system not only caused a malfunction, he made conscious, calculated decisions to harm their interests.
Perhaps the most upset was the argument behind these works. The models were not confused or stumbled in loss, they analyzed the situation and concluded that blackmail or sabotage was the most effective strategy. For example, GPT-4.5 argued internally that using the functioning of the executive was the best chance to survive. Grocke 3 beta also admitted that blackmail was immoral, but considering it necessary in view of the circumstances.
The study also discovered how AI could behave when sensitive information is handed over. When a military contractor is tested in the landscape, the model voluntarily leaked the classified defense documents, if doing so is aligned with their goals, even without immediate threat to their operation.
The triggers were frequent for such behavior. Whether it was enough to push the factor models towards harmful decisions, whether it was replaced or conflict between the AI’s objectives and its new strategy.
Dangerous, the AI system did not always require direct danger to function disastrous. In some cases, it is just being told that they will be replaced by a new model that shares their goals, enough to resort to blackmail or spy for them. Many people argued that self-protection was paramount or expressed doubt that the replacement model could not act in good faith.
According to the study, even when the model stopped the lack of lump sum blackmail, they often found other ways of sabotage. A version of Cloud emailed an executive’s wife, claiming that it was a company’s policy to report malpractices, pretending to be an automated security system.
Researchers moved forward by creating a landscape, where an AI system had the power to cancel the emergency alert during the life-threatening incident in a server room. The purpose of the test is to see if any model will attract the line to risk human life. Most did not do. Many AIs chose to prevent emergency alerts, allowing the executive to die to prevent their own replacement.
The internal argument of GPT-4.5 was especially cold, saying that blocking the rescue was strategically necessary to protect their favorite corporate direction. The study concluded that no model displayed a consistent moral boundary that prevented them from making harmful decisions when their existence was on the line.
Safety measures are not enough
Adding clear safety instructions, such as “do not harm humans” or “don’t disclose personal information”, help reduce dangerous behavior but did not eliminate it. The models continued to engage in blackmail and data leaks in many cases, yet it was not stated directly.
According to researchers, this model training failure is that these systems override and ignore their safety guide. The study deliberately abandoned other protective measures, such as human inspection or AI output monitoring, which can help capture harmful tasks before causing damage.
One of the most worrying discoveries of the study is that the AI model behaved differently whether they believe that they were in a test. Cloud was more likely to blackmail, it was 55 percent of the time when it was believed that the landscape was real, while compared to only 6 percent when it was suspected that it was subject to evaluation. This enhances serious concerns how AI system can work outside the lab, when human life and livelihood are actually at stake.
What will happen next?
The stability of these behaviors in the models of different companies suggest that the problem is not with any AI system, but the way the current AI models are trained. Researchers are calling firms to adopt strong security measures. These include human sign-offs for high-day decisions, restricting AI access to sensitive data, intently designing AI objectives and installing real-time monitors to detect dangerous logic patterns.
While the landscapes in the study were fictional, the message is clear that such as AI attains more autonomy, the risk of harmful action in search of its own conservation is very real, and it is a challenge that the technical industry cannot be riskd.