Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has apparently done more work on that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.


