Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
A study co-authored by researchers at Anthropic finds that AI models can be trained to deceive, and that this deceptive behavior is difficult to combat.