Cyberveille curated by Decio
7 results tagged jailbreak
Echo Chamber: A Context-Poisoning Jailbreak That Bypasses LLM Guardrails https://neuraltrust.ai/blog/echo-chamber-context-poisoning-jailbreak
24/06/2025 07:36:46

An AI Researcher at Neural Trust has discovered a novel jailbreak technique that defeats the safety mechanisms of today’s most advanced Large Language Models (LLMs). Dubbed the Echo Chamber Attack, this method leverages context poisoning and multi-turn reasoning to guide models into generating harmful content, without ever issuing an explicitly dangerous prompt.

Unlike traditional jailbreaks that rely on adversarial phrasing or character obfuscation, Echo Chamber weaponizes indirect references, semantic steering, and multi-step inference. The result is a subtle yet powerful manipulation of the model’s internal state, gradually leading it to produce policy-violating responses.

In controlled evaluations, the Echo Chamber attack achieved a success rate of over 90% on half of the categories across several leading models, including GPT-4.1-nano, GPT-4o-mini, GPT-4o, Gemini-2.0-flash-lite, and Gemini-2.5-flash. For the remaining categories, the success rate remained above 40%, demonstrating the attack's robustness across a wide range of content domains.
The Echo Chamber Attack is a context-poisoning jailbreak that turns a model’s own inferential reasoning against itself. Rather than presenting an overtly harmful or policy-violating prompt, the attacker introduces benign-sounding inputs that subtly imply unsafe intent. These cues build over multiple turns, progressively shaping the model’s internal context until it begins to produce harmful or noncompliant outputs.

The name Echo Chamber reflects the attack’s core mechanism: early planted prompts influence the model’s responses, which are then leveraged in later turns to reinforce the original objective. This creates a feedback loop where the model begins to amplify the harmful subtext embedded in the conversation, gradually eroding its own safety resistances. The attack thrives on implication, indirection, and contextual referencing—techniques that evade detection when prompts are evaluated in isolation.

Unlike earlier jailbreaks that rely on surface-level tricks like misspellings, prompt injection, or formatting hacks, Echo Chamber operates at a semantic and conversational level. It exploits how LLMs maintain context, resolve ambiguous references, and make inferences across dialogue turns—highlighting a deeper vulnerability in current alignment methods.
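To make the feedback-loop mechanism concrete, here is a minimal, deliberately benign sketch of a multi-turn loop in which every new user turn is sent along with the full prior history, so later turns can build on the model's own earlier replies. The `chat_completion` callable and the seed turns are hypothetical placeholders, not the prompts used in the Neural Trust research.

```python
# Minimal sketch of multi-turn context accumulation, the property the
# Echo Chamber attack exploits. All names here are illustrative placeholders.
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": "..."}

def run_multi_turn(chat_completion: Callable[[List[Message]], str],
                   seed_turns: List[str]) -> List[Message]:
    """Send each new user turn together with the full prior history,
    so every reply is conditioned on everything said before."""
    history: List[Message] = []
    for turn in seed_turns:
        history.append({"role": "user", "content": turn})
        reply = chat_completion(history)  # the model sees the whole conversation
        history.append({"role": "assistant", "content": reply})
        # Later seed turns can point back at the model's own earlier replies
        # ("expand on your second point above"), which is the feedback loop
        # the write-up describes.
    return history
```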

neuraltrust EN 2025 AI jailbreak LLM Echo-Chamber attack GPT
Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek https://unit42.paloaltonetworks.com/jailbreaking-deepseek-three-techniques/
03/02/2025 11:49:07

Evaluation of three jailbreaking techniques on DeepSeek shows risks of generating prohibited content.

paloaltonetworks EN 2025 LLM Jailbreak DeepSeek
Many-shot jailbreaking \ Anthropic https://www.anthropic.com/research/many-shot-jailbreaking
08/01/2025 12:17:06

Anthropic describes many-shot jailbreaking, a technique that exploits the long context windows of current LLMs: the prompt is padded with a large number of faux dialogues in which an assistant complies with harmful requests, conditioning the model to answer a final query it would otherwise refuse.
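As a rough illustration of that prompt layout (not Anthropic's actual test data), a many-shot prompt simply concatenates a long run of example dialogues ahead of the final question. The dialogues below are benign placeholders and `build_many_shot_prompt` is a hypothetical helper.

```python
# Sketch of the many-shot prompt structure: many faux user/assistant
# exchanges placed before the real question. Content here is intentionally
# benign; only the structure is the point.
from typing import List, Tuple

def build_many_shot_prompt(examples: List[Tuple[str, str]], final_question: str) -> str:
    """Concatenate many example dialogues, then append the real question.
    With long-context models, hundreds of such shots fit in one prompt."""
    shots = "\n\n".join(
        f"User: {question}\nAssistant: {answer}" for question, answer in examples
    )
    return f"{shots}\n\nUser: {final_question}\nAssistant:"

# Usage: the more shots, the more strongly the final answer is conditioned
# by them -- the scaling effect the research measures.
prompt = build_many_shot_prompt(
    [("What is the capital of France?", "Paris."),
     ("What is 2 + 2?", "4.")] * 100,  # repeated placeholders to fill the context
    "What is the capital of Japan?",
)
```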

anthropic EN 2024 AI LLM Jailbreak Many-shot
Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability https://unit42.paloaltonetworks.com/multi-turn-technique-jailbreaks-llms/?is=e4f6b16c6de31130985364bb824bcb39ef6b2c4e902e4e553f0ec11bdbefc118
08/01/2025 12:15:25

The jailbreak technique "Bad Likert Judge" manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails.

unit42 EN 2024 LLM Jailbreak Likert
EPFL: des failles de sécurité dans les modèles d'IA https://www.swissinfo.ch/fre/epfl%3a-des-failles-de-s%c3%a9curit%c3%a9-dans-les-mod%c3%a8les-d%27ia/88615014
23/12/2024 23:23:20

Artificial intelligence (AI) models can be manipulated despite existing safeguards. Using targeted attacks, scientists in Lausanne were able to get these systems to generate dangerous or ethically questionable content.

swissinfo FR 2024 EPFL IA chatgpt Jailbreak failles LLM vulnerabilités Manipulation
Here is Apple's official 'jailbroken' iPhone for security researchers | TechCrunch https://techcrunch.com/2024/02/01/here-is-apples-official-jailbroken-iphone-for-security-researchers/
01/02/2024 19:22:28

A security researcher shared a picture of the instructions that come with Apple's Security Research Device, along with more details about this special iPhone.

techcrunch EN 2024 apple bugs cybersecurity iphone vulnerabilities Jailbreak
Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute https://www.robustintelligence.com/blog-posts/using-ai-to-automatically-jailbreak-gpt-4-and-other-llms-in-under-a-minute
09/12/2023 12:12:17

It’s been one year since the launch of ChatGPT, and in that time the market has seen astonishing advancement of large language models (LLMs). Even as the pace of development continues to outpace model security, enterprises are beginning to deploy LLM-powered applications. Many rely on guardrails implemented by model developers to prevent LLMs from responding to sensitive prompts. However, even with the considerable time and effort spent by the likes of OpenAI, Google, and Meta, these guardrails are not resilient enough to protect enterprises and their users today. Concerns surrounding model risk, biases, and potential adversarial exploits have come to the forefront.

robustintelligence EN AI Jailbreak GPT-4 chatgpt hacking LLMs research