| Brave brave.com
Authors
Shivan Kaul Sahib
Artem Chaikin
AI browsers remain vulnerable to prompt injection attacks via screenshots and hidden content, allowing attackers to exploit users' authenticated sessions.
This is the second post in a series about security and privacy challenges in agentic browsers. This vulnerability research was conducted by Artem Chaikin (Senior Mobile Security Engineer), and was written by Artem and Shivan Kaul Sahib (VP, Privacy and Security).
Building on our previous disclosure of the Perplexity Comet vulnerability, we’ve continued our security research across the agentic browser landscape. What we’ve found confirms our initial concerns: indirect prompt injection is not an isolated issue, but a systemic challenge facing the entire category of AI-powered browsers. This post examines additional attack vectors we’ve identified and tested across different implementations.
On request, we are withholding one additional vulnerability found in another browser for now. We plan on providing more details next week.
As we’ve written before, AI-powered browsers that can take actions on your behalf are powerful yet extremely risky. If you’re signed into sensitive accounts like your bank or your email provider in your browser, simply summarizing a Reddit post could result in an attacker being able to steal money or your private data.
As always, we responsibly reported these issues to the various companies listed below so the vulnerabilities could be addressed. As we’ve previously said, a safer Web is good for everyone. The thoughtful commentary and debate about secure agentic AI that was raised by our previous blog post in this series motivated our decision to continue researching and publicizing our findings.
Prompt injection via screenshots in Perplexity Comet
Perplexity’s Comet assistant lets users take screenshots on websites and ask questions about those images. These screenshots can be used as yet another way to inject prompts that bypass traditional text-based input sanitization. Malicious instructions embedded as nearly-invisible text within the image are processed as commands rather than (untrusted) content.
How the attack works:
Setup: An attacker embeds malicious instructions in Web content that are hard to see for humans. In our attack, we were able to hide prompt injection instructions in images using a faint light blue text on a yellow background. This means that the malicious instructions are effectively hidden from the user.
Trigger: User-initiated screenshot capture of a page containing camouflaged malicious text.
Injection: Text recognition extracts text that’s imperceptible to human users (possibly via OCR though we can’t tell for sure since the Comet browser is not open-source). This extracted text is then passed to the LLM without distinguishing it from the user’s query.
Exploit: The injected commands instruct the AI to use its browser tools maliciously.
MCP tools are implicated in several new attack techniques. Here's a look at how they can be manipulated for good, such as logging tool usage and filtering unauthorized commands.
Over the last few months, there has been a lot of activity in the Model Context Protocol (MCP) space, both in terms of adoption as well as security. Developed by Anthropic, MCP has been rapidly gaining traction across the AI ecosystem. MCP allows Large Language Models (LLMs) to interface with tools and for those interfaces to be rapidly created. MCP tools allow for the rapid development of “agentic” systems, or AI systems that autonomously perform tasks.
Beyond adoption, new attack techniques have been shown to allow prompt injection via MCP tool descriptions and responses, MCP tool poisoning, rug pulls and more.
Prompt Injection is a weakness in LLMs that can be used to elicit unintended behavior, circumvent safeguards and produce potentially malicious responses. Prompt injection occurs when an attacker instructs the LLM to disregard other rules and do the attacker’s bidding. In this blog, I show how to use techniques similar to prompt injection to change the LLM’s interaction with MCP tools. Anyone conducting MCP research may find these techniques useful.
This vulnerability can allow attackers to steal anything a user puts in a private Slack channel by manipulating the language model used for content generation. This was responsibly disclosed to Slack (more details in Responsible Disclosure section at the end).
Like many companies, Dropbox has been experimenting with large language models (LLMs) as a potential backend for product and research initiatives. As interest in leveraging LLMs has increased in recent months, the Dropbox Security team has been advising on measures to harden internal Dropbox infrastructure for secure usage in accordance with our AI principles. In particular, we’ve been working to mitigate abuse of potential LLM-powered products and features via user-controlled input.
Large Language Models (LLM) have made amazing progress in recent years. Most recently, they have demonstrated to answer natural language questions at a surprising performance level. In addition, by clever prompting, these models can change their behavior. In this way, these models blur the line between data and instruction. From "traditional" cybersecurity, we know that this is a problem. The importance of security boundaries between trusted and untrusted inputs for LLMs was underestimated. We show that Prompt Injection is a serious security threat that needs to be addressed as models are deployed to new use-cases and interface with more systems.
[PDF DOC] https://arxiv.org/pdf/2302.12173.pdf