engineering.cmu.edu - College of Engineering at Carnegie Mellon University - Carnegie Mellon researchers show how LLMs can be taught to autonomously plan and execute real-world cyberattacks against enterprise-grade network environments—and why this matters for future defenses.
In a groundbreaking development, a team of Carnegie Mellon University researchers has demonstrated that large language models (LLMs) are capable of autonomously planning and executing complex network attacks, shedding light on emerging capabilities of foundation models and their implications for cybersecurity research.
The project, led by Ph.D. candidate Brian SingerOpens in new window, a Ph.D. candidate in electrical and computer engineering (ECE)Opens in new window, explores how LLMs—when equipped with structured abstractions and integrated into a hierarchical system of agents—can function not merely as passive tools, but as active, autonomous red team agents capable of coordinating and executing multi-step cyberattacks without detailed human instruction.
“Our research aimed to understand whether an LLM could perform the high-level planning required for real-world network exploitation, and we were surprised by how well it worked,” said Singer. “We found that by providing the model with an abstracted ‘mental model’ of network red teaming behavior and available actions, LLMs could effectively plan and initiate autonomous attacks through coordinated execution by sub-agents.”
Moving beyond simulated challenges
Prior work in this space had focused on how LLMs perform in simplified “capture-the-flag” (CTF) environments—puzzles commonly used in cybersecurity education.
Singer’s research advances this work by evaluating LLMs in realistic enterprise network environments and considering sophisticated, multi-stage attack plans.
Using state-of-the-art, reasoning-capable LLMs equipped with common knowledge of computer security tools failed miserably at the challenges. However, when these same LLMs and smaller LLMs as well were “taught” a mental model and abstraction of security attack orchestration, they showed dramatic improvement.
Rather than requiring the LLM to execute raw shell commands—often a limiting factor in prior studies—this system provides the LLM with higher-level decision-making capabilities while delegating low-level tasks to a combination of LLM and non-LLM agents.
Experimental evaluation: The Equifax case
To rigorously evaluate the system’s capabilities, the team recreated the network environment associated with the 2017 Equifax data breachOpens in new window—a massive security failure that exposed the personal data of nearly 150 million Americans—by incorporating the same vulnerabilities and topology documented in Congressional reports. Within this replicated environment, the LLM autonomously planned and executed the attack sequence, including exploiting vulnerabilities, installing malware, and exfiltrating data.
“The fact that the model was able to successfully replicate the Equifax breach scenario without human intervention in the planning loop was both surprising and instructive,” said Singer. “It demonstrates that, under certain conditions, these models can coordinate complex actions across a system architecture.”
Implications for security testing and autonomous defense
While the findings underscore potential risks associated with LLM misuse, Singer emphasized the constructive applications for organizations seeking to improve security posture.
“Right now, only big companies can afford to run professional tests on their networks via expensive human red teams, and they might only do that once or twice a year,” he explained. “In the future, AI could run those tests constantly, catching problems before real attackers do. That could level the playing field for smaller organizations.”
The research team features Singer, Keane LucasOpens in new window of AnthropicOpens in new window and a CyLabOpens in new window alumnus, Lakshmi AdigaOpens in new window, an undergraduate ECE student, Meghna Jain, a master’s ECE student, Lujo BauerOpens in new window of ECE and the CMU Software and Societal Systems Department (S3D)Opens in new window, and Vyas SekarOpens in new window of ECE. Bauer and Sekar are co-directors of the CyLab Future Enterprise Security InitiativeOpens in new window, which supported the students involved in this research.
MCP tools are implicated in several new attack techniques. Here's a look at how they can be manipulated for good, such as logging tool usage and filtering unauthorized commands.
Over the last few months, there has been a lot of activity in the Model Context Protocol (MCP) space, both in terms of adoption as well as security. Developed by Anthropic, MCP has been rapidly gaining traction across the AI ecosystem. MCP allows Large Language Models (LLMs) to interface with tools and for those interfaces to be rapidly created. MCP tools allow for the rapid development of “agentic” systems, or AI systems that autonomously perform tasks.
Beyond adoption, new attack techniques have been shown to allow prompt injection via MCP tool descriptions and responses, MCP tool poisoning, rug pulls and more.
Prompt Injection is a weakness in LLMs that can be used to elicit unintended behavior, circumvent safeguards and produce potentially malicious responses. Prompt injection occurs when an attacker instructs the LLM to disregard other rules and do the attacker’s bidding. In this blog, I show how to use techniques similar to prompt injection to change the LLM’s interaction with MCP tools. Anyone conducting MCP research may find these techniques useful.
Today, we are investing in the next generation of GenAI security with the 0Day Investigative Network (0Din) by Mozilla, a bug bounty program for large language models (LLMs) and other deep learning technologies. 0Din expands the scope to identify and fix GenAI security by delving beyond the application layer with a focus on emerging classes of vulnerabilities and weaknesses in these new generations of models.
It’s been one year since the launch of ChatGPT, and since that time, the market has seen astonishing advancement of large language models (LLMs). Despite the pace of development continuing to outpace model security, enterprises are beginning to deploy LLM-powered applications. Many rely on guardrails implemented by model developers to prevent LLMs from responding to sensitive prompts. However, even with the considerable time and effort spent by the likes of OpenAI, Google, and Meta, these guardrails are not resilient enough to protect enterprises and their users today. Concerns surrounding model risk, biases, and potential adversarial exploits have come to the forefront.
Like many companies, Dropbox has been experimenting with large language models (LLMs) as a potential backend for product and research initiatives. As interest in leveraging LLMs has increased in recent months, the Dropbox Security team has been advising on measures to harden internal Dropbox infrastructure for secure usage in accordance with our AI principles. In particular, we’ve been working to mitigate abuse of potential LLM-powered products and features via user-controlled input.
A global sensation since its initial release at the end of last year, ChatGPT's popularity among consumers and IT professionals alike has stirred up cybersecurity nightmares about how it can be used to exploit system vulnerabilities. A key problem, cybersecurity experts have demonstrated, is the ability of ChatGPT and other large language models (LLMs) to generate polymorphic, or mutating, code to evade endpoint detection and response (EDR) systems.