blog.trailofbits.com - Now that DARPA’s AI Cyber Challenge (AIxCC) has officially ended, we can finally make Buttercup, our CRS (Cyber Reasoning System), open source!
We’re thrilled to announce that Trail of Bits won second place in DARPA’s AI Cyber Challenge (AIxCC)! Now that the competition has ended, we can finally make Buttercup, our cyber reasoning system (CRS), open source. We’re thrilled to make Buttercup broadly available and see how the security community uses, extends, and benefits from it.
To ensure as many people as possible can use Buttercup, we created a standalone version that runs on a typical laptop. We’ve also tuned this version to work within an AI budget appropriate for individual projects rather than a massive competition at scale. In addition to releasing the standalone version of Buttercup, we’re also open-sourcing the versions that competed in AIxCC’s semifinal and final rounds.
In the rest of this post, we’ll provide a high-level overview of how Buttercup works, how to get started using it, and what’s in store for it next. If you’d prefer to go straight to the code, check it out here on GitHub.
How Buttercup works
Buttercup is a fully automated, AI-driven system for discovering and patching vulnerabilities in open-source software. Buttercup has four main components:
Orchestration/UI coordinates the overall actions of Buttercup’s other components and displays information about vulnerabilities discovered and patches generated by the system. In addition to a typical web interface, Buttercup also reports its logs and system events to a SigNoz telemetry server to make it easy for users to see what Buttercup is doing.
Vulnerability discovery uses AI-augmented mutational fuzzing to find program inputs that demonstrate vulnerabilities in the program. Buttercup’s vulnerability discovery engine is based on OSS-Fuzz/Clusterfuzz and uses libFuzzer and Jazzer to find vulnerabilities.
Contextual analysis uses traditional static analysis tools to create queryable program models that are used to provide context to AI models used in vulnerability discovery and patching. Buttercup uses tree-sitter and CodeQuery to build the program model.
Patch generation is a multi-agentic system for creating and validating software patches for vulnerabilities discovered by Buttercup. Buttercup’s patch generation system uses seven distinct AI agents to create robust patches that fix vulnerabilities it finds and avoid breaking the program’s other functionality.
aicyberchallenge.com - Teams’ AI-driven systems find, patch real-world cyber vulnerabilities; available open source for broad adoption
A cyber reasoning system (CRS) designed by Team Atlanta is the winner of the DARPA AI Cyber Challenge (AIxCC), a two-year, first-of-its-kind competition in collaboration with the Advanced Research Projects Agency for Health (ARPA-H) and frontier labs. Competitors successfully demonstrated the ability of novel autonomous systems using AI to secure the open-source software that underlies critical infrastructure.
Numerous attacks in recent years have illuminated the ability for malicious cyber actors to exploit vulnerable software that runs everything from financial systems and public utilities to the health care ecosystem.
“AIxCC exemplifies what DARPA is all about: rigorous, innovative, high-risk and high- reward programs that push the boundaries of technology. By releasing the cyber reasoning systems open source—four of the seven today—we are immediately making these tools available for cyber defenders,” said DARPA Director Stephen Winchell. “Finding vulnerabilities and patching codebases using current methods is slow, expensive, and depends on a limited workforce – especially as adversaries use AI to amplify their exploits. AIxCC-developed technology will give defenders a much-needed edge in identifying and patching vulnerabilities at speed and scale.”
To further accelerate adoption, DARPA and ARPA-H are adding $1.4 million in prizes for the competing teams to integrate AIxCC technology into real-world critical infrastructure- relevant software.
“The success of today’s AIxCC finalists demonstrates the real-world potential of AI to address vulnerabilities in our health care system,” said ARPA-H Acting Director Jason Roos. “ARPA-H is committed to supporting these teams to transition their technologies and make a meaningful impact in health care security and patient safety.”
Team Atlanta comprises experts from Georgia Tech, Samsung Research, the Korea Advanced Institute of Science & Technology (KAIST), and the Pohang University of Science and Technology (POSTECH).
Trail of Bits, a New York City-based small business, won second place, and Theori, comprising AI researchers and security professionals in the U.S. and South Korea, won third place.
The top three teams will receive $4 million, $3 million, and $1.5 million, respectively, for their performance in the Final Competition.
All seven competing teams, including teams all_you_need_is_a_fuzzing_brain, Shellphish, 42-beyond-bug and Lacrosse, worked on aggressively tight timelines to design automated systems that significantly advance cybersecurity research.
Deep Dive: Final Competition Findings, Highlights
In the Final Competition scored round, teams’ systems attempted to identify and generate patches for synthetic vulnerabilities across 54 million lines of code. Since the competition was based on real-world software, team CRSs could discover vulnerabilities not intentionally introduced to the competition. The scoring algorithm prioritized competitors’ performance based on the ability to create patches for vulnerabilities quickly and their analysis of bug reports. The winning team performed best at finding and proving vulnerabilities, generating patches, pairing vulnerabilities and patches, and scoring with the highest rate of accurate and quality submissions.
In total, competitors’ systems discovered 54 unique synthetic vulnerabilities in the Final Competition’s 70 challenges. Of those, they patched 43.
In the Final Competition, teams also discovered 18 real, non-synthetic vulnerabilities that are being responsibly disclosed to open source project maintainers. Of these, six were in C codebases—including one vulnerability that was discovered and patched in parallel by maintainers—and 12 were in Java codebases. Teams also provided 11 patches for real, non-synthetic vulnerabilities.
“Since the launch of AIxCC, community members have moved from AI skeptics to advocates and adopters. Quality patching is a crucial accomplishment that demonstrates the value of combining AI with other cyber defense techniques,” said AIxCC Program Manager Andrew Carney. “What’s more, we see evidence that the process of a cyber reasoning system finding a vulnerability may empower patch development in situations where other code synthesis techniques struggle.”
Competitor CRSs proved they can create valuable bug reports and patches for a fraction of the cost of traditional methods, with an average cost per competition task of about $152. Bug bounties can range from hundreds to hundreds of thousands of dollars.
AIxCC technology has advanced significantly from the Semifinal Competition held in August 2024. In the Final Competition scored round, teams identified 77% of the competition’s synthetic vulnerabilities, an increase from 37% at semifinals, and patched 61% of the vulnerabilities identified, an increase from 25% at semifinals. In semifinals, teams were most successful in finding and patching vulnerabilities in C codebases. In finals, teams had similar success rates at finding and patching vulnerabilities across C codebases and Java codebases.