DARPA Challenge Tests AI as Cybersecurity Defenders

The U.S. military wants to see autonomous artificial intelligence find and fix software exploits in just minutes or seconds


Today’s malicious hackers have an average of 312 days to exploit “zero-day” computer software flaws before human cybersecurity experts can find and fix those flaws. The U.S. military’s main research agency focused on disruptive technologies aims to see whether artificial intelligence can do a better job of finding and fixing such exploits within a matter of seconds or minutes.

This summer, seven finalist teams in the Cyber Grand Challenge, run by the U.S. Defense Advanced Research Projects Agency (DARPA), will do battle with AI systems that can autonomously scan rivals’ network servers for exploits and protect their own servers by actively finding and fixing software flaws. The immediate reward comes in the form of a US $2 million prize for first place, $1 million for second place, and $750,000 for third place. But in the long run, DARPA hopes the challenge results will prove autonomous AI systems have become capable enough to help humans in the never-ending struggle to protect computer software and networks.

“They’re going to be doing this live and head to head,” said Mike Walker, program manager of the Cyber Grand Challenge at DARPA, during a teleconference press briefing held on July 13. “We’re hoping to see proof that the entire computer security cycle of responding to flaws can be automated.”

Today’s cybersecurity defenses rely mostly on human teams manually scanning through crash reports to try to pinpoint the problems they need to fix, a process that becomes incredibly tedious when computer failures generate hundreds or thousands of bug reports. Some tech giants such as Microsoft have already begun developing software programs that can automatically find problems and generate such bug reports.

But DARPA seeks to raise the bar for how automated AI systems can find and stop exploits. The U.S. military agency hopes such automated systems can greatly shorten the time needed to develop and apply patches that fix software vulnerabilities. In addition, automated systems could help monitor entire software systems all the time rather than relying solely on human experts to check crash reports when a problem arises.

The seven finalist teams have been given high-performance computing hardware to develop their AI “cyber reasoning” systems. On 4 August, they will step back and watch their AI systems compete against one another during the 10-hour Cyber Grand Challenge finals held in Las Vegas.

When the finals start, each AI competitor will find itself in a network running known protocols and involving previously unexamined computer code. That means the AI systems will need to learn the network’s software language and logic before they begin to figure out vulnerabilities.

To attack rivals, AI competitors will first scan for vulnerabilities in their network servers. If they find a vulnerability, they will have to tell a DARPA referee about the vulnerability and predict when such a software crash might happen. It’s like a billiards player calling a specific shot—such as “seven-ball, corner pocket”—before he or she lines up the pool cue to shoot, Walker said. AI competitors win points by proving such vulnerabilities in rivals and lose points if they cannot protect their own servers against such simulated attacks.
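The “call the shot” rule can be illustrated with a toy referee. The following is only a minimal sketch under stated assumptions, not DARPA’s actual scoring system: the names `Claim`, `referee`, `vulnerable_service`, and `patched_service` are hypothetical, a “service” is modeled as a plain function that returns the name of the place where it crashed (or `None` if it handled the input safely), and the one-point gain and loss are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Claim:
    """A called shot: who attacks whom, with what input, and where it should crash."""
    attacker: str
    defender: str
    crafted_input: bytes
    predicted_crash_site: str  # hypothetical label, assumed for this sketch


# A service returns the crash site name, or None if the input was handled safely.
Service = Callable[[bytes], Optional[str]]


def vulnerable_service(data: bytes) -> Optional[str]:
    # Toy flaw: inputs longer than 8 bytes "overflow" a pretend buffer.
    return "copy_buffer" if len(data) > 8 else None


def patched_service(data: bytes) -> Optional[str]:
    # Toy fix: over-long inputs are rejected, so nothing crashes.
    return None


def referee(claim: Claim, service: Service, scores: Dict[str, int]) -> bool:
    """Replay the claimed input; score only if the crash matches the prediction."""
    observed = service(claim.crafted_input)
    proven = observed is not None and observed == claim.predicted_crash_site
    if proven:
        scores[claim.attacker] = scores.get(claim.attacker, 0) + 1
        scores[claim.defender] = scores.get(claim.defender, 0) - 1
    return proven


scores: Dict[str, int] = {}
shot = Claim("team_a", "team_b", b"A" * 16, "copy_buffer")
print(referee(shot, vulnerable_service, scores), scores)
print(referee(shot, patched_service, scores), scores)
```

In this model a claim earns nothing if the defender has already patched the flaw, or if a crash happens somewhere other than where the attacker said it would, which is the point of calling the shot in advance.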

The AI competitors will also have to simultaneously create their own network defenses to protect their servers against rivals. Such cybersecurity defenses can use two tactics, Walker explained. First, you could create a generic “binary armor” that protects against general threats to any part of the network server. The downside of that approach is that it slows the server’s software as it runs its usual computing tasks. “People want computers to be secure, but don’t want it to be secure at the cost of taking two hours to send an email,” Walker said.

The second defense tactic involves finding software flaws and fixing them with specific patches tailored for those problems. Such reverse engineering and “point patching” of a problem is much more challenging for AI competitors. But the AI systems developed by finalist teams did produce such point patches during the previous qualifying rounds of the Cyber Grand Challenge held in June 2015. No individual competitor succeeded in finding all 590 software flaws during the qualifying round, but together they found all of the exploits.
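The trade-off between the two tactics Walker describes can be sketched with a toy cost model. This is an illustration under loud assumptions rather than how any finalist system works: the operation names, the `run` function, and the idea of counting one “guard check” per protected operation are all invented here to show why blanket protection costs more than a targeted fix.

```python
from typing import Dict, List, Set


def run(ops: List[str], guarded: Set[str], flawed: str = "copy") -> Dict[str, int]:
    """Run a workload, counting guard checks paid and unguarded hits on the flawed op."""
    checks = 0   # overhead: one check per guarded operation executed
    exposed = 0  # risk: times the flawed operation ran without a guard
    for op in ops:
        if op in guarded:
            checks += 1
        elif op == flawed:
            exposed += 1
    return {"checks": checks, "exposed": exposed}


# Hypothetical workload: four kinds of operation, repeated three times.
workload = ["parse", "copy", "log", "send"] * 3

unprotected = run(workload, guarded=set())
binary_armor = run(workload, guarded=set(workload))  # guard everything
point_patch = run(workload, guarded={"copy"})        # guard only the known flaw

print("unprotected: ", unprotected)
print("binary armor:", binary_armor)
print("point patch: ", point_patch)
```

Both defenses close the flaw in this model, but the generic armor pays a check on every operation while the point patch pays only where the flaw lives. The catch, as the article notes, is that the point patch requires first finding the flaw.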

But what if such cybersecurity AI systems could be used for malicious purposes such as finding exploits and launching cyberattacks? As a counter, the Cyber Grand Challenge results and competitor software will be released to the public as open source. Open-source release makes it much more likely that any bugs or exploits will be discovered and fixed by the online community rather than kept hidden from public knowledge. That openness is key to what Walker describes as the “open security revolution” approach to minimizing “nefarious misuse” of such technology.

Walker did not want to speculate about the future capabilities of such AI systems in cybersecurity before the Cyber Grand Challenge finals take place. But he pointed out that any cybersecurity improvements for commercial software would also benefit the U.S. military, given the latter’s general reliance on such off-the-shelf products.

“We’re going to have a science experiment and viability experiment in 21 days, so then I would feel much more ready to speculate,” Walker said. “To be completely honest, we’re trying to prove autonomy before we can say it exists and speculate about future development.”

Even a convincing victory in the DARPA Cyber Grand Challenge will just be the first step for such AI systems. The finalist teams have verbally agreed that the winner among them would take part in a similar “capture the flag” competition against human hackers at DEF CON 2016, one of the world’s largest hacker conferences, being held at the same time in Las Vegas in August. That would mark the first time an autonomous machine plays at the table against human experts.
