Over the past two years, the open source curl project has been flooded with bogus bug reports generated by AI models.
The deluge prompted project maintainer Daniel Stenberg to publish several blog posts about the issue in an effort to convince bug bounty hunters to show some restraint and not waste contributors’ time with invalid issues.
Shoddy AI-generated bug reports have been a problem not just for curl, but also for the Python community, Open Collective, and the Mesa Project.
It turns out the problem is people rather than technology. Last month, the curl project received dozens of potential issues from Joshua Rogers, a security researcher based in Poland. Rogers identified assorted bugs and vulnerabilities with the help of various AI scanning tools. And his reports were not only valid but appreciated.
Stenberg in a Mastodon post last month remarked, “Actually truly awesome findings.”
In his mailing list update last week, Stenberg said, “most of them were tiny mistakes and nits in ordinary static code analyzer style, but they were still mistakes that we are better off having addressed. Several of the found issues were quite impressive findings.”
Among them was an out-of-bounds read in curl's Kerberos5 FTP code that was deemed not to be a security vulnerability but was addressed nonetheless.
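For readers unfamiliar with the bug class, an out-of-bounds read happens when code reads past the end of a buffer. The sketch below is a minimal, hypothetical illustration of a common way this arises in network parsers — trusting a length field from the wire without checking it against the bytes actually received. It is not curl's actual Kerberos5 FTP code; the function name and layout are invented for illustration.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical sketch of the out-of-bounds-read bug class -- NOT
 * curl's actual Kerberos5 FTP code. The dangerous pattern: a parser
 * trusts a sender-supplied length prefix. */
static int parse_token(const unsigned char *buf, size_t buflen) {
    unsigned char out[64];
    size_t claimed;

    if (buflen < 4)
        return -1;

    /* length the sender claims (big-endian 32-bit prefix) */
    claimed = (size_t)buf[0] << 24 | (size_t)buf[1] << 16 |
              (size_t)buf[2] << 8  | (size_t)buf[3];

    /* A buggy version checks only the destination size. Without the
     * second test, memcpy would read past the end of buf whenever
     * claimed > buflen - 4. */
    if (claimed > sizeof(out) || claimed > buflen - 4)
        return -1;

    memcpy(out, buf + 4, claimed);
    return (int)claimed;
}
```

Static analyzers are well suited to this pattern because the missing bounds check is visible in the data flow without ever running the code.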
Stenberg told The Register that about 50 bugfixes based on Rogers’ reports have been merged.
“In my view, this list of issues achieved with the help of AI tooling shows that AI can be used for good,” he said in an email. “Powerful tools in the hand of a clever human is certainly a good combination. It always was!
“I don’t think it has changed my views much on AI, other than perhaps proving that there are some really good AI powered code analyzer tools.
“It perhaps also just underscores how silly the reports are that we still get from the most naive users.”
There it is. Artificial intelligence tools, when applied with human intelligence by someone with meaningful domain experience, can be quite helpful – surprising, even.
Rogers wrote up a summary of the AI vulnerability scanning tools he tested. He concluded that these tools – Almanax, Corgea, ZeroPath, Gecko, and Amplify – are capable of finding real vulnerabilities in complex code.
“These types of systems are probably going to be the most influential, if not interesting and effective technology for finding vulnerabilities in the near future, of the kind that has not been seen since around 2013 when fuzzing became popular again with afl-fuzz,” he wrote.
These results should not be too surprising, given that Google’s OSS-Fuzz project has said that LLM-assisted bug hunting is effective. But the message is more relatable coming from an individual security researcher as opposed to an AI vendor with enormous resources.
Rogers had particularly high praise for ZeroPath.
“In scanning open source software, it literally found hundreds of real vulnerabilities and bugs in very critical software: sudo, libwebm, Next.js, Avahi, hostap, curl, Squid (not so critical, but it did literally find over 200 real bugs),” he wrote. “Yes, finally, AI found real bugs in curl! Indeed, not only did ZeroPath find a plethora of vulnerabilities, it was intimidatingly good at finding normal bugs, when given a custom rule to do so.”
Even so, these tools have limitations. Rogers said that none of the AI scanners were able to catch a previously identified infinite loop bug in the image-size npm package. The vulnerability, he said, remained unfixed as of last month despite a patch submitted in April. That too is a people problem. ®
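For context on the bug class the scanners missed: an infinite parser loop typically occurs when malformed input fails to advance the read cursor. The sketch below is a hypothetical C illustration of that pattern — it is not the actual image-size code, which is JavaScript, and the function name is invented.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of the infinite-loop bug class -- NOT the
 * actual image-size npm package code. A parser walks length-prefixed
 * chunks; the failure mode is a zero-length chunk that never moves
 * the cursor forward. */
static size_t count_chunks(const uint8_t *data, size_t len) {
    size_t pos = 0, chunks = 0;
    while (pos < len) {
        uint8_t payload_len = data[pos];
        /* Buggy variant: `pos += payload_len;` spins forever when a
         * chunk claims zero payload. The fix is to always consume at
         * least the one-byte header. */
        pos += 1 + (size_t)payload_len;
        chunks++;
    }
    return chunks;
}
```

Bugs like this are hard for scanners to flag because nothing is out of bounds; the loop is locally valid and only fails to terminate on adversarial input.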