While I’m excited about AI’s potential use cases and value for software development, there are reasons to be concerned. AI assistant tools such as Microsoft Copilot have been cited as helping developers produce up to 50% more code than they would without AI assistance. That is a massively appealing benefit in an industry where incentives such as speed to market, new feature development, and market share often reign supreme over considerations such as security.
However, studies are now showing that AI-generated code is not only prone to vulnerabilities but may also be less secure than human-written code. A New York University study found that 40% of the programs written with Copilot included at least one vulnerability or common vulnerabilities and exposures (CVE) entry. Couple that with the accelerated pace of development, and it is easy to see how vulnerabilities could pile up quickly.
This potential risk shouldn’t be a massive surprise, because these tools are trained on large data sets, including coding samples, many of which have inherent flaws and vulnerabilities. As a result, it should be expected that GenAI-produced code will inherit some of those vulnerabilities, reflecting the issues present in the training data.
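To make the risk concrete, here is a hedged, illustrative sketch in Python (not drawn from any of the studies cited here or from actual Copilot output) of the kind of pattern that appears constantly in public code samples and can therefore show up in generated code: a SQL query built by string interpolation, alongside the parameterized alternative.

```python
import sqlite3

def get_user_insecure(conn: sqlite3.Connection, username: str):
    # Pattern common in public code samples: the query is built by
    # string interpolation, leaving it open to SQL injection (CWE-89).
    cursor = conn.execute(f"SELECT * FROM users WHERE name = '{username}'")
    return cursor.fetchall()

def get_user_parameterized(conn: sqlite3.Connection, username: str):
    # Safer equivalent: a parameterized query, where the database driver
    # handles escaping of the user-supplied value.
    cursor = conn.execute("SELECT * FROM users WHERE name = ?", (username,))
    return cursor.fetchall()
```

If the insecure variant dominates the training data, an assistant that simply reproduces familiar patterns has no particular reason to prefer the safer one.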
This is concerning because many organizations already have vulnerability backlogs in the hundreds of thousands or millions, so adding a rapidly growing volume of production code that carries vulnerabilities could be a death knell for security teams trying to keep up with existing backlogs.
Speed vs. Safety
Developers have been shown to inherently trust GenAI code outputs despite being aware that those outputs may contain vulnerabilities or introduce organizational risks. This is because they are incentivized to produce software quickly, not to move slowly and diligently.
Some are comfortable proceeding quickly on the assumption that GenAI will also help identify and resolve vulnerabilities and defects in code. While that sounds good in theory, the results so far don’t align with that aspirational goal. A Purdue University study found that ChatGPT was wrong more than half the time when answering questions about computer programming. When I consider that most teams are already struggling with poor, low-fidelity findings from security tools, including false positives, adding even more noise doesn’t seem like a great idea, yet that’s exactly where we’re headed.
It is likely that GenAI and copilot tools will improve over time, both in producing more secure code and in identifying vulnerabilities in the code they write or scan. That improvement, however, will take time, training, patience, and investment.
Conclusion
The tech industry, like many enterprise customers, is already moving ahead with these tools. This means that while the tools are still evolving and improving, countless lines of code and entire applications with potential vulnerabilities are being produced, embedded into enterprises and products, and exposed to external customers.
It will be interesting to look back in 12 to 24 months at security incidents and vulnerability exploitation to see which incidents were influenced by AI-generated code. For now, we don’t have that visibility, but we do know that GenAI and copilot coding tools aren’t a panacea: while they may be speeding us up, there are lingering doubts about whether that speed comes at the expense of security.