OWASP, the Open Worldwide Application Security Project, recently announced an update to CycloneDX, its industry-standard Bill of Materials (BOM) specification. Among its notable improvements, CycloneDX version 1.5 now supports describing machine learning (ML) models. The enhancement comes at a time of increased excitement around the next generation of AI.
More organizations are training, deploying, or consuming machine learning models, such as large language models (LLMs). But transparency into these new AI components is crucial, since models can be compromised by poisoned training data or by supply chain attacks on the components they rely on. At the same time, there is growing interest in using software bills of materials (SBOMs) to improve visibility into the dependencies that make up modern software.
Below, I’ll review CycloneDX and peek into the features of its latest incarnation. I also met with Jamie Scott, Founding Product Manager at Endor Labs, which is on the Acceleration Economy Cybersecurity Top 10 Shortlist, to gather more context around the release and why it matters for the industry. Together, we’ll explore CycloneDX’s role in maintaining transparency throughout tomorrow’s AI-driven software development lifecycle.
Introduction to CycloneDX
For those unfamiliar, the OWASP CycloneDX specification is an SBOM format. Supported by many large enterprises and government institutions, CycloneDX has become a widely adopted format that extends to various contexts, such as software, software-as-a-service (SaaS), operations, and manufacturing.
The object model itself is defined in JSON Schema, XML Schema, and Protocol Buffers. At a high level, it is organized into areas such as metadata, components, services, dependencies, compositions, and vulnerabilities.
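To make the object model more concrete, here is a minimal sketch of a CycloneDX 1.5 BOM in its JSON form, assembled with Python's standard library. It assumes a single application that depends on one library; the application name, the package, and the helper function `minimal_bom` are hypothetical. In practice, a BOM would typically be produced by an SBOM generation tool and validated against the official JSON Schema rather than built by hand.

```python
import json
import uuid
from datetime import datetime, timezone


def minimal_bom(app_name: str, app_version: str, deps: list[tuple[str, str, str]]) -> dict:
    """Build a minimal CycloneDX 1.5 JSON BOM (hypothetical helper).

    deps is a list of (name, version, purl) tuples for direct dependencies.
    """
    app_ref = f"{app_name}@{app_version}"
    components = [
        {
            "type": "library",
            "bom-ref": purl,  # the dependencies section refers to components by bom-ref
            "name": name,
            "version": version,
            "purl": purl,
        }
        for name, version, purl in deps
    ]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            # The component this BOM describes: the application itself
            "component": {
                "type": "application",
                "bom-ref": app_ref,
                "name": app_name,
                "version": app_version,
            },
        },
        "components": components,
        # The application directly depends on every listed library
        "dependencies": [
            {"ref": app_ref, "dependsOn": [c["bom-ref"] for c in components]}
        ],
    }


# Illustrative application and package names only
bom = minimal_bom("example-app", "1.0.0",
                  [("example-lib", "1.2.0", "pkg:pypi/example-lib@1.2.0")])
print(json.dumps(bom, indent=2))
```

Each component carries a `bom-ref`, which the `dependencies` section uses to express the dependency graph; that graph is what downstream tooling walks when auditing or alerting.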
Coalescing around a single SBOM standard helps the industry build tooling to counter supply chain threats. These capabilities include more seamless SBOM generation and sharing, auditing, and automated vulnerability alerts.
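As a rough illustration of automated vulnerability alerting, the sketch below checks each component's package URL (purl) in a CycloneDX JSON BOM against a small advisory lookup. The BOM fragment, the advisory data, the advisory ID, and the `vulnerable_components` helper are all hypothetical. Real tools query vulnerability databases and match on version ranges rather than the exact-string comparison used here.

```python
def vulnerable_components(bom: dict, advisories: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (purl, advisory_id) pairs for BOM components with a matching advisory.

    `advisories` maps an exact purl to advisory IDs. It is a hypothetical,
    hand-built stand-in for a real vulnerability feed.
    """
    alerts = []
    for component in bom.get("components", []):
        purl = component.get("purl")
        if purl is None:
            continue  # components without a purl cannot be matched this way
        for advisory_id in advisories.get(purl, []):
            alerts.append((purl, advisory_id))
    return alerts


# Hypothetical BOM fragment and advisory data
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "example-lib", "version": "1.2.0",
         "purl": "pkg:pypi/example-lib@1.2.0"},
        {"type": "library", "name": "other-lib", "version": "0.9.1",
         "purl": "pkg:pypi/other-lib@0.9.1"},
    ],
}
advisories = {"pkg:pypi/example-lib@1.2.0": ["EXAMPLE-ADVISORY-0001"]}

alerts = vulnerable_components(bom, advisories)
for purl, advisory_id in alerts:
    print(f"ALERT: {purl} is affected by {advisory_id}")
```

Keying on purls works only because the standard gives every tool the same identifier field to read, which is the practical payoff of coalescing on one format.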
CycloneDX Version 1.5 Introduces ML-BOM
Now that we have a basic understanding of CycloneDX, what’s special about this release? As of v1.5, CycloneDX supports machine learning transparency through the machine learning bill of materials (ML-BOM). This addition introduces a common way to describe the training datasets and deployment methods behind machine learning models. The goal is to increase ML transparency for all stakeholders, from providers to consumers, resellers, and end consumers.
Specifically, as noted in the updated documentation, “machine-learning-model” is now a valid “component” type. By creating an ML-BOM, technology providers can record a wide range of metadata in a standardized way, such as version, supplier, copyright, and release notes, along with dependency relationships and vulnerabilities. Accordingly, ML-BOMs can help “provide visibility into possible security, privacy, safety, and ethical considerations.”
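Here is a hedged sketch of what an ML-BOM might look like in CycloneDX JSON, written as a Python dictionary. The `machine-learning-model` component type is the one the 1.5 release introduces; the model, dataset, and supplier are hypothetical. The model-card fields (`modelParameters.task`, `considerations.technicalLimitations`), the `data` component type for the training set, and linking the two through `dependencies` reflect my reading of the 1.5 schema and are one possible modeling choice, so verify them against the official schema before relying on them. The helpers `ml_models` and `direct_inputs` are likewise hypothetical.

```python
# A hypothetical ML-BOM; every name and value is illustrative only.
ml_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        {
            "type": "machine-learning-model",  # component type added in CycloneDX 1.5
            "bom-ref": "model-sentiment-classifier",
            "name": "sentiment-classifier",
            "version": "2.0.0",
            "supplier": {"name": "Example Corp"},
            "copyright": "Copyright 2023 Example Corp",
            # Assumed model-card fields; check names against the 1.5 schema
            "modelCard": {
                "modelParameters": {"task": "text-classification"},
                "considerations": {
                    "technicalLimitations": ["English-language text only"],
                },
            },
        },
        {
            "type": "data",  # the training dataset, described as its own component
            "bom-ref": "dataset-product-reviews",
            "name": "product-reviews",
            "version": "2023-06",
        },
    ],
    # One way to record that the model was built from the dataset
    "dependencies": [
        {"ref": "model-sentiment-classifier", "dependsOn": ["dataset-product-reviews"]},
        {"ref": "dataset-product-reviews", "dependsOn": []},
    ],
}


def ml_models(bom: dict) -> list[str]:
    """Names of all components typed as machine learning models."""
    return [c["name"] for c in bom.get("components", [])
            if c.get("type") == "machine-learning-model"]


def direct_inputs(bom: dict, ref: str) -> list[str]:
    """bom-refs the given component directly depends on (here, its training data)."""
    for dep in bom.get("dependencies", []):
        if dep.get("ref") == ref:
            return dep.get("dependsOn", [])
    return []


print(ml_models(ml_bom))
print(direct_inputs(ml_bom, "model-sentiment-classifier"))
```

Because the model is an ordinary component, the same tooling that walks software dependencies can answer questions like "which datasets fed this model?" without a separate format.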
The machine learning space is growing rapidly, but we haven’t had the same degree of visibility into its data sources and potential vulnerabilities as we have for traditional software. According to Scott, the next logical step is to gain holistic visibility into machine learning models, helping consumers make informed decisions about which models to use and which to avoid.
“This release is significant because it sets a more prescriptive direction that tool providers can align to in order to start building the necessary bridges to align the industry with the software transparency movement.”
SBOM Benefits
The biggest issue in the SBOM movement is that data is significantly fragmented across many tools, said Scott. But the latest CycloneDX release establishes a path toward defining what data is appropriate for an SBOM, helping unite these fragmented ecosystems around more prescriptive information to inform risk management, he said.
Increased transparency into underlying components could also help reduce wasted effort. For example, Endor Labs’ “State of Dependency Management 2023” report found that 60% of the time developers spend fixing open-source vulnerabilities is wasted on flaws that can’t be exploited in their applications because the vulnerable code isn’t reachable. Greater insight into how dependencies are actually used could help teams focus their remediation efforts.
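Reachability is what separates exploitable flaws from ones that can be deprioritized. The report is not a description of Endor Labs' method, so the following is only a toy, function-level sketch under stated assumptions: a hand-written, hypothetical call graph and a list of functions assumed to be vulnerable. A breadth-first search from the application's entry points marks every function the application can call, and only vulnerable functions in that set are flagged to fix first. The helper names `reachable_functions` and `triage` are hypothetical.

```python
from collections import deque


def reachable_functions(call_graph: dict[str, list[str]], entry_points: list[str]) -> set[str]:
    """Breadth-first search over a call graph from the application's entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def triage(call_graph, entry_points, vulnerable_functions):
    """Split vulnerable functions into (reachable, unreachable) lists."""
    reachable = reachable_functions(call_graph, entry_points)
    hits = [f for f in vulnerable_functions if f in reachable]
    misses = [f for f in vulnerable_functions if f not in reachable]
    return hits, misses


# Hypothetical call graph: application code calls into part of a library
call_graph = {
    "app.main": ["app.load_config", "lib.parse_json"],
    "app.load_config": ["lib.read_file"],
    "lib.parse_json": ["lib.decode"],
    "lib.render_xml": ["lib.decode_entities"],  # never called by the application
}
hits, misses = triage(call_graph, ["app.main"], ["lib.decode", "lib.decode_entities"])
print("fix first:", hits)
print("not reachable:", misses)
```

Production tools derive call graphs through static analysis and must handle dynamic dispatch, reflection, and similar complications that this sketch ignores.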
Machine-readable SBOMs are also a boon for industry-wide standardization and security compliance automation. Still, there is a long way to go to fully realize the benefits of SBOMs. Namely, the industry needs greater maturity and operationalization around SBOM usage, noted Scott. To get there, we need more prescriptive guidelines that request more than the minimum data. Codifying and sharing practices internally, and actually acting on discovered vulnerabilities, will be necessary to reap the rewards of SBOMs (and ML-BOMs, for that matter).
The Early Stage of ML and ML-BOMs
We’re still in the early days of machine learning. Both the models themselves and their security controls are at a very nascent stage, said Scott. For instance, there is nearly zero vulnerability data for ML models, and the training data that powers them is often opaque. This makes it tricky to assess risk when comparing models.
Therefore, incorporating machine learning transparency into the SBOM movement is positive progress in spreading awareness of ML components and related risk information. And, since OWASP CycloneDX is the most widely used BOM format, it makes sense to standardize around this format to improve supply chain risk awareness around ML.