In episode 94 of the Cybersecurity Minute, Rob Wood expands upon the importance of data quality for machine learning models in cybersecurity.
Highlights
00:16 — If you’re looking at building in-house machine learning models to complement and scale your security program’s capabilities, you’re going to need to think carefully and intentionally about the data that you use to train these models. You want to make sure you have clean data.
00:53 — With clean data, you’re giving the model a proper range of inputs and insights to look at, and you’re making sure the data will not unintentionally bias or deceive it.
01:27 — Rob is a fan of the shift towards the security data lake. Cybersecurity needs to be able to bring together a lot of data, cross-reference it, and join datasets together. That’s not possible with dated data tools, but newer data lake technologies such as Snowflake, Databricks, and Confluent are opening up exciting possibilities.
02:16 — If you’re building engineering capability in-house to do this work, make sure that you do not skimp on the architecture, process, data flows, or data sourcing. Make sure you get that foundational stuff right. You will be in a much better position to succeed.
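To make the clean-data point from 00:53 a little more concrete, here is a minimal sketch of a pre-training audit. It is not something described in the episode: the function name `audit_training_data`, the field names (`src_ip`, `bytes`, `label`), and the sample records are all hypothetical, and a real pipeline would likely run checks like these inside the data lake rather than in plain Python.

```python
from collections import Counter


def audit_training_data(records, label_key="label"):
    """Return simple data-quality counts for a list of dict records.

    Hypothetical helper for illustration only.
    """
    seen = set()
    duplicates = 0
    records_with_missing = 0
    label_counts = Counter()

    for rec in records:
        # Exact duplicates can overweight one pattern during training.
        fingerprint = tuple(sorted((k, str(v)) for k, v in rec.items()))
        if fingerprint in seen:
            duplicates += 1
        else:
            seen.add(fingerprint)

        # Empty fields give the model an incomplete picture of the event.
        if any(v is None or v == "" for v in rec.values()):
            records_with_missing += 1

        # A heavily skewed label distribution is one source of unintended bias.
        label_counts[rec.get(label_key)] += 1

    return {
        "total": len(records),
        "duplicates": duplicates,
        "records_with_missing": records_with_missing,
        "label_counts": dict(label_counts),
    }


# Made-up sample events, purely for demonstration.
sample = [
    {"src_ip": "10.0.0.5", "bytes": 1200, "label": "benign"},
    {"src_ip": "10.0.0.5", "bytes": 1200, "label": "benign"},
    {"src_ip": "10.0.0.9", "bytes": None, "label": "malicious"},
]
report = audit_training_data(sample)
print(report)
```

Counts like these are only a starting point; what counts as acceptable duplication or label imbalance depends on the model and the data sources involved.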