Practices such as tabletop exercises simulate stressor events like data breaches without all the real-world consequences, but as the Marvin Gaye song goes, “Ain’t Nothing Like the Real Thing.”
Though it may not be the learning opportunity anyone craves, an actual stressor event forces companies to find substantive answers and solutions so that the security failure behind it, whether systemic or a point-in-time instance, does not occur again.
There are immediate steps to be taken following a breach, some of which are captured in a recent analysis by my colleague Frank Domizio. The initial days to weeks following the event should rightfully be focused on immediate containment, recovery, and continuity efforts.
After that time, there are tremendous opportunities to learn what really happened and how it can be prevented in the future. This analysis explores different approaches to analyzing security control failures and, ultimately, using that knowledge to fortify defenses. I’ll cover three techniques: 5 Whys, black box analysis, and retrospectives.
5 Whys
The 5 Whys technique is a method of root cause analysis, often used in postmortem reviews. It works by repeatedly asking “Why?” to dig deeper toward the root cause. A 5 Whys analysis can quickly sprawl: one reason turns into 5, 5 turn into 15, and so on.
This is what the 5 Whys, along with potential answers, might look like in practice, using the example of an exploited structured query language (SQL) injection vulnerability that resulted in a security incident:
- Why was there an undetected vulnerability in the production version of the web application? It was not detected in recently conducted penetration tests.
- Why was the vulnerability not identified in recent tests? Penetration tests were all short, time-boxed efforts that didn’t get coverage over the areas of the application that were vulnerable.
- Why were the tests all time-boxed? Doing long penetration tests was significantly more expensive than other forms of vulnerability identification.
- Why didn’t the team spend budget on other forms of vulnerability identification like static analysis? The team didn’t have the resources available to support this kind of analysis.
- Why can’t the team members support other forms of analysis? Nobody has experience in these areas, and we have not yet invested in training along these lines.
Whoever is leading the postmortem process should decide how broad the investigation should be. I believe there’s value in an initial, very broad sweep to identify as many potential failure conditions as possible. Then, follow that up with more focused investigations into the root causes, going as deep into the whys as you can, perhaps focusing where there is more control over a potential fix. Ultimately, understanding failure modes or risks needs to be paired with an intention and a plan to fix them.
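One way to keep a sprawling 5 Whys analysis manageable is to record it as a tree, where each answer can prompt one or more follow-up questions. The sketch below is only an illustration: the `Why` class and the `depth` and `leaves` helpers are hypothetical names, and the answers are shortened from the SQL injection example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Why:
    """One question/answer pair in a 5 Whys analysis (hypothetical structure)."""
    question: str
    answer: str
    children: list[Why] = field(default_factory=list)

    def add(self, question: str, answer: str) -> Why:
        """Attach a follow-up question to this answer and return it."""
        child = Why(question, answer)
        self.children.append(child)
        return child


def depth(node: Why) -> int:
    """Length of the longest chain of whys starting at this node."""
    return 1 + max((depth(child) for child in node.children), default=0)


def leaves(node: Why) -> list[Why]:
    """Deepest answers reached so far, i.e. the candidate root causes."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


# The chain from the SQL injection example, with answers shortened.
root = Why("Why was there an undetected vulnerability in production?",
           "It was not detected in recent penetration tests.")
second = root.add("Why was it not identified in recent tests?",
                  "Tests were short and time-boxed and missed the vulnerable areas.")
third = second.add("Why were the tests all time-boxed?",
                   "Long penetration tests cost more than other methods.")
fourth = third.add("Why not spend budget on static analysis?",
                   "The team lacked resources to support it.")
fifth = fourth.add("Why can't the team support other forms of analysis?",
                   "Nobody has experience, and training has not been funded.")

for cause in leaves(root):
    print(f"Candidate root cause (depth {depth(root)}): {cause.answer}")
```

Calling `add` more than once on the same node captures the broad sweep, and `leaves` then lists every branch that still needs a deeper look or an owner.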
Black Box Analysis
The concept of black box analysis originates from the aviation industry. There, black box analysis is an intense, data-driven study of the telemetry captured by an airplane’s black box after any failure or loss event. This kind of analysis could be categorized as a data-driven postmortem, looking for correlations in the data available that may have been contributing factors to the failure event.
The cybersecurity equivalent of black box thinking takes into consideration the following types of data:
- Leading indicator metrics such as patch frequency, privilege sprawl across the environment, vulnerability reports from outside the security team, and level of developer engagement.
- Security telemetry occurring just prior to, during, and following the security event.
- Market or regulatory dynamics occurring around the time of the incident.
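A first step in this kind of data-driven postmortem is usually pulling the telemetry that falls just before, during, and after the incident. The following is a minimal sketch of that step only, not a full correlation analysis. The `Event` record, the `events_around` helper, the window sizes, and the sample events are all assumptions made up for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A single telemetry record (hypothetical shape)."""
    timestamp: datetime
    source: str
    message: str


def events_around(events: list[Event], incident_at: datetime,
                  before: timedelta, after: timedelta) -> list[Event]:
    """Return events inside [incident_at - before, incident_at + after], oldest first."""
    start, end = incident_at - before, incident_at + after
    selected = [e for e in events if start <= e.timestamp <= end]
    return sorted(selected, key=lambda e: e.timestamp)


# Made-up sample data for illustration only.
incident_at = datetime(2024, 1, 10, 12, 0)
telemetry = [
    Event(datetime(2024, 1, 10, 12, 5), "db", "unusual query volume"),
    Event(datetime(2024, 1, 8, 9, 0), "patching", "patch window skipped"),
    Event(datetime(2024, 1, 10, 11, 30), "waf", "spike in blocked requests"),
    Event(datetime(2024, 1, 10, 13, 30), "idp", "privileged login"),
]

nearby = events_around(telemetry, incident_at,
                       before=timedelta(hours=2), after=timedelta(hours=1))
by_source = Counter(e.source for e in nearby)

for e in nearby:
    print(e.timestamp.isoformat(), e.source, e.message)
```

Widening `before` to days or weeks is one way to bring slower-moving leading indicators, such as skipped patch windows, into the same view.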
Retrospectives
Retrospectives are a tool often used in agile software development, but they have utility in a security setting as well. Retrospectives are a group activity, pulling together different team members to identify thematic issues around what worked well and what didn’t work well. A retrospective will likely surface a diverse set of opinions and perspectives about failures. While there are many different approaches to running a retrospective, I have always found these practices to be helpful:
- Organize input from team members by theme (either pre-defined or made up as input comes in)
- Assign owners to do further investigation
- Review whether the right people were in the room participating and if more structured learning needs to happen
Using pre-defined prompts during the retrospective can help structure the feedback that is collected. Some relevant retrospective themes for cybersecurity include:
- Tools/tech
- Business processes
- Communication within the security team
- Communication with other teams
- Reliance on third parties
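The practices above (organize input by theme, then assign owners) can be sketched in a few lines. This is just one possible way to track it. The `group_by_theme` and `themes_without_owner` helpers, the sample notes, and the owner name are hypothetical, and themes outside the pre-defined list are accepted as they come in.

```python
from collections import defaultdict

# Pre-defined prompts taken from the list above.
THEMES = [
    "Tools/tech",
    "Business processes",
    "Communication within the security team",
    "Communication with other teams",
    "Reliance on third parties",
]


def group_by_theme(notes):
    """Group (theme, note) pairs by theme, keeping ad hoc themes as they appear."""
    grouped = defaultdict(list)
    for theme, note in notes:
        grouped[theme].append(note)
    return dict(grouped)


def themes_without_owner(grouped, owners):
    """Themes that have input but nobody assigned to investigate further."""
    return [theme for theme in grouped if theme not in owners]


# Made-up retrospective input for illustration only.
notes = [
    ("Tools/tech", "Alerting fired but was routed to an unmonitored channel"),
    ("Communication with other teams", "Engineering learned of the incident late"),
    ("Tools/tech", "No static analysis in the build pipeline"),
    ("Training", "Nobody on the team had run a code review for injection flaws"),
]
owners = {"Tools/tech": "security engineering lead"}

grouped = group_by_theme(notes)
for theme in themes_without_owner(grouped, owners):
    print(f"Needs an owner: {theme} ({len(grouped[theme])} note(s))")
```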
Concluding Thoughts
There are many ways to structure post-breach learning beyond the three techniques outlined in this analysis. However, the ones explained above provide an excellent starting point. Understanding more deeply what went wrong in a security incident, and how, can help you adapt your environment and your team to prevent similar incidents in the future.