Background
The information security field is maturing.
But there are still challenges.
The information security field has not matured to the level of other broad-reaching institutions.
One particular area of concern for me is in how the industry deals with events that cut across the entirety of the infosec space.
How does the US Centers for Disease Control (CDC) handle, or prepare to handle, large-scale outbreaks?
How does a country like Japan, for example, mature its processes to the point where it can train millions of people to properly prepare for and execute an evacuation of an entire city in the event of an earthquake or tsunami?
The events surrounding the recent DNS issue further illustrate the weaknesses in the security space’s ability to handle large-scale critical security incidents.
My fiance is in the public health field and works closely with US Centers for Disease Control (CDC) on the topic of food safety. They recently dealt with a nationwide salmonella outbreak. I was fortunate to learn a little about how they deal with these events.
Throughout this post, I’ll refer to what I’ve learned about how CDC is able to coordinate disparate groups to accomplish a common objective.
Overview
Off-the-cuff, I came up with a list of the primary phases of large-scale incident handling. I tried to think beyond just the infosec space…
The phases I’ll discuss are:
- Discovery
- Analysis
- Vetting
- Damage Control
- Recovery and Response
- Post-Incident Analysis and Process Enhancement
I’ll discuss what goes into each phase and refer to how the DNS incident highlights where we need to improve.
I’ll end with some remarks and a vision of what I’d like to see happen.
Objective
The objective of this post is to raise the level of discourse beyond the DNS incident, and focus on the future. I want to leverage this incident to take a step forward.
I’ll refer often to how the DNS incident was handled, but the reader should try to avoid inferring judgement on my part. I’m not trying to debate how person X did Y, or whether A should have done B. I make references to events only to illustrate how we can learn and improve.
I want the reader to walk away thinking about how we can do better…
Discovery
This one is a bit of a no-brainer. This is where an issue is identified.
It can come up as part of a risk assessment, a vulnerability assessment, or incident response, or be read in a blog post.
Humorously, in this case, Dan’s discovery was, as he admits, quite by accident.
In the infosec space, we can classify the discoverers into two broad classes:
- Bad guys - for example, those with a vested financial interest in exploiting the vulnerability without the consent of legitimate users.
- Good guys - for example, vulnerability researchers or incident handlers discovering where the bad guys are one step ahead.
Like the security sector, CDC has to deal with threats introduced by good guys and bad guys.
We also both have good guys with a vested interest in “getting to the fix first.”
Now, I’m not trying to claim that the CDC is perfect, but they have something we lack: coordinated discourse at the national level on incident handling.
Discovery, for them, may involve a little of the, “OMG! Did you read about…” But they also take a very proactive approach.
CDC works aggressively with hospitals and institutions to correlate “event data” to determine if there is an outbreak.
This high level of interaction builds collaboration, enhances communication, and keeps all levels aware of the significance of what they are doing. CDC deals with serious stuff. And the DNS vulnerability is the equivalent of a cyber bird-flu.
In our space, there is no focused point of aggregation of vulnerability information. We have public (federal agencies), private, and underground channels. Can we do better?
Using the Risk = Vulnerability * Threat * Consequence equation, discovery results in the identification of a vulnerability.
Analysis
Analysis involves considering the impact of the vulnerability identified in the discovery phase. What’s the threat?
Dan, after doing more research on what he found, pulled together a team of industry insiders, experts, and leaders.
I suspect the conversation was a bit like, “I think what I found is huge, here’s the details, what do you think?”
This conversation undermines the entire process if the people in that room can’t be trusted.
At this point, trust enters the equation.
As I discussed earlier, CDC can pull together public officials, industry leaders, and other key players. One of CDC’s big concerns is panic. For example, it has to consider that if it says “tomatoes have been identified as the source,” farmers may lose everything as a result of the disclosure and the drop in demand for tomatoes that follows. Trust must exist. CDC must trust that people receiving information will handle it properly and responsibly.
In Dan’s case, he was able to pull such a team together.
Though we are starting to develop this capacity as an industry, we definitely need to focus some attention here.
Vetting
Once consensus is reached that the vulnerability carries a reasonable level of threat, vetting allows us to determine consequence.
At this point, additional research might need to be done.
In the DNS case, I’m sure there was consideration of how quickly a remedy could be introduced, how long it would take to formulate a hack once the vulnerability hit the public, and what the consequence would be in the time between exploit and remediation.
In this case, I think the CDC has it a little easy. Historical information can help inform the risk to humans. Lost work, impact on the health infrastructure, potential for death can be analyzed from previous incidents.
In the security space, we have a bit of a challenge here.
How do you calculate, at a national level, the consequence of the DNS vulnerability? Every organization may have different consequences, and I don’t think there’s really any solid mechanism to statistically analyze cyber security costs at the level needed to inform the analysis.
This situation is further compounded by the lack of institutionalized communication channels.
As an industry we’ve historically struggled with senior management on budgetary issues because consequence is hard to calculate.
This is beginning to change somewhat, but we need to get to a point where we understand the costs to business, and can communicate clearly to representatives who are working at analysis.
In Dan’s case, from what I understand, there was pretty immediate acceptance that the consequence was very high. It may not have needed to be calculated as a dollar figure, but consensus resulted in a good outcome.
We now have the pieces of information needed to determine the overall risk.
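To make the Risk = Vulnerability * Threat * Consequence equation from the discovery phase a little more concrete, here is a minimal sketch in Python. It assumes each factor is scored on a 0 to 1 scale, which is my assumption for illustration and not something the equation itself dictates. The function name and the example scores are hypothetical; they are not measurements of the DNS incident or any other real event.

```python
def risk_score(vulnerability: float, threat: float, consequence: float) -> float:
    """Multiply the three factors of Risk = Vulnerability * Threat * Consequence.

    Assumption for this sketch: each factor is a score between 0 and 1.
    """
    factors = {
        "vulnerability": vulnerability,
        "threat": threat,
        "consequence": consequence,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return vulnerability * threat * consequence


# Hypothetical scores, made up purely for illustration.
widespread_flaw = risk_score(vulnerability=0.9, threat=0.8, consequence=0.9)
obscure_flaw = risk_score(vulnerability=0.9, threat=0.1, consequence=0.2)

# Same vulnerability score, very different overall risk.
print(widespread_flaw > obscure_flaw)
```

Because the factors are multiplied, a serious vulnerability with little threat or little consequence still produces a low overall score, which is why the analysis and vetting phases matter as much as discovery.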
Damage Control
Cognizant of the risk, we can develop a reasonable mitigation strategy.
In the DNS case, it involved the coordinated effort of 16 key vendors to develop, test, and prepare to deploy patches to over 60 DNS products.
In the CDC case, for example, this is the development of a vaccine.
Consensus must be reached on how the risk mitigation strategy will be implemented. With the DNS vulnerability, the plan was to release patches and announcements simultaneously, and for Dan to do a full disclosure roughly 30 days later.
What Dan and his team did successfully was to come up with a good risk mitigation plan.
In the ideal world, it would have come off without a hitch.
But we’re still maturing, and the plan didn’t work out.
I’m not passing judgement, but I think that we can learn a great deal from how the risk mitigation played out.
So far, this is where I see most of the post-incident analysis focusing. I’m reading a lot about how the disclosure and subsequent events “should” have been handled. But we have to look broader…
What we need is an informed process on how to handle recovery and response, and the key players must be bought in.
Recovery and Response
Recovery and response is where we execute our risk mitigation strategy.
When the strategy is “patch”, the infosec space has vastly improved.
Almost all major OS and software vendors have implemented easy update processes that can often be automated.
There are cases where automatic updating is not advisable, and our risk mitigation strategies must consider this.
But a coordinated communication hierarchy can help improve information dissemination. Our system administrator must know that “tonight is a stay late and patch” night.
When handling a virus outbreak, CDC would take this time to issue alerts to hospitals or other health facilities, and begin, if necessary, distribution of vaccines. CDC, for the most part, has these communication channels nailed down.
In my opinion, this phase highlights the communication conundrum in the infosec space. The DNS vulnerability was a once-in-a-decade, if not once-in-a-lifetime, event. How did this message get to Joe Q. system administrator?
We need to change and improve.
Post-Incident Analysis and Process Enhancement
At this point in the process, we can take a look at how we performed.
We have the opportunity to review and analyze our processes and procedures and see how we can do better next time.
In my opinion, the IT sector as a whole does a pretty bad job here. Generally we are off to work on the next project.
I think this is a big mistake.
We need to focus attention here.
Conclusion
The recent DNS vulnerability and how it was handled gives the information security sector an excellent opportunity to take a look at how far we’ve come, and where we need to go.
We have smart people in this space. Over the last 3 weeks I’ve read over a thousand blog posts, listened to over 10 podcasts, and had a lot of discussions with colleagues about this issue.
What I came away with was, “Damn, there are some smart people here!” And I’m not talking just plain geek smartness. I heard a lot of business savvy, a lot of common sense, and a lot of critical thinking and analysis.
But where can we do better?
The infosec space has some unique characteristics, but there is a lot we can learn from other, more mature, institutions.
We’ve come a long way, and we need to focus on how we develop industry-wide collaboration. IR&R is a great opportunity for us to begin this process.
We should be looking at institutions that do this stuff regularly. What do they do? What can we take away? Let’s take a look outside the box.
This stuff may be rocket science to the infosec sector at the national or international level, but in other sectors they’ve already got men on the moon…
I’d like to take a minute to address the issue of those with a vested interest in early disclosure. Penetration testing companies don’t need to be kicked to the curb here. They can bring a lot to the discussion, and can benefit from being at the table at all phases too.
The issue is that responsibility must be to more than the bottom line.
Everyone should be engaged, and I don’t think we should shut anyone out.
We may need to find ways of addressing things like the buying and selling of vulnerabilities, but let’s have an informed discussion.
A vision
I think that a lot of discussion needs to happen, but I do have a vision for where I’d like to see us end up…
What would I like to see?
- A trusted tier-one group of multi-sector representatives who meet frequently and can determine consequence quickly.
- This group has accepted procedures for handling events through the remediation phase and performs post-incident analysis.
- A group of mid-tier folks who know the seriousness of what’s coming out of the tier-one group and have the tools and resources needed to hit the ground running.
- This group interfaces regularly with the tier-one group on process analysis and improvement.
- I want to see a broadly known, easy mechanism for vulnerabilities to be introduced to the vetting cycle and be handled appropriately, and I want the process for doing this to be no-risk to the discloser.
I want the obscure high-school kid who discovers a flaw in a critical system to have a place to go and have his concerns vetted.
I’d like to see the intelligence agencies come to the tier-one group and discuss an issue without the risk that sources might be compromised, or that the information would be mishandled.
At the moment, that’s a pipe-dream :/
The security industry has come a long way, but we have a long way to go.
Keep your eyes on the prize.
Bill