Most organizations have an incident response plan. Fewer organizations have one that anyone’s actually read in the last six months. Even fewer have one that’s been tested against something resembling a real attack.
And that gap is where breaches turn into disasters.
The difference between a contained incident and a full-scale crisis usually isn’t the sophistication of the attacker. It’s whether the defending team knew what they were doing before things went sideways. This article breaks down the five essential steps of an incident response plan and explains why each phase matters, drawing on frameworks like NIST’s incident handling guidance and real-world enterprise experience.
Step 1: Preparation: Everything You Do Before Anything Goes Wrong
Preparation involves running exercises and making sure your tools are actually integrated rather than just technically installed.
But here’s what skipping it costs you: when an alert fires at an inconvenient hour, your analysts are improvising. Who makes the call to escalate? Who owns external communication? Where’s the playbook for this specific type of incident? If your team is answering those questions in real time, you’ve already lost ground.
Good preparation means defined roles that people actually know they have. It means tabletop exercises where you run through scenarios and your team disagrees about what to do because that friction is exactly what you want to surface in a drill, not during an active incident. It means your SIEM, EDR, and NDR tools are configured to work together, not just coexist.
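One way to make those roles stick is to keep the escalation map in version control, where drills can exercise it and reviews can update it. Here’s a minimal sketch, assuming a simple severity-tier model; every name, address, and tier below is a hypothetical placeholder:

```python
# Escalation map kept as version-controlled data, so "who makes the call"
# is answered before the incident, not during it. All names, addresses,
# and severity tiers are hypothetical placeholders.

ESCALATION = {
    "sev1": {  # confirmed breach, active attacker
        "incident_commander": "ir-lead-oncall@example.com",
        "external_comms": "comms-director@example.com",
        "legal": "counsel@example.com",
    },
    "sev2": {  # confirmed malware, contained to a single host
        "incident_commander": "soc-shift-lead@example.com",
        "external_comms": None,  # stays internal at this tier
        "legal": None,
    },
}

def who_owns(severity: str, role: str):
    """Look up the owner of a role at a given severity tier."""
    return ESCALATION.get(severity, {}).get(role)

print(who_owns("sev1", "external_comms"))  # -> comms-director@example.com
```

The format matters far less than the fact that tabletop exercises run against the same map the real incident will.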
A lot of mature security programs now bring in a third-party IR retainer as part of this phase. Teams like NetWitness IR get embedded in your preparation work, so if something happens, you’re not onboarding a stranger mid-crisis. That integration matters more than most people realize until they’re in the middle of something urgent.
Step 2: Identification: Figuring Out What’s Real
Alert fatigue is one of the most underreported problems in security operations. Analysts face an overwhelming volume of alerts every hour. Under that load, the alerts start blending together, and that’s exactly when the critical ones get missed.
Identification is the discipline of separating noise from signal. Is this behavior unusual for this specific user on this specific system? Have we seen this traffic pattern before? Does this indicator match anything in our threat intelligence?
Ask any analyst who’s handled a serious incident well and they’ll tell you the same thing: they could see what was happening across their environment without jumping between tools. Network traffic, endpoint behavior, cloud activity, all of it in one place. When something genuinely wrong is happening, it tends to show up across multiple sources at once. If you’re toggling between dashboards to piece that together manually, you’re already behind.
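To make that concrete, here’s a rough sketch of the correlation idea in Python: normalize alerts from different tools into a common shape, then flag hosts where multiple sources fire within a short window. The field names, sources, and events are invented for illustration, not any vendor’s schema:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical normalized events from three different tools. In a real
# environment these would come from your SIEM/EDR/NDR APIs.
events = [
    {"source": "ndr", "host": "web-01", "ts": datetime(2024, 5, 1, 2, 14), "detail": "beaconing to rare domain"},
    {"source": "edr", "host": "web-01", "ts": datetime(2024, 5, 1, 2, 16), "detail": "unsigned binary spawned by w3wp"},
    {"source": "cloud", "host": "web-01", "ts": datetime(2024, 5, 1, 2, 20), "detail": "new IAM key created"},
    {"source": "edr", "host": "hr-laptop-7", "ts": datetime(2024, 5, 1, 9, 5), "detail": "macro launched powershell"},
]

WINDOW = timedelta(minutes=15)

def correlate(events):
    """Group events per host; flag hosts where multiple sources fire close together."""
    by_host = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        by_host[e["host"]].append(e)
    for host, evs in by_host.items():
        sources = {e["source"] for e in evs}
        span = evs[-1]["ts"] - evs[0]["ts"]
        if len(sources) >= 2 and span <= WINDOW:
            print(f"[CORRELATED] {host}: {len(sources)} sources within {span}")
            for e in evs:
                print(f"  {e['ts']:%H:%M} {e['source']:>5}: {e['detail']}")

correlate(events)
```

Three mediocre signals from three different vantage points are often a stronger case than one loud alert from a single tool.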
Experienced analysts will also tell you to keep notes as you go. Not a formal write-up, just running timestamped notes: what you checked, what looked normal, what you dismissed, and why. In the middle of an incident, it feels like unnecessary admin when you have actual fires to deal with. But memories get fuzzy fast, decisions that made complete sense at the time become impossible to reconstruct a few weeks later, and if there’s ever a legal angle to the incident, those notes become very important very quickly.
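The tooling for this can be as simple as a few lines of Python. A throwaway sketch (the incident ID in the filename is made up); the habit matters more than the tool:

```python
from datetime import datetime, timezone

NOTES_FILE = "incident-2024-001-notes.log"  # hypothetical incident ID

def note(text: str) -> None:
    """Append a UTC-timestamped line to the running incident log."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(NOTES_FILE, "a") as f:
        f.write(f"{stamp}  {text}\n")

note("Checked VPN logs for user jdoe: nothing unusual in last 24h, dismissed.")
note("Confirmed beaconing from web-01; memory capture started before isolation.")
```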
Step 3: Containment: Move Fast, But Think First
The reflex when you confirm an active incident is to start isolating everything immediately. Sometimes that’s the right call. Sometimes it isn’t.
If an attacker knows you’ve spotted them before you understand what they’ve accessed, you lose the ability to gather intelligence that could help you fully eradicate them. In certain scenarios, a brief period of monitored access gives you far more than immediate lockdown. This is a judgment call that depends heavily on the nature of the incident, and it’s the kind of call that’s much easier to make correctly when you’ve thought through it in advance rather than in the moment.
Whatever you decide on the strategy side, the technical basics don’t change. Get infected systems off the network so the damage stops spreading. Put restrictions in place at the network level. And before anyone touches a compromised machine, before you clean it, reimage it, or do anything else to it, capture the volatile memory and logs first.
That last one gets skipped constantly because the pressure to restore is enormous. But a wiped system is a silent system. Whatever the attacker did, whatever they left, whatever path they took through your environment will be gone. You’ll be doing your post-incident review with a blindfold on.
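That ordering is worth enforcing in automation rather than trusting people to remember it under pressure. A minimal Python sketch of the guard; the capture function here is a stand-in for real forensics tooling, and the artifact files it creates are placeholders:

```python
import os
import tempfile

EVIDENCE_DIR = tempfile.mkdtemp(prefix="evidence-")

def capture_evidence(host: str) -> list[str]:
    """Placeholder for real forensics tooling: in practice this would trigger
    a memory acquisition agent and a log export job. Here it just creates
    stand-in artifact files so the guard below has something to check."""
    paths = [os.path.join(EVIDENCE_DIR, f"{host}-memory.raw"),
             os.path.join(EVIDENCE_DIR, f"{host}-logs.tar.gz")]
    for p in paths:
        open(p, "wb").close()
    return paths

def reimage(host: str, evidence: list[str]) -> None:
    """Refuse to wipe the machine until every evidence artifact exists."""
    missing = [p for p in evidence if not os.path.exists(p)]
    if missing:
        raise RuntimeError(f"not reimaging {host}: evidence missing: {missing}")
    print(f"reimaging {host} ...")  # hand off to your provisioning system here

reimage("web-01", capture_evidence("web-01"))
```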
Step 4: Eradication and Recovery: Cleaning Takes Longer Than You Want It To
This phase gets rushed more than any other, and it shows in the form of re-infections, residual access, and incidents that get “resolved” twice.
Eradication isn’t just removing the malware. You need to trace the whole thing back: where did they get in, what accounts did they touch or create, did they leave anything behind that survives a reboot, what data was in reach? Without that picture, you’re not really cleaning up an incident. You’re just hoping you got everything.
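One scoping pass that pays off is sweeping identity logs for account activity inside the incident window. A rough Python sketch over a hypothetical CSV export; a real investigation would pull this from your identity provider or domain controller logs:

```python
import csv
from datetime import datetime
from io import StringIO

# Hypothetical export of directory/auth events, inlined here for illustration.
RAW = """timestamp,action,account,actor
2024-05-01T02:05:00,login_success,svc-backup,svc-backup
2024-05-01T02:22:00,account_created,helpdesk2,svc-backup
2024-05-01T02:23:00,group_added:domain_admins,helpdesk2,svc-backup
2024-05-02T10:00:00,login_success,jdoe,jdoe
"""

WINDOW_START = datetime(2024, 5, 1, 2, 0)  # first confirmed attacker activity
WINDOW_END = datetime(2024, 5, 1, 6, 0)

for row in csv.DictReader(StringIO(RAW)):
    ts = datetime.fromisoformat(row["timestamp"])
    if WINDOW_START <= ts <= WINDOW_END and row["action"] != "login_success":
        print(f'{row["timestamp"]}  {row["action"]:<28} account={row["account"]} by={row["actor"]}')
```

Anything created or elevated during the attacker’s window is guilty until proven otherwise.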
When it comes to recovery, the temptation is to clean a compromised server in place and move on. Rebuild it instead. Close the gaps the attackers came through before anything goes back online, not after.
Once systems are running again, keep eyes on them. A reimaged system going back to normal behavior is reassuring. A reimaged system doing something slightly odd three days later is a problem you want to catch early.
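In practice this can be as simple as snapshotting what the rebuilt host is doing and diffing against a known-good baseline each day. A minimal sketch with invented data; in a real environment the snapshots would come from your EDR or inventory tooling:

```python
# Day-one baseline for a reimaged host vs. what it looks like on day three.
# The data shapes and values here are made up for illustration.
baseline = {
    "processes": {"sshd", "nginx", "node"},
    "listen_ports": {22, 443},
    "outbound": {"api.internal.example.com"},
}

day3 = {
    "processes": {"sshd", "nginx", "node", "curl"},
    "listen_ports": {22, 443, 4444},
    "outbound": {"api.internal.example.com", "203.0.113.50"},
}

def drift(baseline: dict, today: dict) -> dict:
    """Return anything present now that wasn't in the baseline."""
    return {k: today[k] - baseline[k] for k in baseline if today[k] - baseline[k]}

for category, new_items in drift(baseline, day3).items():
    print(f"NEW {category}: {sorted(map(str, new_items))}")
```

A new listening port on a box you rebuilt last week is exactly the “slightly odd” signal worth escalating.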
One thing technical teams consistently underestimate in this phase: leadership needs updates too. Give them the plain version: here’s what we’ve resolved, here’s what’s still open, here’s what it means for the business. That conversation is worth having before they come looking for it.
Step 5: Post-Incident Review: The Phase That Actually Prevents the Next One
The post-incident review is where organizations actually get better. Not because someone writes a report, but because the people who were in it sit down together and work through what slowed the response, where visibility was missing, which decisions were made on incomplete information, and what the playbooks got wrong. Then you fix those things.
Update your detection rules based on the TTPs (tactics, techniques, and procedures) you encountered. Revise your playbooks for the steps that didn’t hold up. Run training on the gaps your team identified. If you treat every incident as a data point rather than just a problem to close, your response capability compounds over time.
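At its simplest, feeding the incident back into detection means turning the indicators you observed into a watchlist applied to new telemetry. A toy Python sketch with invented indicators; in practice this logic would live in your SIEM’s rule or watchlist format:

```python
# Indicators observed during the incident become a watchlist for new logs.
# All indicator values and the sample log line below are made up.
IOCS = {
    "domains": {"update-cdn-check.example.net"},      # C2 domain seen in the incident
    "hashes": {"44d88612fea8a8f36de82e1278abb02f"},   # hypothetical dropper MD5
    "paths": {r"c:\users\public\svchost.exe"},        # persistence location
}

def match_line(line: str) -> list[str]:
    """Return which indicator categories a log line hits."""
    lowered = line.lower()
    return [cat for cat, values in IOCS.items()
            if any(v.lower() in lowered for v in values)]

log = "proc_create host=hr-laptop-7 image=C:\\Users\\Public\\svchost.exe"
hits = match_line(log)
if hits:
    print(f"ALERT ({', '.join(hits)}): {log}")
```

Static indicators age quickly, which is why the behavioral lessons, the playbook fixes, and the training matter just as much as the watchlist.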
In Conclusion
Incident response isn’t really about what you do when an attack happens. It’s about what you’ve built before it does. The actual work happens earlier, in the preparation, the exercises, the integrations, and the reviews.
When every second counts, visibility and coordination matter most. That’s where intelligent detection, extensive investigation, and expert-led incident response converge to make a measurable difference.
