Incident Response Fundamentals -- T34ch Tech

Your first real incident will not feel like the textbook. The alert will come in at an inconvenient time. The affected system will be one nobody documented. The person on the phone will be a VP who wants a straight answer about whether customer data is gone. You will have three browser tabs open, a Slack thread moving faster than you can read, and a sinking feeling that you are missing something.

The analysts who do well in that moment are not the ones who memorized a framework. They are the ones who built habits -- specific, repeatable behaviors that produce useful output even when the situation is chaotic.

This article covers those habits. Not the NIST framework as an org chart. The actual decisions you face: what to triage first, what evidence to collect before you touch anything, when to contain versus when to watch, and how to communicate with people who are frightened and need facts you do not yet have.

The Lifecycle Is Not Linear

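Most IR curricula present some version of the same cycle. The common six-phase form, popularized by SANS, runs Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned, usually drawn as a neat circle (NIST SP 800-61 Rev. 2 groups the same work into four phases). That circle is a useful mental model. It is not what a response looks like in practice.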

In practice, you are often running identification and containment simultaneously. You may cycle back through identification three times as new hosts turn up. Eradication gets interrupted by a second incident. Lessons learned happens six weeks late in a meeting that half the attendees skip.

Understanding the phases matters because each one has a different primary objective and a different set of mistakes that derail it. The phases exist to keep you honest about which question you are currently trying to answer.

Fig. 01 -- The IR lifecycle in practice

The standard IR lifecycle. In practice, identification often re-opens as scope expands. Containment and identification frequently run in parallel. The cycle back through preparation is the most commonly skipped step.

The mistake most junior analysts make is treating phase transitions as gates that close behind them. You do not graduate out of identification. You continue identifying throughout the incident. You just stop letting new findings delay containment indefinitely.

Identification: The Question That Has to Stay Open

An alert is not an incident. A ticket is not a confirmation. Identification is the process of answering one question: is something actually happening, and if so, what?

This sounds straightforward. It is not. The pressure during identification runs in two directions simultaneously. Business stakeholders want a fast answer. Your instinct as an analyst is to not declare an incident until you are sure. Both pressures push you toward mistakes.

Declaring too fast means spinning up a full response for a false positive. Declaring too slow means the threat has more time to move.

The practical resolution is to use a severity threshold for declaration, not a certainty threshold. You do not need to know what is happening to declare an incident. You need enough evidence that something is happening to justify escalating resources. Document what you know, what you suspect, and what you cannot yet rule out. Declare when the evidence warrants resources, not when you have certainty.
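The declaration rule above can be expressed as a small function. This is a minimal sketch, assuming a hypothetical four-level severity scale and a hypothetical `TriageNote` record. It shows that the gate checks two things: at least one confirmed observation that something is happening, and a potential severity at or above the threshold. It never checks whether the suspicions have been proven.

```python
from dataclasses import dataclass, field

# Hypothetical four-level scale; substitute your organization's own.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class TriageNote:
    """Hypothetical record: what you know, suspect, and cannot yet rule out."""
    confirmed: list[str] = field(default_factory=list)
    suspected: list[str] = field(default_factory=list)
    cannot_rule_out: list[str] = field(default_factory=list)
    # How bad it would be if the suspicions turn out to be true.
    potential_severity: str = "low"


def should_declare(note: TriageNote, threshold: str = "high") -> bool:
    """Severity threshold, not certainty threshold.

    Requires at least one confirmed observation that something is happening
    and a potential severity at or above the threshold. Suspicions do not
    need to be proven.
    """
    something_is_happening = len(note.confirmed) > 0
    severe_enough = SEVERITY[note.potential_severity] >= SEVERITY[threshold]
    return something_is_happening and severe_enough


note = TriageNote(
    confirmed=["Credential harvesting tool executed on WS-FIN-044"],
    suspected=["Harvested credentials used on the VPN"],
    cannot_rule_out=["Data staging or exfiltration"],
    potential_severity="critical",
)
print(should_declare(note))  # True
```

Note that the three lists double as the written record the paragraph asks for, so the declaration and its justification are captured together.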

Key term: Dwell time. The time between an attacker's initial access and your detection of that access. Industry median estimates are measured in days to weeks. Every day of undetected presence is another day of lateral movement, credential harvesting, and exfiltration. Fast identification compresses dwell time. This is the single most operationally significant metric in IR.
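Dwell time is simple arithmetic, but the inputs deserve care. The sketch below assumes you have an estimated initial-access timestamp (from the earliest log evidence) and a detection timestamp, both timezone-aware; the timestamps shown are hypothetical. Because scoping often turns up older activity, treat the result as a lower bound that can grow as the investigation proceeds.

```python
from datetime import datetime, timezone


def dwell_time_days(initial_access: datetime, detection: datetime) -> float:
    """Dwell time in days: detection minus estimated initial access."""
    if detection < initial_access:
        raise ValueError("detection cannot precede initial access")
    return (detection - initial_access).total_seconds() / 86400


# Hypothetical timestamps; both timezone-aware, in UTC.
first_access = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
detected = datetime(2024, 3, 19, 21, 0, tzinfo=timezone.utc)
print(dwell_time_days(first_access, detected))  # 18.5
```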

Scoping: Your First Job After Declaration

Once you have declared, your first task is not to start fixing things. It is to understand how large the problem is.

Scope determines resource allocation, communication requirements, regulatory obligations, and containment strategy. A single compromised workstation requires a different response than a compromised domain controller. You cannot know which you have until you scope.

Scoping questions in rough priority order:

1. What is the initial indicator? (Alert, report, external notification, ransom note.)
2. What systems are confirmed affected?
3. What is the earliest evidence of activity? How does the detection timestamp compare with what the logs suggest for first access?
4. What accounts have touched affected systems?
5. What does lateral movement look like? Are there other systems with authentication events from those accounts in the same window?
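The last scoping question is the one most worth automating. The sketch below assumes authentication events have already been normalized into a hypothetical `AuthEvent` record (for example, from Windows logon events such as ID 4624, or from VPN and SSO logs); real field names depend on your log source, and the hostnames and account names are hypothetical. It returns every host outside the confirmed set that a suspect account authenticated to during the window.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuthEvent:
    """Hypothetical normalized authentication record."""
    timestamp: datetime
    account: str
    source_host: str
    dest_host: str


def lateral_movement_candidates(events, accounts, confirmed_hosts, start, end):
    """Map each unconfirmed host to the suspect-account logons it received in [start, end]."""
    hits: dict[str, list[AuthEvent]] = {}
    for ev in events:
        if (
            ev.account in accounts
            and start <= ev.timestamp <= end
            and ev.dest_host not in confirmed_hosts
        ):
            hits.setdefault(ev.dest_host, []).append(ev)
    # Oldest logon first within each host, to support timeline building.
    for host_events in hits.values():
        host_events.sort(key=lambda e: e.timestamp)
    return hits


def utc(day, hour):
    return datetime(2024, 3, day, hour, tzinfo=timezone.utc)


events = [
    AuthEvent(utc(18, 10), "jdoe", "WS-FIN-044", "FS-FINANCE"),
    AuthEvent(utc(18, 11), "jdoe", "WS-FIN-044", "WS-FIN-044"),  # already confirmed
    AuthEvent(utc(10, 9), "jdoe", "WS-FIN-044", "DC-01"),        # outside the window
    AuthEvent(utc(18, 12), "asmith", "WS-HR-002", "FS-HR"),      # not a suspect account
]
found = lateral_movement_candidates(
    events, {"jdoe"}, {"WS-FIN-044"}, utc(15, 0), utc(19, 23)
)
print(sorted(found))  # ['FS-FINANCE']
```

Each host this returns becomes a new scoping target, and the window usually needs widening once question 3 pushes the first-access estimate earlier.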

Do not scope by asking affected users. Scope by looking at the data. Users are unreliable reporters of what happened on their machines.

Triage: Making Decisions With Incomplete Information

Triage is not a phase in the lifecycle. It is a continuous activity that runs from your first alert through recovery. Every piece of evidence, every new system that turns up, every request from the business -- all of it requires a triage decision.

The core triage problem is that you will always have more things that need attention than you have time or people to address them. Triage is the discipline of sequencing correctly under that constraint.

Fig. 02 -- Triage priority matrix

Triage by two axes: how severe if true, and how certain you are that it is true. Most of your queue lives in the upper-left -- high-potential, uncertain. That quadrant requires fast investigation, not fast containment.

The mistake analysts make in the upper-right quadrant -- confirmed high severity -- is pausing to gather more information before acting. At that point you have enough. Act. You can continue investigating in parallel with containment.

The mistake in the upper-left quadrant is the opposite: treating uncertainty as a reason to delay. Uncertainty is not a reason to delay investigation. It is a reason to prioritize investigation.
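The matrix in Fig. 02 can be sketched as a classifier over two scores. This assumes you can express severity-if-true and certainty as numbers between 0 and 1, split by a single hypothetical cutoff. The actions for the two upper quadrants follow the text above; the two lower-quadrant actions are assumptions, since the text does not discuss them.

```python
from enum import Enum


class Action(Enum):
    CONTAIN_NOW = "confirmed high severity: act, keep investigating in parallel"
    INVESTIGATE_FIRST = "high potential, uncertain: investigate fast"
    ROUTINE_FIX = "confirmed low severity: normal remediation (assumed)"
    MONITOR = "low potential, uncertain: log and monitor (assumed)"


def triage(severity_if_true: float, certainty: float, cutoff: float = 0.5) -> Action:
    """Place an item on the severity x certainty matrix; both scores in [0, 1]."""
    high_severity = severity_if_true >= cutoff
    confident = certainty >= cutoff
    if high_severity and confident:
        return Action.CONTAIN_NOW
    if high_severity:
        # Uncertainty raises investigation priority; it does not justify waiting.
        return Action.INVESTIGATE_FIRST
    if confident:
        return Action.ROUTINE_FIX
    return Action.MONITOR


print(triage(0.9, 0.3).name)  # INVESTIGATE_FIRST
```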

Severity Signals That Actually Predict Impact

Not all alert fields are equally useful for triage. These are the signals that consistently correlate with real severity:

Privilege level of the affected account. A domain admin or service account is categorically different from a standard user. Attackers operating with stolen credentials almost always try to escalate. If the compromised account has elevated privileges, your scope estimate goes up immediately.

Position of the affected host in the network. A domain controller, a backup server, an authentication server, or anything in the path of lateral movement is a multiplier. An endpoint in an isolated subnet is not.

Data classification of what the host touches. What is on that system or accessible from it? This drives your notification and regulatory obligations as much as it drives technical response.

Recency and velocity of the activity. An attacker who has been quiet for six weeks is different from one who exfiltrated 40 GB three hours ago. Velocity tells you whether you are in an active attack or a post-compromise investigation.
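One way to keep these signals from being weighed inconsistently at 3 a.m. is to score them. The weights, category names, and time cutoffs below are hypothetical; the sketch only shows the structure, in which privilege, host position, and data classification add up and recent activity adds on top. Velocity (data volume over time) would need an extra input and is left out here.

```python
from dataclasses import dataclass

# Hypothetical weights; calibrate against your own incident history.
PRIVILEGE = {"standard": 0, "local_admin": 2, "service": 3, "domain_admin": 3}
# "tier0" covers domain controllers, backup servers, and authentication servers.
POSITION = {"isolated": 0, "workstation": 1, "server": 2, "tier0": 3}
DATA_CLASS = {"public": 0, "internal": 1, "confidential": 2, "regulated": 3}


@dataclass
class Signals:
    privilege: str
    position: str
    data_class: str
    hours_since_last_activity: float


def severity_score(s: Signals) -> int:
    score = PRIVILEGE[s.privilege] + POSITION[s.position] + DATA_CLASS[s.data_class]
    # Recency: activity within a day suggests an active attack;
    # activity within a week still warrants a bump.
    if s.hours_since_last_activity <= 24:
        score += 3
    elif s.hours_since_last_activity <= 24 * 7:
        score += 1
    return score


active = Signals("domain_admin", "tier0", "regulated", hours_since_last_activity=3)
dormant = Signals("standard", "isolated", "internal", hours_since_last_activity=24 * 42)
print(severity_score(active), severity_score(dormant))  # 12 1
```

An additive score is deliberately crude; its value is that two analysts looking at the same alert reach the same number and can argue about the weights instead of the gut call.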

Key term: Blast radius. The total scope of systems, accounts, and data that a threat actor could have reached from their initial access point, given the privileges and network position they obtained. Estimating blast radius during triage gives you a worst-case scope before you have confirmed anything beyond the initial host.
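Blast radius estimation is a graph reachability problem. In the sketch below, nodes are hosts, accounts, services, and data stores, and an edge means "from here, an attacker could reach there" (a cached credential on a host, an account's logon rights to a server). The graph and its node names are hypothetical; in practice you would build it from directory, access-control, and network data. Because edges represent possible rather than confirmed access, the result is a worst-case scope.

```python
from collections import deque


def blast_radius(reach: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from the initial access point."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in reach.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical environment. An edge means "from here, an attacker could reach there".
reach = {
    "host:WS-FIN-044": {"acct:jdoe"},             # jdoe's credentials cached on the host
    "acct:jdoe": {"host:FS-FINANCE", "svc:VPN"},  # jdoe can log on to these
    "host:FS-FINANCE": {"data:payroll-share"},
    "host:DC-01": {"acct:svc-backup"},            # not reachable from jdoe's access
}
print(sorted(blast_radius(reach, "host:WS-FIN-044")))
# ['acct:jdoe', 'data:payroll-share', 'host:FS-FINANCE', 'host:WS-FIN-044', 'svc:VPN']
```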

Evidence Preservation: Touch Nothing You Have Not Documented

This is where junior analysts cause the most damage, and it is almost always unintentional.

The instinct when you find a compromised system is to start working on it -- pulling logs, running tools, looking for artifacts. That instinct will destroy evidence. Every action you take on a live system modifies it. File access times change. Memory contents shift. If that system becomes part of a legal case, chain of custody breaks down before you have established it.

The rule is: document the state before you change the state.

This does not mean you need perfect forensic process for every machine in every incident. It means you need to make a deliberate decision about evidence handling before you touch anything, not after.

Fig. 03 -- Evidence collection sequence

Evidence volatility sequence. Always collect from most volatile to least. RAM contents are gone the moment power is cut. Logs can rotate within hours. Work top to bottom, not by convenience.
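A collection plan can be sorted by volatility before anyone touches the host. The ranking below loosely follows the order of volatility in RFC 3227 (Guidelines for Evidence Collection and Archiving), with CPU registers and cache omitted because they are rarely collectable in practice. The category labels and task descriptions are hypothetical.

```python
# Most volatile first, loosely following the order of volatility in RFC 3227.
# CPU registers and cache are omitted; they are rarely collectable in practice.
VOLATILITY_ORDER = [
    "memory",           # RAM, running processes, network connections, ARP cache
    "temp_filesystem",  # temporary filesystems, often cleared on reboot
    "local_disk",       # filesystem, including local logs that can rotate within hours
    "remote_logs",      # SIEM, central syslog, EDR telemetry held off-host
    "physical_config",  # physical configuration and network topology
    "archival_media",   # backups and archives
]
RANK = {category: i for i, category in enumerate(VOLATILITY_ORDER)}


def collection_plan(tasks: list[tuple[str, str]]) -> list[str]:
    """Order (description, category) tasks from most to least volatile."""
    return [desc for desc, category in sorted(tasks, key=lambda t: RANK[t[1]])]


plan = collection_plan([
    ("Export local event logs", "local_disk"),
    ("Pull SIEM history for the host", "remote_logs"),
    ("Capture RAM", "memory"),
])
print(plan)  # ['Capture RAM', 'Export local event logs', 'Pull SIEM history for the host']
```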

What "Preserving Evidence" Actually Means in Practice

For most incidents at most organizations, full forensic acquisition of every affected system is not realistic. You do not always have the tools, the storage, the time, or the trained personnel. That does not mean evidence handling does not matter. It means you triage your preservation effort the same way you triage everything else.

For systems that are likely to end up in a legal or regulatory proceeding, do it right: bit-for-bit image, cryptographic hash (SHA-256) of the image, chain of custody documentation, write-blocker on any physical media.

For systems where legal action is unlikely, a minimum viable approach still applies: screenshot the running state before you touch anything, export logs to a location you control, document every command you run on the system with a timestamp.

The hash is non-negotiable regardless of tier. If you cannot prove the evidence was not modified after collection, it is not evidence.
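Hashing an image does not require special tooling. This is a minimal sketch using Python's standard `hashlib`, with `sha256_file` and `verify_evidence` as illustrative helper names. It reads in chunks so a multi-gigabyte image does not have to fit in memory; on Python 3.11 and later, `hashlib.file_digest` performs the same chunked read for you.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_evidence(path: str, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at acquisition."""
    return sha256_file(path) == recorded_digest.strip().lower()
```

Typical use: call `sha256_file` on the image immediately after acquisition, write the digest into the chain of custody record, and call `verify_evidence` before every analysis session. A `False` result means you can no longer show the copy is unmodified.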

Before you run any command on an affected system, write down what you are about to do and why. After you run it, write down what it showed. This log is your forensic record. Without it, you cannot reconstruct the state of the system at time of discovery, and you cannot defend your analysis to anyone who was not in the room.
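The write-down, run, write-down discipline can be partly automated with a wrapper. This is a sketch under stated assumptions: `run_logged` is a hypothetical helper, the log is append-only JSON lines with UTC timestamps, and `log_path` should point at storage you control, not the affected disk. Running any interpreter on a compromised host is itself a change to that host, so this belongs inside your evidence-handling decision, not in place of it.

```python
import json
import subprocess
from datetime import datetime, timezone


def _append(log_path: str, entry: dict) -> None:
    """Append one timestamped entry to the command log."""
    entry["time_utc"] = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")


def run_logged(command: list[str], reason: str, log_path: str) -> subprocess.CompletedProcess:
    """Write down intent, run the command, write down what it returned."""
    _append(log_path, {"phase": "intent", "command": command, "reason": reason})
    result = subprocess.run(command, capture_output=True, text=True)
    _append(log_path, {
        "phase": "result",
        "command": command,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    })
    return result
```

Because the intent entry is written before the command runs, a command that hangs or crashes still leaves a record of what was attempted and why.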

Containment: Deciding When and How to Cut the Cord

Containment is operationally the most consequential decision in the incident. Done too early, you tip off the attacker, lose visibility, and potentially destroy forensic artifacts. Done too late, you extend dwell time and give the attacker more time to achieve their objective.

There is no universal right answer. There is a framework for thinking through the decision.

Short-Term vs Long-Term Containment

These are not the same decision, and conflating them is a common mistake.

Short-term containment is the immediate action to stop the bleeding. Block the C2 IP at the perimeter. Disable the compromised account. Isolate the affected host from the network. These actions are fast, reversible, and buy time. They do not remove the threat.

Long-term containment is the more durable posture you establish while eradication is underway. Patched and hardened builds. Rebuilt accounts with new credentials. Segmentation rules. Monitoring tuned to detect the specific TTPs you observed.

You will almost always need both. Implement short-term containment to stop the immediate harm, then put long-term containment in place; only then can you safely declare eradication complete.

Fig. 04 -- Containment decision tree

Containment decision flow. The observation branch exists -- sometimes watching an attacker yields intelligence that improves your response. That decision requires explicit authorization from leadership and legal, not analyst discretion.

The Attacker-Awareness Problem

One thing that changes your containment calculus significantly is whether you believe the attacker knows they have been detected.

If they do not know: you have the option to observe. You may learn their objectives, their tooling, their other footholds. This intelligence can dramatically improve your eradication completeness.

If they do know, or you cannot rule it out: assume they are already accelerating toward ransomware deployment, data staging, or log destruction. Observation loses its value and containment becomes urgent.

In practice, assume they know or will shortly know. Treat the observation option as exactly that -- an option, not a default. Exercise it only with explicit authorization from legal and leadership, with a clear time limit and defined trigger conditions for cutting the connection anyway.
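The rules in this section can be written as a guard that defaults to containment. The `ObservationAuthorization` record, its fields, and the trigger names are hypothetical. The sketch encodes only what the text states: observation requires both legal and leadership approval, a time limit, and trigger conditions, and any doubt about attacker awareness means contain.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ObservationAuthorization:
    """Hypothetical record of an explicit decision to watch rather than contain."""
    approved_by_legal: bool
    approved_by_leadership: bool
    expires_at: datetime                # the clear time limit
    trigger_conditions: frozenset[str]  # events that end observation regardless


def containment_decision(
    now: datetime,
    observed_events: set[str],
    auth: Optional[ObservationAuthorization] = None,
    attacker_may_be_aware: bool = True,  # default: assume they know
) -> str:
    if attacker_may_be_aware:
        return "contain"
    if auth is None or not (auth.approved_by_legal and auth.approved_by_leadership):
        return "contain"  # observation is never analyst discretion
    if now >= auth.expires_at:
        return "contain"  # time limit reached
    if observed_events & auth.trigger_conditions:
        return "contain"  # a trigger condition fired
    return "observe"


start = datetime(2024, 3, 19, 12, 0, tzinfo=timezone.utc)
auth = ObservationAuthorization(
    True, True, start + timedelta(hours=24), frozenset({"data_staging", "log_deletion"})
)
print(containment_decision(start, set(), auth, attacker_may_be_aware=False))  # observe
print(containment_decision(start, {"data_staging"}, auth, attacker_may_be_aware=False))  # contain
```

Every branch except the last returns "contain", which mirrors the text: observation is the exception that must be justified, not the default.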

Communication Under Pressure

The technical work of IR is hard. The communication work is harder, and it has more organizational consequences when it goes wrong.

Junior analysts tend to either over-communicate (sending updates that raise alarm without adding information) or under-communicate (going dark while they work, leaving stakeholders to assume the worst). Both create problems.

The Stakeholder Map

Before you can communicate well, you need to know who your audience is and what they actually need to know.

Fig. 05 -- Stakeholder communication matrix

Different stakeholders need different things. Giving executives technical detail does not inform them -- it creates noise and erodes confidence. Giving IT operators business context delays action. Tailor every communication to what the recipient needs to do with the information.

What to Say When You Do Not Have Answers

The most common communication failure is silence under uncertainty. You do not know the scope yet, so you say nothing. The executive hears nothing, assumes the worst, and either escalates past you or starts making decisions without information.

The antidote is a structured uncertainty statement. Every status update has four components:

What is confirmed. Only what evidence supports. No hedging, no "it might be." Facts.

What is suspected but not confirmed. What the evidence suggests but cannot yet prove. Flag this explicitly as unconfirmed.

What is not yet known. The open questions. This tells the recipient what to expect in the next update.

What is happening next and when. One or two specific actions, with a timeframe. This gives the recipient a reason to wait for your next update rather than escalating.

Example of what this looks like in practice:

Confirmed: One workstation in Finance (hostname WS-FIN-044) shows evidence of credential harvesting tooling running under the logged-on user's account. User account has been disabled. System is isolated from the network.

Suspected: The same credentials may have been used to authenticate to the VPN. We are pulling authentication logs now.

Unknown: Whether any data was staged or exfiltrated. Whether other systems were accessed using the harvested credentials.

Next: Authentication log review complete within 2 hours. Will update immediately if additional systems are confirmed affected.

That is not a long message. It takes five minutes to write. It prevents four hours of executive escalation.
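The four-component structure is easy to enforce with a template. The sketch below uses a hypothetical `StatusUpdate` record. It keeps empty sections visible with an explicit "None at this time." rather than dropping them, and it refuses to render an update with no next step, because that component is what keeps the recipient waiting for you instead of escalating.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    """Hypothetical template for the four-component update."""
    confirmed: list[str]
    suspected: list[str]
    unknown: list[str]
    next_steps: list[str]

    def render(self) -> str:
        if not self.next_steps:
            raise ValueError("every update needs a next action and a timeframe")
        sections = [
            ("Confirmed", self.confirmed),
            ("Suspected", self.suspected),
            ("Unknown", self.unknown),
            ("Next", self.next_steps),
        ]
        # Keep empty sections visible: an explicit "none" is information too.
        return "\n\n".join(
            f"{title}: {' '.join(items) if items else 'None at this time.'}"
            for title, items in sections
        )


update = StatusUpdate(
    confirmed=["WS-FIN-044 isolated; user account disabled."],
    suspected=["Harvested credentials may have been used on the VPN."],
    unknown=["Whether any data was staged or exfiltrated."],
    next_steps=["Authentication log review complete within 2 hours."],
)
print(update.render())
```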

Key term: Privilege in incident communications. In many jurisdictions, incident-related communications that are conducted at the direction of legal counsel may qualify for attorney-client privilege, protecting them from disclosure in subsequent litigation or regulatory investigations. Loop in legal early, and follow their guidance on what to document in privileged channels versus operational records. This is not a reason to hide information -- it is a reason to structure communications deliberately.

The Habit Stack

Everything above consolidates into a set of habits. Not steps in a process -- habits. Things you do reflexively, before the chaos sets in.

When you receive an alert: write down the time, the indicator, and what it touches before you do anything else. That note is the beginning of your case record.

When you access an affected system: document what you are about to do, run the command, document what it returned. Every command, every output.

When a stakeholder asks for a status: give the four-component update even if it feels premature. Silence is not professionalism. It is abdication.

When you think you have found everything: assume you have not. Ask what else could have the same level of access and go look for it.

When you close an incident: write the post-incident report within two weeks. Not as a formality. Because the findings go directly into your preparation for the next one.

IR is not a technical discipline with a communication requirement bolted on. It is a decision discipline conducted under time pressure, with imperfect information, that produces both technical and organizational outcomes. The analysts who consistently perform well treat communication, documentation, and evidence handling as core skills -- not overhead.

The tools change. The platforms change. The attackers get more sophisticated. The fundamentals do not.