Dwell Time as a Security Metric -- T34ch Tech

Every annual threat report publishes a dwell time figure. The number goes down a little each year, the vendor who published it takes credit, and security programs everywhere update their board decks with a benchmark they did not collect and cannot verify. The median global dwell time is now under three weeks. That number is both correct and almost entirely useless for improving your detection program.

The problem is not dwell time as a concept. The time between initial compromise and detection is genuinely important. The problem is what organizations do with the number, how they collect it, and what they think it tells them about their security posture. In practice, dwell time as commonly reported conceals more than it reveals, and the ways it misleads are not obvious unless you think carefully about the sampling methodology and the operational context.

This article is about those concealed problems. Not a critique of measuring detection speed -- you should measure detection speed. Rather, a working analysis of why the standard approach produces metrics that are easy to report and hard to act on, and what to measure instead if you want numbers that actually change operational behavior.

What Dwell Time Actually Is

Start with a precise definition, because the term gets used loosely.

Key term: Dwell time. The elapsed time between an attacker's initial access to an environment and the moment that access is detected by the defending organization. This includes detection by any means: automated alerting, human investigation, third-party notification, law enforcement contact, or the attacker revealing themselves (as in ransomware deployment). Dwell time does not measure time to containment or time to remediation -- only time to awareness.

Two related metrics travel with dwell time in most reporting:

MTTD -- Mean Time to Detect. The arithmetic mean of dwell times across a set of incidents. This is the number most commonly reported at the organizational level. It is also the most misleading, for reasons we will cover shortly.

MTTR -- Mean Time to Respond. The mean elapsed time from detection to containment, or sometimes from detection to full remediation, depending on who is defining it. MTTR and MTTD are often presented together as though they were two readings of the same capability. They are not. They measure different capabilities, are influenced by different variables, and improving one does not imply improving the other.

The distinction matters operationally. A team that detects quickly but contains slowly has a detection capability and a response problem. A team that detects slowly but contains quickly once they see it has a visibility gap and functional playbooks. The correct investment differs in each case, but MTTD and MTTR presented side by side without context obscure that distinction.
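To make the two clocks concrete, here is a minimal sketch that derives both intervals from incident timestamps. The `Incident` class, its field names, and the dates are illustrative assumptions, not the schema of any particular ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical fields; real case records will name these differently.
    compromised_at: datetime  # estimated initial access, usually reconstructed by forensics
    detected_at: datetime     # first awareness, by any means
    contained_at: datetime    # attacker access cut off

    @property
    def dwell_time(self) -> timedelta:
        # Compromise to awareness: what MTTD averages.
        return self.detected_at - self.compromised_at

    @property
    def response_time(self) -> timedelta:
        # Awareness to containment: what MTTR averages.
        return self.contained_at - self.detected_at


# Detects in two days, takes two weeks to contain: a response problem.
fast_detect_slow_contain = Incident(
    compromised_at=datetime(2024, 3, 1, 9, 0),
    detected_at=datetime(2024, 3, 3, 9, 0),
    contained_at=datetime(2024, 3, 17, 9, 0),
)

# Detects after 45 days, contains in twelve hours: a visibility gap.
slow_detect_fast_contain = Incident(
    compromised_at=datetime(2024, 3, 1, 9, 0),
    detected_at=datetime(2024, 4, 15, 9, 0),
    contained_at=datetime(2024, 4, 15, 21, 0),
)

for name, inc in [
    ("fast detect, slow contain", fast_detect_slow_contain),
    ("slow detect, fast contain", slow_detect_fast_contain),
]:
    print(f"{name}: dwell={inc.dwell_time}, response={inc.response_time}")
```

Keeping the two intervals as separate fields on every incident, rather than as two averages on a slide, is what lets you tell the two failure modes apart.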

Fig. 01 -- The dwell time timeline

Dwell time measures the gap between compromise and awareness. Everything the attacker accomplishes during that window -- lateral movement, persistence, data staging, exfiltration -- happens before the clock starts on your response.

Why the Mean Is the Wrong Central Tendency

The arithmetic mean of dwell times is the single most commonly reported number and the single worst way to summarize the distribution.

Dwell time distributions are not normal. They are heavily right-skewed. Most detected intrusions cluster in a relatively narrow window -- a few days to a few weeks -- because the detection mechanisms that work tend to work within a predictable timeframe. But a meaningful fraction of intrusions persist for months or years. Those long-tail cases drag the mean upward in a way that misrepresents the typical experience.

Consider a concrete example. An organization detects ten incidents in a year with the following dwell times in days: 3, 5, 7, 8, 12, 14, 18, 21, 45, 340. The mean is 47.3 days. The median is 13 days. Which number better represents the organization's detection capability?

The mean says the organization takes about seven weeks to detect intrusions. The median says about two weeks. Neither is wrong, but the mean is dominated by a single incident -- the 340-day outlier -- that probably reflects a fundamentally different class of intrusion (an APT group with custom tooling, a compromised service account that generated no anomalous behavior, a supply-chain implant that bypassed perimeter controls entirely). That incident tells you little about whether your everyday detection rules are working. It tells you that a particular class of threat evaded your detection architecture for most of a year.

The median is more resistant to outliers and more representative of what your detection program does for the bulk of threats it encounters. If you must report a single number, report the median. Better yet, report the median, the 90th percentile, and the count. The 90th percentile captures the long tail without letting a single extreme value define the metric. The count tells you the sample size, which determines whether any of these numbers mean anything at all.
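The three-number summary can be checked against the example data with Python's standard `statistics` module. This is a sketch, with one assumption worth flagging: percentile definitions vary between tools, and the "inclusive" interpolation used here is only one common convention.

```python
import statistics

# Dwell times in days, from the example above.
dwell_days = [3, 5, 7, 8, 12, 14, 18, 21, 45, 340]

count = len(dwell_days)
mean = statistics.mean(dwell_days)
median = statistics.median(dwell_days)
# quantiles(n=10) returns the nine decile cut points; the last one is the 90th percentile.
p90 = statistics.quantiles(dwell_days, n=10, method="inclusive")[-1]

print(f"n={count} mean={mean:.1f} median={median:.1f} p90={p90:.1f}")
# n=10 mean=47.3 median=13.0 p90=74.5

# Drop the single 340-day incident and compare how far each statistic moves.
without_outlier = dwell_days[:-1]
print(
    f"without outlier: mean={statistics.mean(without_outlier):.1f} "
    f"median={statistics.median(without_outlier):.1f}"
)
# without outlier: mean=14.8 median=12.0
```

Removing one incident moves the mean by more than a month and the median by a day, which is the whole argument for reporting the median alongside the count.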

Small Samples Destroy Statistical Meaning

Most organizations do not detect enough incidents per year to produce statistically meaningful dwell time metrics. If you detect four intrusions in a year, your median dwell time is the average of the second and third values in a sorted list of four numbers. Any conclusion drawn from that figure is indistinguishable from noise.

This is not a theoretical concern. The majority of organizations outside the Fortune 500 and large government agencies detect single-digit intrusions annually. Their dwell time numbers have confidence intervals so wide that they are functionally meaningless for year-over-year comparison. Reporting a 15% improvement in MTTD when your sample size went from six incidents to five is not an improvement. It is arithmetic coincidence.
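One rough way to see how little a handful of incidents pins down is a percentile bootstrap on the median. The sketch below uses four invented dwell times; `bootstrap_median_ci` is a hypothetical helper, and a bootstrap on a sample this small is itself crude, so treat the interval as an illustration of width rather than a precise estimate.

```python
import random
import statistics


def bootstrap_median_ci(samples, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    medians = sorted(
        statistics.median(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = medians[int((alpha / 2) * n_resamples)]
    hi = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical dwell times, in days, for a year with four detected intrusions.
four_incidents = [5, 9, 20, 60]

lo, hi = bootstrap_median_ci(four_incidents)
print(f"median={statistics.median(four_incidents)} days, 95% interval=({lo}, {hi})")
```

With an interval that spans most of the observed range, a year-over-year change of a few days sits well inside the noise.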

Survivorship Bias: The Data You Are Not Seeing

The deepest problem with dwell time as a metric is not statistical. It is structural. You can only measure dwell time for intrusions you eventually detected.

Every dwell time dataset is a survivorship-biased sample. The intrusions that were never detected have infinite dwell time, and they do not appear in the data at all. Your metric captures the performance of your detection program against the subset of threats it was capable of seeing. It tells you nothing about the threats it missed entirely.

Fig. 02 -- Survivorship bias in dwell time data

Dwell time can only be calculated for detected intrusions. The threats your program cannot see have infinite dwell time and zero representation in your metrics. A decreasing median dwell time could mean better detection, or it could mean you are only catching the easy ones faster.

This creates a perverse incentive structure. If your detection program improves at catching commodity threats -- phishing footholds, known malware, scripted attacks -- your dwell time goes down. You report improvement. But if sophisticated threats continue to evade your detection entirely, your actual risk posture has not changed. The metric improved because you got better at finding things that were already relatively findable.

The analogy is a hospital reporting that its average length of stay decreased. If that happened because the hospital stopped admitting the sickest patients, the metric improved while the outcomes got worse. Dwell time has the same structural vulnerability. The denominator is not all intrusions. It is all intrusions you noticed.
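A toy example makes the denominator problem visible. The data below is invented, and it includes something no real dataset has: ground truth for the intrusions that were never detected (marked `None`). The `summarize` helper is hypothetical.

```python
import statistics

# Each tuple: (threat_class, dwell_days), with None meaning never detected.
year_1 = [
    ("commodity", 20), ("commodity", 25), ("commodity", 30),
    ("sophisticated", None), ("sophisticated", None),
]
year_2 = [
    ("commodity", 5), ("commodity", 8), ("commodity", 10),
    ("sophisticated", None), ("sophisticated", None),
]


def summarize(intrusions):
    detected = [days for _, days in intrusions if days is not None]
    missed = sum(1 for _, days in intrusions if days is None)
    return {
        "reported_median": statistics.median(detected),  # what goes on the slide
        "detected": len(detected),
        "missed": missed,  # invisible in real-world data
    }


print("year 1:", summarize(year_1))
print("year 2:", summarize(year_2))
```

The reported median drops from 25 days to 8 while the number of missed intrusions stays at two. Only the first of those facts would ever appear in a report.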

How Attackers Game Detection Timelines

Sophisticated threat actors are aware of detection capabilities and timelines. They structure their operations to exploit the detection gaps that dwell time metrics obscure.

Living Off the Land

The most effective technique for extending dwell time is to avoid deploying tools that detection can identify. Attackers who restrict themselves to native operating system utilities -- PowerShell, WMI, legitimate remote administration tools, built-in credential stores -- generate activity that is difficult to distinguish from normal administrative behavior. Your EDR may have a signature for Mimikatz. It probably does not have a reliable detection for a domain administrator using PowerShell remoting, because that looks identical to what your own admins do every day.

Living-off-the-land attacks extend dwell time by exploiting the gap between signature-based detection and behavioral detection. Most organizations have deployed the former extensively and the latter partially at best. The attacker knows this.

Slow and Low Operations

Rate-based detection assumes a threshold of activity. If an attacker exfiltrates data at a rate that falls below your threshold, or performs lateral movement across weeks rather than hours, the individual events may each be below the noise floor. Aggregation over time could catch this pattern, but aggregation requires retention, correlation, and baselines that most organizations do not maintain at sufficient granularity.
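The gap between per-event thresholds and aggregation can be sketched with a daily egress series. Every number here is an assumption chosen for illustration: the thresholds, the window length, and the steady 300 MB per day.

```python
DAILY_THRESHOLD_MB = 500      # hypothetical per-day alert threshold
WINDOW_DAYS = 30              # hypothetical retention and correlation window
WINDOW_THRESHOLD_MB = 5_000   # hypothetical threshold on the trailing total

# Steady exfiltration from one host, below the daily threshold every single day.
daily_egress_mb = [300] * 30

# Rate-based rule: alert on any single day above the threshold.
daily_alerts = [day for day, mb in enumerate(daily_egress_mb) if mb > DAILY_THRESHOLD_MB]


def first_window_breach(series, window, threshold):
    """Return the first day index where the trailing-window sum exceeds threshold, else None."""
    running = 0
    for day, mb in enumerate(series):
        running += mb
        if day >= window:
            running -= series[day - window]  # drop the day that left the window
        if running > threshold:
            return day
    return None


breach_day = first_window_breach(daily_egress_mb, WINDOW_DAYS, WINDOW_THRESHOLD_MB)
print(f"daily alerts: {daily_alerts}, trailing-window breach on day index: {breach_day}")
```

The daily rule never fires, while the trailing sum crosses its threshold on day index 16. The catch is the one named above: the windowed check only works if you retain 30 days of per-host data and have a baseline to set the threshold against.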

Deliberate Detection Sacrifice

Some threat actors deliberately trigger detectable activity as a diversion. They deploy noisy malware on a non-critical system, knowing your SOC will find it, declare an incident, contain it, and close the ticket. Meanwhile, their actual foothold on a more valuable system -- established earlier, through a different vector -- continues operating undetected. Your dwell time for the decoy incident looks excellent. The dwell time for the real intrusion is unmeasured.

Key concept: Detection as an adversarial game. Dwell time treats detection as a measurement problem -- how fast can you find what is there. In practice, detection is an adversarial problem -- the thing you are trying to find is actively trying not to be found. Any metric that does not account for adversarial adaptation will degrade over time, even if your capabilities stay constant or improve. The attacker is not a static target.

What Dwell Time Actually Measures

Strip away the aspirational framing and ask what dwell time tells you concretely.

Dwell time measures, for the subset of threats that trigger any detection mechanism at all, the latency of whichever mechanism fired first. That is a useful thing to know. It is not a measure of your security posture, your detection coverage, your risk level, or your team's competence.

When dwell time decreases, one or more of these things may have happened:

Your fastest detections got faster. This is the best-case interpretation and it is sometimes true. You deployed a new detection rule that catches a common initial-access technique earlier in the kill chain. Good.

You started detecting more easy things. Your sample shifted toward incidents with short natural dwell times (ransomware that self-reveals, commodity malware that phones home immediately), and away from incidents with long dwell times (APTs, insider threats). Your capability did not change. Your sample did.

External notification decreased. If a significant fraction of your detections historically came from law enforcement or third parties notifying you months after compromise, and that fraction decreased, the incidents left in your sample are mostly self-detected ones with shorter dwell times, so the aggregate falls. This might mean you are detecting things that previously required external notification. It might mean the external notifiers stopped calling.

Attacker behavior shifted. If threat actors moved from long-dwell espionage to short-dwell ransomware, your dwell time goes down even if your detection did not improve. The attacks got faster, not your detection.
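A quick way to separate "we got faster" from "the mix changed" is to break the median out by incident class before comparing periods. The incident data and the class labels below are invented for illustration, and `by_class` and `overall_median` are hypothetical helpers.

```python
import statistics
from collections import defaultdict

# Each tuple: (incident_class, dwell_days).
last_year = [
    ("ransomware", 4), ("ransomware", 6),
    ("espionage", 90), ("espionage", 120), ("espionage", 150),
]
this_year = [
    ("ransomware", 4), ("ransomware", 4), ("ransomware", 6), ("ransomware", 6),
    ("espionage", 120),
]


def overall_median(incidents):
    return statistics.median(days for _, days in incidents)


def by_class(incidents):
    """Map each class to (incident count, median dwell days)."""
    groups = defaultdict(list)
    for cls, days in incidents:
        groups[cls].append(days)
    return {cls: (len(days), statistics.median(days)) for cls, days in groups.items()}


print("overall:", overall_median(last_year), "->", overall_median(this_year))
print("last year by class:", by_class(last_year))
print("this year by class:", by_class(this_year))
```

The headline median falls from 90 days to 6, yet the per-class medians are identical in both years (5 days for ransomware, 120 for espionage). Nothing about detection changed; the sample did.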

Operationally Useful Alternatives

If dwell time is insufficient on its own, what should you measure instead? The answer is not a different single metric. It is a small set of metrics that measure different aspects of detection capability and that together give you a more honest picture.

Detection Coverage Ratio

Map your detection rules and capabilities to a framework like MITRE ATT&CK. For each technique, classify your coverage as: detected (you have a rule or hunt that reliably identifies this technique), partially detected (you have coverage for some variants or some environments), or not detected (you have no mechanism to identify this technique).

Detection coverage ratio is the fraction of relevant techniques that are detected or partially detected. Unlike dwell time, this is a leading indicator. You can measure it before an incident occurs. You can improve it by writing detections for uncovered techniques. And critically, it makes visible the gaps that dwell time hides -- the techniques that produce infinite dwell time because they are never detected at all.

Fig. 03 -- Detection coverage vs. dwell time as metrics

Dwell time is a lagging indicator that only captures detected threats. Detection coverage is a leading indicator that makes gaps visible before they are exploited. Neither is sufficient alone, but coverage gives you something to act on before the next incident.

The operational advantage of detection coverage over dwell time is that coverage gives you a work queue. If you have no detection for T1053 (Scheduled Task/Job), you can build one. If your dwell time is 14 days, you cannot act on that number directly -- it does not tell you which detection to improve or which gap to close.
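As a sketch of how the ratio and the work queue fall out of the same mapping: the technique IDs below are real ATT&CK identifiers, but the coverage status assigned to each one is invented, and the `Coverage` enum is a hypothetical structure rather than anything ATT&CK defines.

```python
from enum import Enum


class Coverage(Enum):
    DETECTED = "detected"
    PARTIAL = "partially detected"
    NONE = "not detected"


# Coverage statuses here are made up for illustration.
coverage_map = {
    "T1566 Phishing": Coverage.DETECTED,
    "T1059 Command and Scripting Interpreter": Coverage.PARTIAL,
    "T1003 OS Credential Dumping": Coverage.DETECTED,
    "T1021 Remote Services": Coverage.PARTIAL,
    "T1047 Windows Management Instrumentation": Coverage.NONE,
    "T1053 Scheduled Task/Job": Coverage.NONE,
}

# Per the definition above, detected and partially detected both count as covered.
covered = [t for t, status in coverage_map.items() if status is not Coverage.NONE]
coverage_ratio = len(covered) / len(coverage_map)

# Techniques with no detection at all become the backlog.
work_queue = sorted(t for t, status in coverage_map.items() if status is Coverage.NONE)

print(f"coverage ratio: {coverage_ratio:.0%}")
print("work queue:", work_queue)
```

Because partial coverage counts toward the ratio, it may be worth reporting the fully detected fraction next to it so that a map full of partial entries does not read as strong coverage.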

Detection Source Distribution

Track how your incidents are detected and categorize by source: automated alerting (SIEM rule, EDR detection), proactive hunting, internal report (user or IT staff), external notification (law enforcement, vendor, peer organization), or self-revealed (ransomware note, defacement).

The distribution across these categories tells you something dwell time cannot. If 40% of your detections come from external notification, your internal detection capability has significant gaps regardless of what your dwell time says. If 80% come from EDR and 0% from network detection, you know where your investment has concentrated and where it has not.

The goal over time is to shift the distribution toward automated detection and proactive hunting, and away from external notification and self-revelation. That shift represents genuine improvement in detection capability, not just faster measurement of the same subset.
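Tracking the distribution takes little more than a counter over the incident log. The source labels follow the categories above; the ten incidents are invented.

```python
from collections import Counter

# Hypothetical detection sources for ten incidents in one year.
incident_sources = (
    ["automated_alerting"] * 5
    + ["proactive_hunting"] * 1
    + ["internal_report"] * 1
    + ["external_notification"] * 2
    + ["self_revealed"] * 1
)

counts = Counter(incident_sources)
total = sum(counts.values())
distribution = {source: n / total for source, n in counts.items()}

# Share found by the sources the program wants to grow.
proactive_share = distribution["automated_alerting"] + distribution["proactive_hunting"]
# Share found only because someone else, or the attacker, told you.
passive_share = distribution["external_notification"] + distribution["self_revealed"]

for source, share in sorted(distribution.items(), key=lambda item: -item[1]):
    print(f"{source:>22}: {share:.0%}")
print(f"proactive share: {proactive_share:.0%}, passive share: {passive_share:.0%}")
```

Comparing the two shares period over period is a more direct read on capability than the dwell time trend, since it cannot improve just because the attacks got faster.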

Alert-to-Investigation Latency

This is the time between an alert firing and an analyst beginning investigation. It is distinct from dwell time because it starts at the alert, not at the compromise. But it is operationally more actionable because it measures a process you control directly: SOC triage throughput.

If alerts sit in a queue for 12 hours before investigation, your dwell time includes 12 hours of avoidable delay. Measuring and reducing alert-to-investigation latency has a direct, predictable effect on your effective detection speed. Unlike dwell time, this metric is high-frequency (you have many alerts per day), statistically meaningful at small time scales, and responsive to process changes.
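Measuring this needs only two timestamps per alert. The sketch below uses five invented alerts and assumes your SIEM or case system records both when the alert fired and when an analyst opened it; the percentile uses the `statistics` module's "inclusive" interpolation, which is one convention among several.

```python
import statistics
from datetime import datetime

# Each tuple: (alert_fired_at, investigation_started_at). Timestamps are hypothetical.
alerts = [
    (datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 8, 20)),
    (datetime(2024, 5, 6, 9, 0), datetime(2024, 5, 6, 9, 45)),
    (datetime(2024, 5, 6, 10, 0), datetime(2024, 5, 6, 11, 30)),
    (datetime(2024, 5, 6, 12, 0), datetime(2024, 5, 6, 16, 0)),
    (datetime(2024, 5, 6, 20, 0), datetime(2024, 5, 7, 8, 0)),  # sat overnight
]

latencies_min = [(started - fired).total_seconds() / 60 for fired, started in alerts]

median_min = statistics.median(latencies_min)
# Last of the nine decile cut points is the 90th percentile.
p90_min = statistics.quantiles(latencies_min, n=10, method="inclusive")[-1]

print(f"median={median_min:.0f} min, p90={p90_min:.0f} min")
# median=90 min, p90=528 min
```

In practice the same calculation runs over hundreds of alerts a week, which is why this metric responds to process changes in a way dwell time cannot.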

False Positive Rate by Detection Rule

A detection rule with an 85% false positive rate does not effectively reduce dwell time. Analysts learn to deprioritize it, or the volume overwhelms triage capacity and legitimate alerts get buried. Tracking false positive rates per rule identifies the detections that are consuming analyst time without producing confirmed incidents.

Reducing false positives is not glamorous work. It does not show up in dwell time figures. But it directly improves the probability that a true positive gets investigated quickly, which is the operational outcome dwell time is supposed to proxy for.
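Per-rule false positive rates come straight out of triage dispositions. In this sketch the rule names, the dispositions, and the 0.8 tuning threshold are all hypothetical, and `false_positive_rates` is an illustrative helper.

```python
from collections import Counter

# Each tuple: (rule_name, analyst_disposition). All values are invented.
triaged_alerts = (
    [("psexec_lateral_movement", "false_positive")] * 17
    + [("psexec_lateral_movement", "true_positive")] * 3
    + [("new_local_admin", "false_positive")] * 2
    + [("new_local_admin", "true_positive")] * 8
)

NOISY_THRESHOLD = 0.8  # hypothetical cutoff for sending a rule back for tuning


def false_positive_rates(alerts):
    """Return {rule: fraction of triaged alerts closed as false positive}."""
    totals, false_positives = Counter(), Counter()
    for rule, disposition in alerts:
        totals[rule] += 1
        if disposition == "false_positive":
            false_positives[rule] += 1
    return {rule: false_positives[rule] / totals[rule] for rule in totals}


rates = false_positive_rates(triaged_alerts)
tuning_queue = [rule for rule, rate in rates.items() if rate >= NOISY_THRESHOLD]

for rule, rate in rates.items():
    print(f"{rule}: {rate:.0%} false positive")
print("tune first:", tuning_queue)
```

The output is another work queue: the rules at the top of the list are the ones costing the most analyst time per confirmed incident.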

The Board Deck Problem

Much of the persistence of dwell time as the primary detection metric comes from its appeal as a reporting number. It is a single figure. It trends over time. It can be benchmarked against industry reports. Boards and executives understand "we went from 21 days to 14 days" in a way they do not understand "we increased our ATT&CK technique coverage from 47% to 62% across the initial access and lateral movement tactics."

This is a communication problem, not a measurement problem. The solution is not to stop measuring dwell time. It is to contextualize it honestly and supplement it with metrics that drive operational improvement.

A board-level security report should include dwell time with appropriate caveats: the sample size, the median rather than the mean, the fraction detected internally versus externally, and an explicit acknowledgment that the number excludes undetected intrusions. It should also include detection coverage as a forward-looking measure of capability, and detection source distribution as a measure of detection maturity.
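The caveats listed above can travel with the number if they are computed together. This is a sketch of one possible summary, using seven invented incidents; the `dwell_summary` helper and its field names are assumptions, not a reporting standard.

```python
import statistics

# Each tuple: (dwell_days, detected_internally). Data is hypothetical.
incidents = [
    (3, True), (7, True), (12, True), (14, True), (21, True),
    (95, False), (210, False),
]


def dwell_summary(records):
    all_days = [days for days, _ in records]
    internal = [days for days, is_internal in records if is_internal]
    external = [days for days, is_internal in records if not is_internal]
    return {
        "n": len(all_days),
        "median_days": statistics.median(all_days),
        "internal_fraction": len(internal) / len(all_days),
        "median_internal_days": statistics.median(internal) if internal else None,
        "median_external_days": statistics.median(external) if external else None,
        "caveat": "Excludes intrusions that were never detected.",
    }


for key, value in dwell_summary(incidents).items():
    print(f"{key}: {value}")
```

Splitting the median by internal and external detection keeps a falling share of third-party notifications from passing as faster detection.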

If the board only wants one number, give them detection coverage. It is the only metric in this set that tells you something about your preparedness for the next incident rather than your performance in the last one.

Key concept: Leading vs. lagging indicators in security. A lagging indicator measures past performance -- how you did. A leading indicator measures current capability -- how prepared you are. Dwell time is a lagging indicator. Detection coverage is a leading indicator. Security programs that report only lagging indicators are always looking backward. Programs that track leading indicators can identify and close gaps before they are exploited. The distinction applies across security: patching cadence (leading) vs. vulnerability exploitation rate (lagging), phishing training completion (leading) vs. successful phish rate (lagging).

What Good Looks Like

A mature detection program does not have a single headline metric. It has an instrumented feedback loop.

Detection coverage tells you what you can see. Detection source distribution tells you how you are seeing it. Alert-to-investigation latency tells you how quickly you act on what you see. False positive rates tell you how much of your capacity is consumed by noise. And yes, dwell time -- properly calculated, with median and percentiles, with adequate sample sizes, with internal and external detection separated -- tells you the end-to-end latency of your detection chain for the incidents you caught.

None of these metrics is useful in isolation. All of them together give you a system that identifies where your detection program is weak, where it is improving, and where to invest next. That is what a security metric is supposed to do: change behavior. A number that goes on a slide and does not change anyone's decision is decoration, not measurement.

Dwell time is not a bad metric. It is an incomplete metric that has been promoted beyond its explanatory power. It measures detection latency for detected threats, which is useful but narrow. If you want to understand your detection posture, measure what you can see (coverage), how you see it (source distribution), how fast you act (triage latency), and how much noise you tolerate (false positive rates). Dwell time becomes one data point in a system, not the system itself.

The organizations that take detection seriously do not celebrate when their dwell time number goes down. They ask why it went down and whether the answer reflects genuine improvement or a shift in the threat mix. That question -- why did this metric change -- is harder to answer than the metric is to calculate. It is also the only question that matters.
