1. IR Framework Overview

NIST SP 800-61 Framework

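NIST SP 800-61 (Computer Security Incident Handling Guide) is the most widely adopted IR framework. It defines a lifecycle approach to incident response with clear phases and feedback loops. Revision 2 groups containment, eradication, and recovery into a single phase; they are listed separately here so each set of activities is explicit.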

  • Preparation: Build the team, write the playbooks, deploy the tools, conduct the exercises. Preparation determines most of IR effectiveness, because you cannot prepare during an incident. This includes deploying SIEM and EDR, training responders, and establishing communication plans.
  • Detection & Analysis: Identify potential incidents from alerts, user reports, or threat hunting. Triage to confirm a real incident, classify severity, and scope the impact. Establish the incident timeline.
  • Containment: Short-term containment stops immediate damage while preserving evidence; long-term containment stabilizes the environment for extended investigation. Isolation decisions are balanced against business impact.
  • Eradication: Remove the threat from the environment — malware, attacker persistence, compromised credentials. Validate removal completely before proceeding.
  • Recovery: Restore systems to normal operation with enhanced monitoring. Validate full restoration. Confirm the threat has not returned.
  • Post-Incident Activity: Document the incident, conduct a blameless post-mortem, implement lessons learned, and update playbooks and detections. This is the improvement loop that aims to make each incident the last of its type.
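
The lifecycle above can be sketched as a small state machine. This is a minimal illustration, not NIST text: the `Phase` enum, the `ALLOWED` transition map, and the `advance` helper are hypothetical names, and the back-edges to Detection & Analysis are an assumed simplification of the feedback loops.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = "Preparation"
    DETECTION_ANALYSIS = "Detection & Analysis"
    CONTAINMENT = "Containment"
    ERADICATION = "Eradication"
    RECOVERY = "Recovery"
    POST_INCIDENT = "Post-Incident Activity"


# Assumed transition map: the forward path plus feedback loops.
# New findings during response send the team back to analysis;
# post-incident activity feeds back into preparation.
ALLOWED = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT},
    Phase.CONTAINMENT: {Phase.ERADICATION, Phase.DETECTION_ANALYSIS},
    Phase.ERADICATION: {Phase.RECOVERY, Phase.DETECTION_ANALYSIS},
    Phase.RECOVERY: {Phase.POST_INCIDENT, Phase.DETECTION_ANALYSIS},
    Phase.POST_INCIDENT: {Phase.PREPARATION},
}


def advance(current: Phase, target: Phase) -> Phase:
    """Return the target phase if the transition is allowed, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target
```

Recording each transition in the incident ticket gives the timeline a phase-by-phase structure for the post-incident review.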

SANS PICERL & CISA Guidance

SANS PICERL is a teaching-oriented framework widely used in security training. CISA provides additional federal IR guidance that extends NIST 800-61 for critical infrastructure operators.

  • P — Preparation: Same as NIST preparation phase — team, tools, plans, exercises
  • I — Identification: Determine whether an event is an actual incident. Gather initial indicators, validate, and declare an incident with initial severity classification.
  • C — Containment: Limit damage scope — short-term (immediate isolation) then long-term (sustained stable environment for investigation)
  • E — Eradication: Remove root cause — malware, unauthorized accounts, misconfigurations that enabled the attack
  • R — Recovery: Restore and validate systems. Monitor closely for recurrence during initial recovery period.
  • L — Lessons Learned: Post-incident review documenting what happened, why, and what changes prevent recurrence
  • CISA provides free IR resources including incident reporting forms, playbooks, and the Known Exploited Vulnerabilities catalog that guides prioritization

Phase | Key Activities | Key Artifacts | Primary Stakeholders
--- | --- | --- | ---
Preparation | Team assembly, tool deployment, playbook authoring, tabletop exercises, retainer engagement | IR policy, RACI chart, contact list, playbooks, asset inventory | CISO, IR Team, IT, Legal
Detection & Analysis | Alert triage, IOC analysis, scope assessment, severity classification, incident declaration | Incident ticket, timeline, IOC list, affected asset list | SOC, IR Lead, Threat Intelligence
Containment | Network isolation, account disablement, C2 blocking, evidence preservation | Containment actions log, network capture, system images | IR Team, Network, System Admins
Eradication | Malware removal, persistence cleanup, credential rotation, patch deployment | Eradication checklist, clean-bill-of-health scan results | IR Team, System Admins, IT Ops
Recovery | System restoration, service validation, enhanced monitoring deployment | Recovery checklist, test results, monitoring plan | IT Ops, Business Units, Exec
Post-Incident | Timeline documentation, RCA, lessons learned, control updates, threat intel sharing | PIR report, action item tracker, updated playbooks | All stakeholders

2. IR Team Structure

CSIRT vs SOC

CSIRT (Computer Security Incident Response Team) and SOC (Security Operations Center) are complementary but distinct functions that are often confused.

  • SOC: Continuous monitoring function. Analyzes alerts, tunes detection rules, monitors dashboards 24/7. The SOC is always running, regardless of whether an incident is active. Staffed in shifts.
  • CSIRT: Activated when an incident is declared. Coordinates response activities across teams. May have permanent staff (dedicated CSIRT) or be a virtual team assembled from existing personnel (virtual CSIRT). Not running continuously unless an incident is in progress.
  • Small organizations: SOC and CSIRT are the same team wearing different hats. The on-call analyst handles initial triage (SOC function) and then shifts to response coordination (CSIRT function) when an incident is confirmed.
  • Large enterprises: dedicated SOC (24/7 monitoring) hands off to dedicated CSIRT (incident coordination) when severity threshold is met. Both teams have clear handoff procedures.
  • MSSP (Managed Security Service Provider): many organizations outsource SOC function or augment with an MSSP for 24/7 coverage. The CSIRT remains internal for coordination and decision-making.

IR Roles & Responsibilities

Clear role definitions prevent confusion during incidents. Every role should have a documented responsibility, a named primary, and a named backup, and those assignments should be tested during exercises.

  • IR Lead / Incident Commander: Coordinates all response activities, maintains the incident timeline, calls key decisions, runs the bridge call. The single point of coordination — not necessarily the most technical person.
  • Threat Analyst: Performs technical triage and analysis — examines logs, runs IOC queries, identifies attack TTPs, determines scope and impact. Feeds findings to IR Lead.
  • Forensics Lead: Preserves evidence with chain of custody, performs memory and disk forensics, reconstructs attack timeline from artifacts. Must be engaged before evidence is destroyed by containment actions.
  • Communications Lead: Manages internal stakeholder updates, drafts customer notifications, coordinates with PR. Ensures consistent messaging and tracks notification deadlines.
  • Legal / Compliance: Advises on notification obligations (GDPR 72hr, HIPAA, SEC), attorney-client privilege for IR communications, law enforcement engagement decisions, regulatory notification timing.
  • Management Liaison: Translates technical findings for executives, ensures business impact decisions have appropriate sign-off, escalates when scope or impact crosses thresholds requiring executive decision.
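
The requirement that every role has a documented responsibility, a named primary, and a named backup lends itself to an automated check. The sketch below is one possible shape; `RoleAssignment` and `roster_gaps` are hypothetical names, and the rule that primary and backup must be different people is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoleAssignment:
    role: str
    responsibility: str
    primary: Optional[str]
    backup: Optional[str]


def roster_gaps(roster: List[RoleAssignment]) -> List[str]:
    """Return a human-readable list of gaps in the IR roster."""
    gaps = []
    for entry in roster:
        if not entry.responsibility:
            gaps.append(f"{entry.role}: no documented responsibility")
        if not entry.primary:
            gaps.append(f"{entry.role}: no named primary")
        if not entry.backup:
            gaps.append(f"{entry.role}: no named backup")
        elif entry.backup == entry.primary:
            # Assumption: a backup who is also the primary provides no cover.
            gaps.append(f"{entry.role}: backup is the same person as primary")
    return gaps
```

Running a check like this before each exercise surfaces roles left uncovered by staff turnover.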

On-Call & Retainer Planning

Incidents don't respect business hours. A significant percentage of ransomware deployments occur on Friday afternoons and holiday weekends — when staff are least available and most fatigued.

  • 24/7 on-call rotation for core IR capability: at minimum, an analyst who can triage and escalate, and an IR Lead who can be reached for incident declaration decisions
  • PagerDuty/OpsGenie rotation: clear escalation paths with tested contact numbers. Test the escalation path quarterly — don't discover it's broken during a P1 incident.
  • IR Retainer: Pre-negotiated engagement with an external IR firm (Mandiant, CrowdStrike Services, Palo Alto Unit 42, Secureworks). Resources are available on short notice at a pre-agreed rate card, so mobilization is faster than running procurement during an active incident.
  • Retainer includes: 24/7 hotline, designated IR lead, forensic platform access (CrowdStrike Falcon Complete, Mandiant Advantage), legal coverage for IR communications
  • Annual retainer review: test the relationship with a tabletop exercise involving the retainer firm. Discover integration issues before you need the firm in a real incident.
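
An escalation path can be written down as data and exercised in a test, which is one way to make the quarterly check concrete. The sketch below does not call PagerDuty or Opsgenie; `EscalationStep`, `who_is_paged`, and the timeout values used with them are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationStep:
    contact: str
    ack_timeout_min: int  # minutes to wait for an acknowledgement before escalating


def who_is_paged(chain: List[EscalationStep], minutes_unacknowledged: int) -> str:
    """Return the contact being paged after a page has gone unanswered this long."""
    elapsed = 0
    for step in chain:
        elapsed += step.ack_timeout_min
        if minutes_unacknowledged < elapsed:
            return step.contact
    # Chain exhausted: keep paging the final contact.
    return chain[-1].contact
```

For a chain of on-call analyst (10 minutes), IR Lead (15 minutes), and CISO (30 minutes), a page unanswered for 12 minutes is with the IR Lead.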

3. Incident Classification & Triage

Incident Categories

Categorizing incidents by type enables routing to appropriate playbooks and expertise. Each category has different containment approaches, regulatory implications, and business impact profiles.

  • Ransomware: Mass encryption of systems with extortion demand. Highest business impact. Requires immediate network isolation, backup assessment, legal/PR engagement. May trigger regulatory notification.
  • Data Breach: Confirmed or suspected unauthorized access to sensitive data (PII, PHI, financial data, trade secrets). Requires immediate legal and compliance notification. Regulatory timelines begin from discovery.
  • Insider Threat: Malicious or negligent action by an employee or contractor. Requires HR and Legal involvement from the start. Special evidence preservation procedures to support potential employment action or prosecution.
  • Account Compromise: Unauthorized access to user accounts — often via phishing, credential stuffing, or MFA bypass. Assess scope of account access, contain by disabling/resetting credentials, investigate scope of data access.
  • Malware Infection: Malicious software on endpoints or servers — not yet confirmed as full breach. Scope and contain quickly before it escalates to ransomware deployment or data theft.
  • DDoS: Volumetric or application-layer attack disrupting availability. Different response track — engagement with upstream provider, CDN/WAF mitigation, law enforcement coordination for attribution.
  • Business Email Compromise (BEC): Executive impersonation or account compromise to initiate fraudulent financial transfers. Immediate bank notification (wire recall possible within 24-72 hours), legal, and law enforcement involvement.
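
Routing by category can be expressed as a lookup table. In this sketch the parties to engage are taken from the category descriptions above, while the playbook identifiers, `Category`, `Route`, and `route` are hypothetical names; the empty lists mean only that the text names no extra party for that category.

```python
from enum import Enum
from typing import List, NamedTuple


class Category(Enum):
    RANSOMWARE = "Ransomware"
    DATA_BREACH = "Data Breach"
    INSIDER_THREAT = "Insider Threat"
    ACCOUNT_COMPROMISE = "Account Compromise"
    MALWARE = "Malware Infection"
    DDOS = "DDoS"
    BEC = "Business Email Compromise"


class Route(NamedTuple):
    playbook: str      # hypothetical playbook identifier
    engage: List[str]  # parties to involve beyond the IR team


ROUTES = {
    Category.RANSOMWARE: Route("ransomware", ["Legal", "PR"]),
    Category.DATA_BREACH: Route("data-breach", ["Legal", "Compliance"]),
    Category.INSIDER_THREAT: Route("insider-threat", ["HR", "Legal"]),
    Category.ACCOUNT_COMPROMISE: Route("account-compromise", []),
    Category.MALWARE: Route("malware-infection", []),
    Category.DDOS: Route("ddos", ["Upstream provider", "Law enforcement"]),
    Category.BEC: Route("bec", ["Bank", "Legal", "Law enforcement"]),
}


def route(category: Category) -> Route:
    """Look up the playbook and extra parties for an incident category."""
    return ROUTES[category]
```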

Regulatory Notification Triggers

Several regulatory frameworks impose strict notification deadlines that begin running when an incident is discovered or, for the SEC rule, when it is determined to be material. These timelines must be tracked from incident declaration.

  • GDPR Article 33: Personal data breach notification to the supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of the breach. Late notification requires an explanation of the delay. Notification to affected individuals is required if the breach poses a high risk to their rights (Article 34).
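  • HIPAA Breach Rule: Notification to affected individuals without unreasonable delay and no later than 60 days after discovery. HHS notification within the same 60 days for breaches affecting 500 or more individuals; smaller breaches are reported to HHS annually, within 60 days after the end of the calendar year. Media notification when more than 500 residents of a state or jurisdiction are affected. The covered entity must notify even if the breach occurred at a Business Associate.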
  • SEC Rule (2023): Public companies must file Form 8-K within 4 business days of determining a cybersecurity incident is "material." Materiality determination requires legal analysis — engage counsel immediately when a breach involves public company assets.
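  • State Breach Notification Laws: All 50 US states have breach notification laws with varying requirements: typically 30-90 days where a fixed deadline is specified, and some require notification to the state AG. California (Civil Code § 1798.82 and CCPA), New York (SHIELD Act), Texas (Identity Theft Enforcement and Protection Act), and others have specific requirements for covered data types.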
  • Legal hold: once litigation is reasonably foreseeable (breach involving personal data almost always triggers this), halt all data deletion and preserve all potentially relevant evidence
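
Because these clocks start at different events, it helps to compute the deadlines the moment an incident is declared. The sketch below covers only the three fixed windows named above. `add_business_days` and `notification_deadlines` are hypothetical helpers, business days are assumed to be Monday through Friday with holidays ignored, and the output is a tracking aid, not legal advice.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by N weekdays (Mon-Fri). Holidays are ignored in this sketch."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:
            remaining -= 1
    return current


def notification_deadlines(
    discovered: datetime,
    material_determined: Optional[datetime] = None,
) -> Dict[str, datetime]:
    """Outer notification deadlines keyed by regime."""
    deadlines = {
        "GDPR Art. 33 (supervisory authority)": discovered + timedelta(hours=72),
        "HIPAA (affected individuals)": discovered + timedelta(days=60),
    }
    # The SEC clock starts at the materiality determination, not at discovery.
    if material_determined is not None:
        deadlines["SEC Form 8-K"] = add_business_days(material_determined, 4)
    return deadlines
```

For a breach discovered and determined material on Friday, 1 March 2024 at 09:00, this gives a GDPR deadline of 4 March at 09:00, a HIPAA deadline of 30 April, and a Form 8-K deadline of Thursday, 7 March.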

Severity | Criteria | Response SLA | Who to Notify | Example
--- | --- | --- | --- | ---
P1 Critical | Active breach, ransomware deploying, data actively exfiltrating, production down | 15 min initial response; continuous engagement | CISO, CTO, CEO, Legal, Board (if material), IR Retainer | Ransomware at 3am, active C2 on critical server
P2 High | Suspected breach, attacker dwell confirmed, sensitive data access detected | 1 hour initial response; 4-hour full team assembly | CISO, IR Lead, Legal, relevant business unit head | Confirmed lateral movement, PHI query anomaly
P3 Medium | Isolated malware, single account compromise, policy violation without data exposure | 4 hour initial response; next-business-day resolution plan | IR Lead, system owner, user's manager | Malware on isolated workstation, credential stuffing attempt
P4 Low | Suspicious activity not yet confirmed as incident, policy violation, phishing report | Next business day; weekly SLA for resolution | SOC analyst, ticket queue | Phishing email reported, failed MFA from new location
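
The initial-response column of the severity table translates directly into a due-time calculation for the incident ticket. In this sketch `INITIAL_RESPONSE` and `initial_response_due` are hypothetical names, and "next business day" is read as the same clock time on the next weekday, which is an assumption.

```python
from datetime import datetime, timedelta

# Initial-response SLAs taken from the severity table.
INITIAL_RESPONSE = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
}


def initial_response_due(severity: str, declared: datetime) -> datetime:
    """Return when the initial response is due for a declared incident."""
    if severity == "P4":
        # Assumption: same clock time on the next weekday; holidays ignored.
        due = declared + timedelta(days=1)
        while due.weekday() >= 5:
            due += timedelta(days=1)
        return due
    return declared + INITIAL_RESPONSE[severity]
```

A P1 declared at 03:00 is due a response by 03:15; a P4 declared on a Friday rolls to the following Monday.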

4. Playbooks & Runbooks

What Makes a Good Playbook

A good IR playbook is specific enough to be actionable under stress, role-aware enough to direct the right person, and tested enough to be trusted.

  • Specific Decision Trees: Not "investigate the alert" but "if EDR shows process injection from svchost, check event 4688 for parent-child relationship, then check VirusTotal hash, then if positive: isolate host using these exact commands"
  • Role-Aware: The Tier 1 SOC analyst playbook is different from the IR Lead playbook for the same incident type. Each person knows exactly what they should do without needing to coordinate for every step.
  • Tested: A playbook that hasn't been run in a tabletop is untested theory. Quarterly exercises with specific playbooks reveal gaps, ambiguities, and outdated procedures before they matter.
  • SOAR-Compatible: Design playbook steps with automation in mind. Steps that can be automated should be — IOC enrichment, ticket creation, notification sending. Human decision points clearly marked.
  • Living Documents: Update playbooks after every real incident and every exercise. A playbook that hasn't been updated in 12 months is likely outdated in significant ways.
  • Include specific command examples, tool links, contact numbers, escalation thresholds — remove ambiguity entirely
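
The decision-tree example in the first bullet can be written as an explicit function, which is also a step toward SOAR compatibility. This is a sketch only: `svchost_injection_steps` is a hypothetical name, the hash verdict is passed in as a boolean instead of being fetched from a real service, and the negative branch is an assumption because the text only describes the positive case.

```python
from typing import List


def svchost_injection_steps(hash_is_malicious: bool) -> List[str]:
    """Ordered actions for an EDR alert showing process injection from svchost."""
    steps = [
        "Check event 4688 for the parent-child process relationship",
        "Look up the file hash (for example on VirusTotal)",
    ]
    if hash_is_malicious:
        steps.append("Isolate the host using the playbook's documented commands")
    else:
        # Assumed fallback; not specified in the playbook example.
        steps.append("Record findings in the ticket and escalate for manual review")
    return steps
```

Encoding the branch points this way makes it obvious which steps are lookups that can be automated and which are decisions that need a named human owner.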

Critical Playbook Scenarios

Every organization needs these core playbooks at minimum. Each addresses a distinct threat category with different response priorities and notification requirements.

  • Ransomware: 1) Detect early indicators (mass file modifications, Volume Shadow Copy deletion) 2) Network isolate affected systems immediately, without powering them off 3) Preserve evidence before any cleanup or reimaging destroys it 4) Assess backup integrity 5) Notify CISO/Legal/Board 6) Engage IR retainer 7) Do not pay ransom without legal/executive sign-off
  • Phishing Investigation: 1) Analyze email headers and URLs 2) Detonate URL/attachment in sandbox 3) Identify all recipients 4) Check if any users clicked 5) Reset credentials for affected users 6) Block IOCs in email security and proxy 7) Search for lateral movement
  • Data Breach: 1) Confirm unauthorized access occurred 2) Scope what data was accessed/exfiltrated 3) Immediately notify Legal — regulatory clock starts 4) Begin GDPR/HIPAA/state law analysis 5) Prepare notification letters 6) Engage forensics for evidence preservation
  • Business Email Compromise: 1) Identify fraudulent wire instructions 2) Immediately contact bank — wire recall possible within 24-72hr 3) Secure compromised account 4) Notify FBI IC3 and financial crimes division 5) Internal finance controls review
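
A numbered playbook like those above can be tracked as an ordered checklist so the incident timeline records who completed each step and when. The sketch uses the Business Email Compromise steps; `PlaybookRun` is a hypothetical class, and enforcing strict order is a simplification, since real responses often run steps in parallel.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

BEC_STEPS = [
    "Identify fraudulent wire instructions",
    "Contact bank for wire recall",
    "Secure compromised account",
    "Notify FBI IC3 and financial crimes division",
    "Review internal finance controls",
]


@dataclass
class PlaybookRun:
    name: str
    steps: List[str]
    completed: Dict[str, Tuple[str, datetime]] = field(default_factory=dict)

    def next_step(self) -> Optional[str]:
        """Return the first step not yet completed, or None when done."""
        for step in self.steps:
            if step not in self.completed:
                return step
        return None

    def complete(self, step: str, who: str) -> None:
        """Mark a step done, refusing out-of-order completion."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"Expected {expected!r}, got {step!r}")
        self.completed[step] = (who, datetime.now(timezone.utc))
```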

5. Tabletop & IR Exercises

Tabletop Exercise Design

A tabletop exercise is a discussion-based simulation where participants walk through a scenario and describe their response. No actual systems are involved. The goal is to test people, process, and plans — not technology.

  • Facilitator: Guides the discussion, presents scenario injects, manages time, ensures all roles participate, captures gaps and observations. Can be internal or external (external facilitators often produce better results — less deference to senior leaders).
  • Scenario Inject Cards: Pre-written scenario developments that advance the scenario over time. "It's now 3am. The on-call engineer has not responded to pages. The CEO is calling you directly. What do you do?"
  • Discussion-Based: Participants describe what they would do, not actually do it. Facilitator asks probing questions: "Who would make that call?", "How long would that actually take?", "What would you need from IT to do that?"
  • CISA Free Resources: CISA provides free tabletop exercise packages including scenario scripts, inject cards, discussion guides, and after-action templates. Available at cisa.gov/resources-tools/services/cisa-tabletop-exercise-packages
  • Duration: half-day for focused scenarios; full-day for multi-phase incidents. Longer exercises test stamina and decision-making under fatigue — realistic for major incidents.

Recommended Tabletop Scenarios

Scenario selection should reflect your organization's actual threat landscape — industry-specific threats, likely attack vectors, and highest-impact incident types.

  • Ransomware at 3am Friday: IT monitoring detects mass file encryption on file servers. Key staff are unavailable. How do you make decisions? Who has authority to take systems offline?
  • CFO Email Compromise: Finance team receives wire instructions from "CFO" email. Wire is completed before fraud is detected. How do you respond? What's the bank recall process?
  • Cloud Data Exposure: Security researcher emails to report an open S3 bucket with customer PII. Is GDPR notification required? How do you determine what was accessed?
  • Insider Threat: Departing employee in their final week is observed bulk-downloading files to personal cloud storage. Legal hold? Evidence preservation? When do you involve HR? Law enforcement?
  • Executive Crisis Simulation: Designed specifically for C-suite and board members. Focus on decision-making under pressure, public communications, regulatory engagement, and business continuity decisions.
  • Purple Team Exercise: Red team simulates a realistic attack (with rules of engagement); blue team responds. Collaborative — red team can share TTPs in real time to improve detections. Most realistic test of detection and response capabilities.

Write the Plan Now — Not During the Incident

An incident is the worst time to write an incident response plan. Under stress, with systems down, executives demanding answers, and regulatory clocks ticking, organizations that improvise their response make worse decisions, take longer, miss evidence, and pay higher ransoms. The organizations that recover fastest from major incidents are those whose responders knew their roles, had tested their playbooks, had maintained their contact lists, and had verified their backup integrity, all before the incident began. Test your IR plan quarterly with tabletops. Update it at least annually, and after every real incident. The cost is one afternoon; the return can be measured in millions of dollars of avoided breach costs.