1. Containment Decision Framework
Short-Term vs Long-Term Containment
Containment has two distinct phases with different goals. Rushing to long-term containment without adequate short-term stabilization causes evidence loss and missed lateral movement.
- Short-Term Containment: Immediate actions to stop the bleeding — isolate the affected system, block known C2 infrastructure, disable compromised accounts. Goal: stop active damage without destroying evidence or alerting the attacker to your detection.
- Long-Term Containment: Sustained measures that allow extended investigation while the attacker is neutralized — VLAN isolation, enhanced monitoring, temporary compensating controls. The environment is stable enough to investigate without rushing to eradication.
- The gap between short-term and long-term containment is investigation time: you want the attacker contained but not yet evicted, so you can fully understand the scope before cleaning up.
- Critical decision point: has the attacker been alerted to your detection? If yes, move faster, because they may trigger a destructive payload. If no, you have more time for thorough investigation before eradication.
Key Containment Tradeoffs
Every containment action involves tradeoffs. The IR team must consciously evaluate these rather than defaulting to maximum aggression or maximum caution.
- Business Impact vs Continued Attacker Access: Taking a critical production system offline may stop attacker access but cost the organization $100K/hour in lost revenue. Is that acceptable given the current threat level? This is an executive decision, not a technical one.
- Evidence Preservation vs Stopping Damage: Performing memory forensics before isolation takes 30-60 minutes during which the attacker may continue activity. Sometimes evidence must be sacrificed for immediate protection.
- Data Exfiltration in Progress? If the attacker is actively exfiltrating data, blocking the exfiltration channel immediately may take priority over full scope analysis. Data leaving the network is immediate, irreversible harm.
- Lateral Movement Scope? If you don't know the full scope of lateral movement, isolating one system may prompt the attacker to trigger actions-on-objective from their remaining footholds. Understand the scope before containment if possible.
- Ransomware Deployment Stage? If ransomware has been pre-positioned but not yet executed, containment before execution is critical. If encryption is already in progress, the focus shifts to limiting spread: segment the network immediately.
| Containment Action | Evidence Impact | Business Impact | Attacker Alert Risk | When to Use |
|---|---|---|---|---|
| Block C2 at firewall/proxy | Low (preserves endpoint) | Low | High (attacker may trigger payload) | After evidence capture; known C2 with no active exfiltration |
| DNS sinkhole of C2 domains | Low | Very Low | Medium | Silent C2 disruption while investigation continues |
| VLAN isolation of compromised host | Low (host still running) | Medium | Medium | After memory capture; scope unclear; investigation ongoing |
| EDR host isolation | Low (EDR still connected) | High (host fully isolated) | Medium | Active credential theft or lateral movement from host |
| Account disablement | None | Medium (user/service disrupted) | High | Compromised credential confirmed; revoke immediately |
| Full shutdown | High (memory lost) | Very High | N/A | Only if no other option; ransomware actively encrypting |
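The table above can also be treated as data and filtered by how much evidence loss, business disruption, and attacker-alert risk the team can tolerate. The following is a minimal sketch only: the ordinal `LEVELS` scale and the `shortlist` helper are assumptions made for illustration, and the final call remains a human (often executive) decision.

```python
from dataclasses import dataclass

# Ordinal scale for the table's qualitative ratings (an assumption, used only for comparison).
LEVELS = {"None": 0, "Very Low": 1, "Low": 2, "Medium": 3, "High": 4, "Very High": 5}


@dataclass(frozen=True)
class Action:
    name: str
    evidence_impact: str
    business_impact: str
    alert_risk: str  # "N/A" for full shutdown, as in the table


# Rows transcribed from the containment table.
ACTIONS = [
    Action("Block C2 at firewall/proxy", "Low", "Low", "High"),
    Action("DNS sinkhole of C2 domains", "Low", "Very Low", "Medium"),
    Action("VLAN isolation of compromised host", "Low", "Medium", "Medium"),
    Action("EDR host isolation", "Low", "High", "Medium"),
    Action("Account disablement", "None", "Medium", "High"),
    Action("Full shutdown", "High", "Very High", "N/A"),
]


def shortlist(max_evidence, max_business, max_alert):
    """Return the names of actions whose ratings stay at or below each ceiling."""
    def ok(rating, ceiling):
        # An "N/A" rating is never filtered out on that dimension.
        return rating == "N/A" or LEVELS[rating] <= LEVELS[ceiling]

    return [
        a.name for a in ACTIONS
        if ok(a.evidence_impact, max_evidence)
        and ok(a.business_impact, max_business)
        and ok(a.alert_risk, max_alert)
    ]
```

For example, `shortlist("Low", "Low", "Medium")` describes a team that wants to stay quiet and keep the business running while it investigates.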
2. Network Containment
VLAN Isolation & Firewall ACLs
Network-level containment can be applied quickly and is often the first containment action — it limits attacker movement and exfiltration without requiring access to individual compromised endpoints.
- VLAN Isolation: Move the compromised host's switchport to a quarantine VLAN. The VLAN allows only IR team VPN access for forensics — no production network access, no internet access (except through a controlled proxy for C2 observation if desired).
- Firewall ACL Blocks: Block known C2 IP addresses and domains at the perimeter firewall and any internal segment firewalls. Block specific attacker-used ports (common: 4444, 8080, 1337 for reverse shells).
- Block Lateral Movement Paths: If the attacker used specific protocols for lateral movement (SMB on TCP 445, WMI over RPC on TCP 135 plus dynamic high ports, RDP on TCP 3389), restrict these between segments to limit further spread while investigation continues.
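- Cloud Network Containment: AWS Security Groups are allow-only and cannot hold deny rules, so move the instance onto an emergency quarantine SG with no inbound or outbound rules (network ACLs can add explicit denies at the subnet level). Azure NSG: add high-priority deny rules. GCP VPC Firewall: add deny rules. All three providers apply these changes in place, without an instance restart.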
- Maintain IR team access path: ensure quarantine VLAN has IR team connectivity via out-of-band management (iDRAC, iLO) or dedicated IR VPN segment.
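Turning a list of C2 indicators into firewall rules is a step where typos and over-blocking creep in. The sketch below assumes a Linux gateway running `iptables` as the enforcement point and an IPv4-only indicator list; the `build_block_rules` helper and the default protected range are hypothetical. It only generates rule strings for review and does not apply them.

```python
import ipaddress


def build_block_rules(ioc_ips, protected_nets=("10.0.0.0/8",)):
    """Return (rules, skipped) for a list of candidate C2 IPv4 addresses.

    Entries that are malformed, not IPv4, or inside a protected internal
    range are skipped so a bad indicator cannot cut off your own systems.
    """
    protected = [ipaddress.ip_network(n) for n in protected_nets]
    rules, skipped = [], []
    for raw in sorted(set(ioc_ips)):
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            skipped.append(raw)
            continue
        if addr.version != 4 or any(addr in net for net in protected):
            skipped.append(raw)
            continue
        # Drop forwarded traffic destined for the C2 address.
        rules.append(f"iptables -I FORWARD -d {addr} -j DROP")
    return rules, skipped
```

Reviewing the `skipped` list before applying anything is the point of the design: an indicator that resolves to an internal address usually signals a bad feed entry or an internal pivot host that needs different handling.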
DNS Sinkhole & Network Forensics
DNS sinkholes are a particularly valuable containment technique: they silently disrupt C2 communications while revealing which infected hosts are querying the sinkholed domains.
- DNS Sinkhole: Configure internal DNS resolvers to return a controlled IP address for known C2 domains. Infected hosts continue beaconing but traffic goes to your sinkhole server, not the attacker. Sinkhole logs reveal all infected hosts.
- BGP Null Route: For volumetric C2 or DDoS sources, work with upstream ISP to black-hole traffic to/from known malicious IP ranges via BGP community. Effective for large IP blocks.
- Network Traffic Capture BEFORE Blocking: SPAN port or network TAP to capture all traffic from the compromised host/segment before blocking C2 or isolating. That traffic contains forensic evidence — what data was exfiltrated, what commands were received, what lateral movement occurred.
- PCAP for Forensics: tcpdump or Wireshark capture on the SPAN port. Retain at minimum 24 hours of PCAP before containment. Extract protocol information, files transferred, and any plaintext credentials visible in network traffic.
- The most common forensic regret in IR engagements: "We blocked C2 before capturing traffic, and now we have no idea what data left the network."
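The claim that sinkhole logs reveal infected hosts can be made concrete with a small tally. This sketch assumes a hypothetical log format of three whitespace-separated fields per line (timestamp, client IP, queried name); real resolver logs differ by product, so the parsing would need adapting.

```python
from collections import defaultdict


def infected_hosts(log_lines, sinkholed_domains):
    """Map each client IP to the sinkholed domains it queried."""
    domains = {d.lower().rstrip(".") for d in sinkholed_domains}
    hits = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore lines that do not match the assumed format
        _timestamp, client, qname = parts
        qname = qname.lower().rstrip(".")  # DNS names are case-insensitive
        if qname in domains:
            hits[client].add(qname)
    return {client: sorted(names) for client, names in hits.items()}
```

Every client that appears in the result is a candidate for the affected-systems list, including hosts that no other detection has flagged yet.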
Capture Network Traffic Before Blocking C2
One of the most common IR mistakes is blocking the attacker's C2 channel before capturing the network traffic. That traffic contains critical forensic evidence: what data was exfiltrated and to where, what commands the attacker issued, what lateral movement paths were used, and what tools were downloaded. Additionally, abrupt C2 blocking can trigger automated payload execution — some ransomware operators configure "dead man's switch" behaviors where a missed C2 beacon triggers immediate encryption. Capture first, then block. The 30-minute delay for a PCAP setup is almost always worth the forensic value recovered.
3. Endpoint Containment
EDR Host Isolation
Modern EDR platforms can isolate endpoints with a single API call or UI click, cutting all network connectivity except the EDR management channel. This enables continued forensic investigation while blocking attacker access.
- CrowdStrike Network Containment: Cuts all network connectivity except Falcon sensor communication. IR team can still run RTR (Real-Time Response) commands on the isolated host. Containment/un-containment logged in audit trail.
- Microsoft Defender for Endpoint Isolate Device: Full isolation (all network except MDE), or selective isolation (maintaining specific IP communications). Managed from Microsoft 365 Defender portal or API.
- SentinelOne Disconnect from Network: Similar isolation with agent communication retained. Can also be scripted via Sentinel API for automated response playbooks.
- Isolation is reversible — unlike shutdown, the system stays running, memory is preserved, and EDR telemetry continues to flow. This enables ongoing investigation while the attacker is locked out.
- Document every isolation: timestamp, who ordered it, reason, system state at time of isolation. Chain of custody starts here.
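The documentation fields listed above map naturally onto a small structured record. The `IsolationRecord` class and its field names below are illustrative assumptions, not any EDR vendor's schema; the idea is simply that each isolation produces one append-only log line.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class IsolationRecord:
    hostname: str
    ordered_by: str
    reason: str
    system_state: str  # e.g. logged-in user, notable processes, power state
    # Defaults to the current UTC time so records from different analysts line up.
    timestamp_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat(timespec="seconds")
    )


def to_log_line(record):
    """Serialize a record as one JSON line for an append-only incident log."""
    return json.dumps(asdict(record), sort_keys=True)
```

A record might be created as `IsolationRecord("ws-042", "j.doe", "credential theft observed", "user logged in, host powered on")` and appended to the incident log immediately after the isolation call returns.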
Memory Acquisition
Memory acquisition must happen before EDR isolation or system shutdown. Once power is cut, memory contents are gone — along with encryption keys, running malware, injected code, and LSASS credentials.
- Winpmem (Windows): Open-source memory acquisition tool for Windows. Outputs a .raw or .aff4 memory image. Run as Administrator: `winpmem_mini_x64_rc2.exe output.raw`
- LiME (Linux Memory Extractor): Loadable kernel module that acquires Linux memory: `insmod lime.ko "path=/mnt/usb/memory.lime format=lime"`. To avoid writing to the suspect disk, send the output over the network instead with `path=tcp:4444`.
- VMware Snapshot: For VMs, a snapshot captures memory state at that moment. Download the .vmem file alongside the snapshot for memory analysis. No agent required.
- Memory image hash immediately after acquisition: `sha256sum memory.raw > memory.raw.sha256`. Every subsequent analysis tool should verify this hash to confirm integrity for chain of custody.
- Memory acquisition timing: collect as soon as possible after detection. With every minute that passes, the malware may perform further actions, overwrite memory, or exfiltrate data.
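The hash check that each analysis step should perform can be sketched in a few lines. This assumes the checksum file was produced by `sha256sum` as shown above (the hex digest is the first whitespace-separated field); the function names are illustrative.

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte memory images fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(image_path, checksum_path):
    """Return True if the image still matches its recorded sha256sum output."""
    with open(checksum_path, "r", encoding="utf-8") as f:
        expected = f.read().split()[0].lower()
    return sha256_file(image_path) == expected
```

Calling `verify_image("memory.raw", "memory.raw.sha256")` before each analysis run, and recording the result, gives a simple integrity trail for the image.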
Order of Evidence Collection (Volatile First)
The order of evidence collection follows volatility — most volatile (disappears fastest) collected first. This is a fundamental digital forensics principle.
- 1. RAM / Memory: Disappears on shutdown. Contains running processes, network connections, encryption keys, LSASS credentials, injected code, decrypted content. Must be first.
- 2. Running Processes: Capture process list with PID, parent PID, command line, open files, network connections: `tasklist /v`, `wmic process list full`, `ps auxf`
- 3. Network Connections: Active TCP/UDP connections and listening ports: `netstat -anob` (Windows), `ss -tulnp` (Linux)
- 4. Logged-in Users & Sessions: `quser`, `query session` (Windows); `w`, `who`, `last` (Linux)
- 5. Disk Image: FTK Imager (Windows), Guymager (Linux GUI), or `dd if=/dev/sda of=/mnt/usb/disk.img bs=4M status=progress`. Hash before and after.
- 6. System Logs: Windows Event Logs (evtx), Linux /var/log/. Export before system manipulation.
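Under pressure, responders tend to collect whatever is easiest first. One way to enforce the volatility order is to keep it in a single table and derive every collection plan from it. The sketch below uses the Linux commands named in the list; the artifact labels, the `collection_plan` helper, and the `tar` command for logs are assumptions added for illustration.

```python
# Most volatile first, following the list above.
VOLATILITY_ORDER = [
    ("memory", 'insmod lime.ko "path=/mnt/usb/memory.lime format=lime"'),
    ("processes", "ps auxf"),
    ("network", "ss -tulnp"),
    ("sessions", "w; who; last"),
    ("disk", "dd if=/dev/sda of=/mnt/usb/disk.img bs=4M status=progress"),
    ("logs", "tar -czf /mnt/usb/var-log.tgz /var/log"),  # assumed export command
]


def collection_plan(requested):
    """Return (artifact, command) pairs for the requested artifacts, most volatile first."""
    known = {name for name, _cmd in VOLATILITY_ORDER}
    unknown = sorted(set(requested) - known)
    if unknown:
        raise ValueError(f"unknown artifacts: {unknown}")
    wanted = set(requested)
    return [(name, cmd) for name, cmd in VOLATILITY_ORDER if name in wanted]
```

Whatever order the artifacts are requested in, the plan comes back with memory ahead of disk and disk ahead of logs.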
4. Eradication
Removing Malware & Persistence
Eradication requires systematically identifying and removing every piece of attacker infrastructure — malware, persistence mechanisms, backdoor accounts, and modified configurations. Missing one means re-infection.
- Scheduled Tasks (Windows): `schtasks /query /fo LIST /v`. Review all tasks; remove attacker-created tasks. Common attacker path: scheduled task running base64-encoded PowerShell from %APPDATA%.
- Registry Run Keys: HKCU\Software\Microsoft\Windows\CurrentVersion\Run, HKLM\Software\Microsoft\Windows\CurrentVersion\Run, HKLM\SYSTEM\CurrentControlSet\Services. Review all entries; remove attacker-created persistence.
- Services: `sc query type= all state= all`. Review all services; remove attacker-installed services. Event 7045 (new service installed) helps identify them.
- WMI Subscriptions: `Get-WMIObject -Namespace root\subscription -Class __EventFilter`. WMI event subscriptions are a favored persistence mechanism because they survive reboots and are invisible to most endpoint scans.
- Cron Jobs (Linux): Check /etc/crontab, /etc/cron.d/, /var/spool/cron/crontabs/, and /etc/cron.hourly|daily|weekly|monthly/ for attacker-added entries.
- SSH Authorized Keys: Check /home/*/.ssh/authorized_keys and /root/.ssh/authorized_keys for attacker-added public keys.
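When reviewing scheduled tasks or services that launch base64-encoded PowerShell, decoding the payload is usually the fastest way to tell attacker tooling from legitimate automation. PowerShell's `-EncodedCommand` parameter takes Base64 over UTF-16LE text and accepts abbreviated forms such as `-enc` and `-e`. The sketch below is a simplified assumption about how such command lines look; the `decode_encoded_command` helper is hypothetical and will miss heavily obfuscated variants.

```python
import base64
import re

# Match any prefix of "-EncodedCommand" (for example -e, -enc) or the -ec alias,
# followed by a Base64 token.
_FULL = "encodedcommand"
_PREFIXES = "|".join(_FULL[:i] for i in range(len(_FULL), 0, -1))
_ENCODED_FLAG = re.compile(rf"-(?:{_PREFIXES}|ec)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_encoded_command(command_line):
    """Return the decoded PowerShell script text, or None if nothing decodable is found."""
    match = _ENCODED_FLAG.search(command_line)
    if not match:
        return None
    try:
        return base64.b64decode(match.group(1)).decode("utf-16-le")
    except ValueError:
        # Covers invalid Base64 as well as bytes that are not valid UTF-16LE.
        return None
```

Feeding the "Task To Run" field from the `schtasks` output through this function turns an opaque blob into script text that can be compared against the incident's known tooling.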
Credential Rotation
Every credential that could have been stolen must be rotated. Incomplete credential rotation is the most common cause of re-infection from the same attacker.
- All accounts that authenticated to affected systems: Pull authentication logs for affected systems. Any account that successfully authenticated during the attacker's dwell period may have had its credentials captured from LSASS.
- Service accounts: Service accounts running on affected systems, or used for connections to affected systems. These are often reused across many systems — attacker-obtained service account credentials may enable broad access.
- API keys and secrets: Any secrets stored on affected systems (config files, environment variables, registry) must be considered compromised and rotated.
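- krbtgt reset (Golden/Silver Tickets): For AD attacks involving Golden Tickets, reset the krbtgt account password twice. Twice is required because AD keeps both the current and the previous krbtgt password, so a single reset leaves tickets forged with the old key valid; allow replication to complete between the two resets. Silver Tickets are forged with a service or computer account's key, so invalidate them by rotating that account's password.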
- Rotate in order of privilege: domain admin accounts first, service accounts second, user accounts third. Document every rotation with timestamp and confirming party.
- SSO Sessions: Revoke all active SSO sessions for affected users — not just password reset. An attacker with a valid session token can continue access even after password change.
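The first and fifth points above combine into a simple procedure: find every account that authenticated successfully to an affected system during the dwell period, then rotate in privilege order. The event fields (`account`, `account_type`, `host`, `time`, `success`) and the three-tier ranking in this sketch are assumptions about how the authentication logs have been normalized.

```python
from datetime import datetime

# Lower number rotates first, following the privilege order in the list above.
TIER = {"domain_admin": 0, "service": 1, "user": 2}


def accounts_to_rotate(events, affected_hosts, dwell_start, dwell_end):
    """Return exposed account names, highest privilege first, then alphabetical."""
    exposed = {}
    for event in events:
        if (
            event["success"]
            and event["host"] in affected_hosts
            and dwell_start <= event["time"] <= dwell_end
        ):
            exposed[event["account"]] = event["account_type"]
    return sorted(exposed, key=lambda name: (TIER[exposed[name]], name))
```

The output is a worklist rather than a verdict: accounts outside the window may still need rotation if the dwell period estimate later moves earlier.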
| Persistence Location | OS | Check Command | Attacker Usage |
|---|---|---|---|
| Registry Run Keys (HKCU/HKLM) | Windows | reg query HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run | Launch malware at user logon or system startup |
| Scheduled Tasks | Windows | schtasks /query /fo LIST /v | Timed execution, sometimes obfuscated as system tasks |
| WMI Event Subscriptions | Windows | PowerShell: Get-WMIObject -NS root\subscription -Class __EventFilter | Fileless persistence; survives reboots; hard to detect |
| Windows Services | Windows | sc query type= all / Event 7045 | Persistent backdoor service, often renamed to blend in |
| Cron Jobs | Linux | crontab -l; cat /etc/crontab /etc/cron.d/* | Scheduled malware re-download, beacon, or data theft |
| SSH Authorized Keys | Linux/Unix | cat /root/.ssh/authorized_keys; cat /home/*/.ssh/authorized_keys | Persistent SSH backdoor key for password-free access |
| Systemd Unit Files | Linux | systemctl list-unit-files --type=service --state=enabled | Malicious service that starts at boot as legitimate-looking unit |
| .bashrc / .profile | Linux | cat /home/*/.bashrc /home/*/.profile /root/.bashrc | Commands executed on interactive shell login |
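Several rows in this table come down to comparing what is on the host against what should be there. For SSH authorized keys, that comparison can be sketched as below, assuming the team holds a set of approved public key blobs (the Base64 field of each key). The `unexpected_keys` helper is hypothetical, and its parsing is simplified: an options field containing quoted spaces could confuse it.

```python
# Key-type prefixes used by OpenSSH public keys (ssh-rsa, ssh-ed25519, ecdsa-sha2-*, sk-*).
_KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def unexpected_keys(authorized_keys_text, approved_key_blobs):
    """Return authorized_keys lines whose key blob is not in the approved set."""
    findings = []
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # The blob follows the key-type field; an options field may come before the type.
        blob = next(
            (fields[i + 1] for i, f in enumerate(fields[:-1])
             if f.startswith(_KEY_TYPE_PREFIXES)),
            None,
        )
        if blob is None or blob not in approved_key_blobs:
            findings.append(line)  # unparseable lines are reported, not ignored
    return findings
```

Comparing on the key blob rather than the trailing comment matters because the comment is free text that an attacker can set to anything, including a legitimate administrator's name.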
5. Eradication Verification
The Re-infection Problem
Re-infection after IR engagement is alarmingly common — approximately 30% of IR engagements see some form of re-infection or attacker return. The cause is almost always incomplete eradication.
- Common causes: one persistence mechanism missed across a fleet of 500 systems; an unscanned system that had the same access as an affected system; a compromised service account used to re-establish access from an external attacker-controlled system.
- Systematic verification requires running IOC sweeps against the entire fleet — not just confirmed affected systems. Attackers spread wider than initial detection suggests.
- IOC Sweep After Cleanup: Re-run all IOC searches (file hashes, registry paths, process names, network connections) against the entire environment 24 hours after eradication. Any hit indicates incomplete cleanup.
- Clean Boot Test: For critical servers, boot from a known-clean bootable medium (Windows PE or Linux live USB) and scan the installed system drive for indicators. Avoids running the potentially-compromised OS during scan.
- Re-image vs Clean: For targeted intrusions by skilled threat actors, re-imaging is strongly preferred over cleaning. Re-imaging guarantees a known-good state; cleaning relies on finding every malicious artifact, which is nearly impossible for advanced implants.
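The file-hash part of an IOC sweep can be sketched as a recursive scan against a set of known-bad SHA-256 values. In practice this would be pushed to the whole fleet through EDR or a configuration management tool; the `sweep_for_hashes` helper below is an illustrative assumption that shows the per-host logic only and covers none of the registry, process, or network indicators.

```python
import hashlib
from pathlib import Path


def sweep_for_hashes(root, ioc_sha256):
    """Return paths under root whose SHA-256 matches a known-bad hash."""
    iocs = {h.lower() for h in ioc_sha256}
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Reads the whole file at once; fine for a sketch, chunk it for large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError:
            continue  # unreadable files should be logged and reviewed separately
        if digest in iocs:
            hits.append(str(path))
    return hits
```

An empty result from one host says little on its own; the value comes from running the same sweep everywhere and treating any single hit as evidence of incomplete cleanup.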
Post-Eradication Monitoring
Even after thorough eradication, enhanced monitoring for at least 30 days post-cleanup is essential. Attackers may have left additional implants that activate after the initial heat dies down.
- Deploy enhanced EDR telemetry collection on previously affected systems — turn on verbose process creation logging, network connection logging, and registry change monitoring.
- Monitor for re-appearance of any IOCs or TTPs observed during the incident. Any recurrence is treated as an active incident, not a false positive.
- Watch for "sleeping" implants: malware that checks in only weekly or monthly to avoid detection during intensive post-incident monitoring. Enhanced monitoring must continue for at least 30 days.
- Compare current system state to post-eradication baseline: file integrity monitoring (FIM) alerts on any change to system binaries, scheduled tasks, or services during the monitoring period.
- Behavioral monitoring: watch for the same TTPs (same LOLBins, same lateral movement patterns, same C2 callback timing) even from different source systems — sophisticated actors return through different vectors.
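The baseline comparison described above reduces to a three-way diff between two snapshots. This sketch assumes each snapshot is a mapping from an item (a file path, task name, or service name) to a content hash; how those snapshots are collected is left to the FIM tooling in use, and `fim_diff` is a hypothetical helper.

```python
def fim_diff(baseline, current):
    """Compare two {item: hash} snapshots and report what was added, removed, or changed."""
    base_items, cur_items = set(baseline), set(current)
    return {
        "added": sorted(cur_items - base_items),
        "removed": sorted(base_items - cur_items),
        "changed": sorted(p for p in base_items & cur_items if baseline[p] != current[p]),
    }
```

During the monitoring period, any non-empty result on a previously affected system deserves an explanation tied to a change ticket; an unexplained new service or modified system binary is treated as a possible return of the attacker.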
Incomplete Eradication is the Root Cause of Re-infection
The most dangerous moment in IR is declaring eradication complete before it actually is. Organizations pressure IR teams to return systems to production as quickly as possible — and teams under that pressure miss persistence mechanisms. The correct answer to "are we clean?" must be: verify independently, get a second confirmation from a different analyst or a different tool, and then verify again before returning to production. The cost of 24 additional hours of verification is trivially small compared to the cost of a re-infection requiring the entire IR process to restart. Verify, get independent confirmation, then verify again.