1. Detection Architecture
Detection-in-Depth
No single detection layer catches everything. A defense-in-depth detection strategy covers multiple layers, each with different telemetry types and detection strengths.
- Network (NDR/IDS): Detects network-level attacks, lateral movement, C2 callbacks, data exfiltration. Cannot see encrypted payloads but can fingerprint TLS (JA3/JA4), detect beacon patterns, and identify protocol anomalies.
- Endpoint (EDR): Provides the richest telemetry, including process creation, file writes, registry changes, network connections, and memory operations. Sees host activity that network-only monitoring cannot observe. Examples: CrowdStrike, MDE, SentinelOne.
- Cloud (CSPM/CIEM/native detection): Amazon GuardDuty, Microsoft Defender for Cloud (formerly Azure Defender), GCP Security Command Center. Detects IAM anomalies, resource abuse, data exfiltration to external accounts, crypto mining.
- Identity (UEBA/MDI): User and Entity Behavior Analytics detects anomalous authentication patterns, impossible travel, lateral movement paths. Microsoft Defender for Identity, Varonis.
- Application (WAF/RASP): Detects web application attacks (SQLi, XSS, SSRF) at the application layer. RASP (Runtime Application Self-Protection) detects attacks from inside the application execution context.
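The beacon-pattern detection mentioned for the network layer can be illustrated with a small sketch. It assumes you already have connection timestamps (in seconds) for one source/destination pair and scores how regular the gaps between them are. The function names and the 0.1 cutoff are illustrative assumptions, not values from any NDR product.

```python
from statistics import mean, pstdev

def beacon_score(timestamps, min_events=6):
    """Coefficient of variation of the gaps between connections.

    Values near 0 mean very regular timing. Returns None when there are
    too few events to judge (min_events is an assumed cutoff).
    """
    if len(timestamps) < min_events:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return None
    return pstdev(gaps) / average_gap

def looks_like_beacon(timestamps, max_cv=0.1):
    """True when the timing is regular enough to deserve a closer look."""
    score = beacon_score(timestamps)
    return score is not None and score <= max_cv
```

A check-in roughly every 60 seconds with a second of jitter scores far below the cutoff, while bursty human browsing scores well above it. Real implants add deliberate jitter, so a production rule would likely need a looser cutoff plus additional features such as byte counts.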
Alert Fatigue Problem
Alert fatigue is the primary reason security controls fail — not because they don't detect, but because the signal is buried in noise and analysts stop trusting alerts.
- Average SOC receives 1,000-10,000+ alerts per day. Average analyst can thoroughly investigate 20-30 alerts per day. The math is unsustainable without automation and prioritization.
- 75% of alerts in a typical untuned SIEM are false positives — known-good behavior triggering poorly-tuned rules. Analysts learn to ignore these, and then miss real attacks.
- Tuning Strategy: For each alert type, track false positive rate. Suppress known-good (whitelist) before the SIEM — at the source, not after alerting. Target FPR below 20% for any production rule.
- Alert Enrichment: Add context automatically — IP reputation, user risk score, asset criticality, recent related alerts — so analysts can make faster decisions with fewer tool switches.
- Tiered Response: Tier 1 handles automated enrichment and clear-cut dismissals. Tier 2 handles complex investigations. Tier 3 / threat hunters proactively search for what Tier 1/2 missed.
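The capacity arithmetic and the per-rule false positive tracking described above can be sketched as follows. The input shape (pairs of rule name and analyst verdict) is an assumption for illustration. The 20% target comes from the tuning guidance in the list.

```python
import math
from collections import Counter

FPR_TARGET = 0.20  # from the text: keep production rules below 20% FPR

def analysts_required(alerts_per_day, alerts_per_analyst):
    """How many analysts it would take to investigate every alert thoroughly."""
    return math.ceil(alerts_per_day / alerts_per_analyst)

def false_positive_rates(dispositions):
    """dispositions: iterable of (rule_name, verdict), verdict is 'tp' or 'fp'."""
    totals, false_positives = Counter(), Counter()
    for rule, verdict in dispositions:
        totals[rule] += 1
        if verdict == "fp":
            false_positives[rule] += 1
    return {rule: false_positives[rule] / totals[rule] for rule in totals}

def rules_needing_tuning(dispositions, target=FPR_TARGET):
    """Rules at or above the target FPR, noisiest first."""
    rates = false_positive_rates(dispositions)
    noisy = [rule for rule, rate in rates.items() if rate >= target]
    return sorted(noisy, key=lambda rule: -rates[rule])
```

With the figures quoted above, `analysts_required(1000, 30)` gives 34 and `analysts_required(10000, 20)` gives 500, which is why automation and prioritization are not optional.
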
| Data Source | Coverage | Alert Volume | Retention Cost | Primary Detections |
|---|---|---|---|---|
| EDR Telemetry | Endpoint: all process/file/network activity | High (raw) → Low (with AI filtering) | High (full telemetry streaming) | Malware, LOLBins, lateral movement, credential theft |
| Network Flow / NDR | All east-west and north-south traffic | Medium | Medium (NetFlow vs full PCAP) | C2 beacons, data exfil, port scans, lateral movement |
| Cloud API Logs | All API calls to cloud provider | Very High (raw) | Medium-High | Privilege escalation, data access, resource abuse |
| Identity / Auth Logs | Authentication events across all systems | Medium | Low-Medium | Credential attacks, MFA bypass, impossible travel |
| Application Logs | Application-layer activity | Varies widely | Low-High (depends on volume) | Web attacks, injection, abuse of app functions |
| DNS Logs | All DNS queries from managed resolvers | Very High | Low (compact format) | C2 domains, DNS tunneling, DGA domains, exfiltration |
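
As one illustration of the DNS row, a common heuristic for tunneling and DGA domains is to flag long, high-entropy labels. The sketch below is a minimal version of that idea. The length and entropy thresholds are assumptions that would need tuning against your own resolver logs.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    length = len(text)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())

def suspicious_label(fqdn, min_length=20, min_entropy=3.5):
    """Flag a name whose longest label is both long and random-looking.

    min_length and min_entropy are illustrative thresholds, not standards.
    """
    labels = fqdn.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) >= min_length and shannon_entropy(longest) >= min_entropy
```

Legitimate CDNs and cloud services also generate long random-looking names, so in practice this check is usually combined with an allowlist and per-domain query volume.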
2. SIEM & Log Analysis
SIEM Architecture
A SIEM (Security Information and Event Management) platform collects, normalizes, stores, and correlates security event data from across the organization to enable detection and investigation.
- Collection: Agents (Splunk UF, Elastic Agent), agentless syslog (via rsyslog/syslog-ng), API polling (cloud logs, SaaS), and streaming (Kafka) bring logs into the SIEM pipeline.
- Normalization: Raw logs mapped to a common schema. CEF (Common Event Format), ECS (Elastic Common Schema), or OCSF (Open Cybersecurity Schema Framework) enable consistent querying across sources.
- Detection Rule Types: Threshold (5+ failed logins in 10 minutes), pattern (process A → spawns process B → connects to IP C), and anomaly (baseline deviation — user accessing 10x normal data volume).
- Sigma Rules: Vendor-agnostic detection rule format. Write once, compile to Splunk SPL, Elastic DSL, QRadar AQL, Sentinel KQL, or any supported backend. 3000+ community rules in the SigmaHQ repository cover MITRE ATT&CK techniques.
- MITRE ATT&CK Coverage: Map each detection rule to an ATT&CK technique. Use the ATT&CK Navigator heatmap to visualize detection gaps — blank cells are undetected technique categories.
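The threshold rule type above (5+ failed logins in 10 minutes) can be sketched as a sliding window per user. This is a simplified illustration in plain Python rather than any SIEM's rule language. The event tuple layout is an assumption.

```python
from collections import defaultdict, deque

def failed_login_alerts(events, threshold=5, window_seconds=600):
    """Return (timestamp, username) for each burst of failed logins.

    events: iterable of (timestamp_seconds, username, success), sorted by time.
    """
    recent_failures = defaultdict(deque)
    alerts = []
    for timestamp, user, success in events:
        if success:
            continue
        window = recent_failures[user]
        window.append(timestamp)
        # Drop failures that have aged out of the window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((timestamp, user))
            window.clear()  # reset so one burst raises one alert
    return alerts
```

Clearing the window after an alert is a design choice that trades some sensitivity for lower alert volume. A real rule might instead suppress repeat alerts for a fixed period.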
Commercial & Open Source SIEM
SIEM selection is one of the most consequential security technology decisions. Key factors: data volume and cost, analyst workflow, integration depth, and detection rule ecosystem.
- Splunk Enterprise / Cloud: Most flexible, most powerful, highest cost. SPL (Search Processing Language) enables sophisticated detection logic. Huge ecosystem of apps and detections. Premium pricing by data volume.
- Microsoft Sentinel: Cloud-native SIEM on Azure. KQL (Kusto Query Language) for detection. Deep integration with Microsoft 365, Defender products, and Entra ID. UEBA and threat intelligence built in. Pay-per-GB model.
- IBM QRadar: Strong network-flow integration and offense correlation. Good for high-security environments. On-premises or SaaS. Complex to tune, powerful when properly configured.
- Wazuh (Open Source): SIEM + XDR platform. Agent-based for Windows, Linux, macOS, containers. MITRE ATT&CK mapped rules, FIM, vulnerability detection. Integrates with Elasticsearch/OpenSearch. Free with significant community support.
- OpenSearch Security Analytics: Detection engine built on OpenSearch. Sigma rule support. Findings mapped to MITRE ATT&CK. Good for teams with Elastic/OpenSearch expertise.
3. Endpoint Detection & Response (EDR)
EDR Telemetry
EDR agents collect comprehensive telemetry from every monitored endpoint, providing the deepest visibility into attacker activity on Windows, macOS, and Linux systems.
- Process Creation: Every process launch with command line, parent process, user context, and SHA-256 hash. The most valuable data source for detecting malicious execution and LOLBin abuse.
- File Write Operations: New executable files, script files, and modifications to sensitive paths (System32, Program Files, startup folders, cron, sudoers).
- Network Connections: Every outbound connection from every process — destination IP, port, bytes transferred. Maps attacker tools to C2 infrastructure.
- Registry Changes (Windows): Persistence mechanisms commonly use registry Run keys, services, and scheduled tasks. EDR captures all registry writes for hunting and detection.
- Memory Regions: Suspicious memory allocations (RWX permissions), process injection (DLL injection, process hollowing, reflective loading). Critical for detecting fileless malware.
- Threat Graph: CrowdStrike's correlated view of all related activity — single alert links to a complete process tree showing the full attack chain from initial access to objective.
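To show why process creation events are so useful, here is a minimal sketch of a parent-child check over exported process telemetry. The field names (`parent_name`, `process_name`) and both name lists are hypothetical. Real EDR schemas differ by vendor, and real rules also inspect the command line.

```python
# Illustrative lists only; extend them for your environment.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"}
SCRIPT_HOSTS = {"cmd.exe", "powershell.exe", "wscript.exe", "cscript.exe", "mshta.exe"}

def suspicious_spawns(process_events):
    """Return events where an Office application launched a shell or script host."""
    hits = []
    for event in process_events:
        parent = event["parent_name"].lower()
        child = event["process_name"].lower()
        if parent in OFFICE_PARENTS and child in SCRIPT_HOSTS:
            hits.append(event)
    return hits
```

The same pattern, a set of unexpected parents for a given child, covers many LOLBin detections once the lists are tuned to what is normal on your endpoints.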
EDR Platform Comparison
The top EDR platforms have converged on similar capabilities but differ in AI quality, response features, cloud integration, and analyst experience.
- CrowdStrike Falcon: Market leader by Gartner. AI-powered prevention engine trained on 3 trillion events/week. Real-time response shell (RTR) for remote investigation. Threat Graph for correlated investigation. Highest total cost.
- Microsoft Defender for Endpoint (MDE): Native Windows integration provides deepest OS visibility. Integration with Entra ID, Sentinel, and Defender XDR enables correlated identity + endpoint detection. Best value for Microsoft-heavy organizations.
- SentinelOne: Autonomous response capabilities (isolate, rollback ransomware file changes) without analyst intervention. Strong Linux and macOS coverage. Storyline feature correlates all related events.
- Carbon Black (VMware/Broadcom): Strong in VMware-heavy environments. Reputation-based detection with behavioral analysis. Good for regulated industries with on-premises requirements.
| EDR Product | AI/ML Detection | MITRE Coverage | Remote Response | Linux/Mac | Price Tier |
|---|---|---|---|---|---|
| CrowdStrike Falcon | Best-in-class (Threat Graph) | Excellent (95%+) | Yes (RTR shell) | Yes (strong) | $$$ |
| Microsoft Defender for Endpoint | Very good | Excellent | Yes (Live Response) | Yes (good) | $$ (included in M365 E5) |
| SentinelOne Singularity | Very good (autonomous) | Very good | Yes (Remote Shell) | Yes (strong) | $$$ |
| Carbon Black EDR | Good (reputation + behavioral) | Good | Yes (Live Response) | Yes (good) | $$ |
| Elastic Security (XDR) | Good (ML rules) | Good (growing) | Limited | Yes | $ (open source base) |
4. Threat Intelligence Integration
CTI Types & IOCs vs TTPs
Threat intelligence is commonly divided into three levels: strategic, operational, and tactical. Organizations that consume only IOCs are fighting the last war; organizations that detect TTPs are prepared for the next one.
- Strategic CTI: Long-term threat landscape analysis — which threat actor groups target your industry, emerging attack techniques, geopolitical context. Audience: CISO and executives for program investment decisions.
- Operational CTI: Specific campaigns targeting your sector right now — TTPs in use, infrastructure, malware families, C2 patterns. Audience: SOC managers and IR leads to prioritize detection focus.
- Tactical CTI: Specific IOCs (Indicators of Compromise) — IP addresses, domains, file hashes, URLs. Most specific but shortest lifespan — attackers rotate infrastructure. Audience: analysts to block and detect.
- IOC Volatility: IP addresses: hours to days. Domains: days to weeks. File hashes: days (easily modified). TTPs (MITRE ATT&CK techniques): months to years. Investing in TTP-based detection provides much more durable coverage than IOC-only.
- STIX 2.1 / TAXII 2.1: Open standards for structured CTI sharing. STIX defines the format (JSON-based objects: Indicator, ThreatActor, AttackPattern, Campaign). TAXII defines the transport protocol for sharing.
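A STIX 2.1 Indicator is a JSON object, so a minimal one can be built with the standard library alone. The sketch below also sets `valid_until` from the IOC type to reflect the volatility described above. The specific lifetimes (3, 14, and 7 days) are assumptions chosen for illustration, and `make_indicator` is a hypothetical helper, not part of any STIX library.

```python
import json
import uuid
from datetime import datetime, timedelta, timezone

# Assumed lifetimes per IOC type; tune these to your own feed quality.
ASSUMED_LIFETIME = {
    "ipv4-addr": timedelta(days=3),
    "domain-name": timedelta(days=14),
    "file-sha256": timedelta(days=7),
}

PATTERN_TEMPLATES = {
    "ipv4-addr": "[ipv4-addr:value = '{v}']",
    "domain-name": "[domain-name:value = '{v}']",
    "file-sha256": "[file:hashes.'SHA-256' = '{v}']",
}

def stix_timestamp(moment):
    """Format an aware UTC datetime as a STIX timestamp (RFC 3339, 'Z' suffix)."""
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")

def make_indicator(ioc_type, value, now=None):
    """Build a minimal STIX 2.1 Indicator as a plain dict."""
    now = now or datetime.now(timezone.utc)
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": stix_timestamp(now),
        "modified": stix_timestamp(now),
        "indicator_types": ["malicious-activity"],
        "pattern": PATTERN_TEMPLATES[ioc_type].format(v=value),
        "pattern_type": "stix",
        "valid_from": stix_timestamp(now),
        "valid_until": stix_timestamp(now + ASSUMED_LIFETIME[ioc_type]),
    }

if __name__ == "__main__":
    print(json.dumps(make_indicator("ipv4-addr", "198.51.100.7"), indent=2))
```

Setting an expiry at creation time keeps stale IP and domain indicators from accumulating in block lists long after the attacker has rotated infrastructure.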
Threat Intelligence Platforms & ISACs
Threat intelligence is most valuable when it's integrated into detection workflows — blocking IOCs automatically, enriching SIEM alerts with threat actor context, and informing hunting hypotheses.
- MISP (Malware Information Sharing Platform): Open-source threat intelligence platform. STIX/TAXII support, integration with SIEM/SOAR/EDR. Used by CERTs, ISACs, and enterprises worldwide. Community-maintained threat feed sharing.
- Recorded Future: Premium commercial intelligence platform with extensive dark web monitoring, threat actor tracking, and SIEM integration. Real-time risk scoring for IPs/domains/hashes.
- Mandiant Advantage / Google Threat Intelligence: Deep intelligence from Mandiant incident response cases. Nation-state actor tracking, malware analysis reports, and threat indicators with high-confidence attribution.
- FS-ISAC (Financial Services): Sector-specific intel sharing for financial institutions. Threat alerts, IOC feeds, and IR collaboration for the financial sector.
- H-ISAC (Healthcare): Healthcare-specific threat intelligence sharing. Critical for hospitals and healthcare organizations where ransomware can directly threaten patient safety.
- Free feeds: abuse.ch (URLhaus, ThreatFox, MalwareBazaar), AlienVault OTX, CISA KEV (Known Exploited Vulnerabilities), Emerging Threats rules
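Integrating intelligence into the detection workflow often starts with simple alert enrichment. The following sketch assumes an in-memory IOC lookup keyed by IP address and an asset criticality table. The field names and the priority logic are hypothetical and only meant to show the shape of the step.

```python
def enrich_alert(alert, ioc_index, asset_criticality):
    """Attach threat intel and asset context to an alert and derive a priority.

    ioc_index: dict mapping IP address -> intel record (assumed structure).
    asset_criticality: dict mapping hostname -> 'high' / 'medium' / 'low'.
    """
    intel = ioc_index.get(alert.get("dest_ip"))
    criticality = asset_criticality.get(alert.get("host"), "unknown")
    if intel and criticality == "high":
        priority = "high"
    elif intel or criticality == "high":
        priority = "medium"
    else:
        priority = "low"
    return {
        **alert,
        "threat_intel": intel,
        "asset_criticality": criticality,
        "priority": priority,
    }
```

In a real pipeline the lookup would typically query a platform such as MISP or a commercial feed, with caching so that enrichment does not slow down alert delivery.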
5. Threat Hunting
Hunting Methodology
Threat hunting is proactive, analyst-driven search for threats that have evaded automated detection. It assumes compromise and looks for evidence — rather than waiting for an alert to be generated.
- Hypothesis-Driven: "TA505 is known to use WMI for persistence. Let me hunt for suspicious WMI event subscriptions across the fleet." Start with threat intelligence or observed TTPs to form a testable hypothesis.
- Data-Driven: Statistical analysis to identify anomalies without a prior hypothesis. "Find all processes that made outbound connections to ports other than 80/443/53 in the last 30 days, sorted by rarity." Rare events are disproportionately interesting.
- IOC Retrospective: A new IOC was published — search historical data to determine if it was seen in your environment before it became known-bad. Retrospective hunting often discovers past compromises.
- Typical hunt cycle: 1-2 week sprint per hypothesis. Define scope, query data, investigate findings, produce output (new detection rule, playbook update, or confirmed clean). Document both positive and negative findings.
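An IOC retrospective hunt can be as simple as replaying newly published indicators against historical logs. The sketch below assumes DNS log entries are dicts with `timestamp` and `query` keys (a hypothetical shape) and returns sightings that predate publication.

```python
from datetime import datetime

def retrospective_hits(dns_log, new_domains, published_at):
    """Find queries for newly published bad domains made before publication.

    Matches the domain itself and any subdomain of it.
    """
    wanted = {domain.lower().rstrip(".") for domain in new_domains}
    hits = []
    for entry in dns_log:
        name = entry["query"].lower().rstrip(".")
        matched = any(name == d or name.endswith("." + d) for d in wanted)
        if matched and entry["timestamp"] < published_at:
            hits.append(entry)
    return sorted(hits, key=lambda entry: entry["timestamp"])
```

How far back this can reach depends entirely on log retention, which is one reason compact sources such as DNS logs are worth keeping for longer than full endpoint telemetry.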
Hunting Tools
Threat hunters need tools that support fast, expressive queries across large datasets, ideally with cross-platform correlation capabilities.
- Elastic EQL (Event Query Language): Purpose-built language for sequence-based hunting. `sequence by host.name [process where process.name == "cmd.exe"] [network where destination.port == 443]` finds hosts where a cmd.exe launch is followed by a connection to port 443. Joining on `process.entity_id` instead of `host.name` ties the connection to the same process.
- Splunk SPL: Powerful search and statistical language. `index=endpoint event_type=process | stats count by parent_process_name, process_name | where count < 5` finds rare parent-child process relationships.
- Velociraptor: Open-source DFIR and hunting platform. Deploys agents that execute VQL (Velociraptor Query Language) hunt queries across the entire fleet in seconds. Collects artifacts, memory, and forensic data at scale.
- osquery: Exposes OS state as SQL tables. `SELECT * FROM processes WHERE name='svchost.exe' AND parent NOT IN (SELECT pid FROM processes WHERE name='services.exe')` finds svchost.exe instances with the wrong parent.
- MITRE ATT&CK Navigator: Map current detection coverage, identify gaps, plan hunting sprints to cover uncovered techniques. Collaborative annotation of ATT&CK matrix for team coordination.
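The rare parent-child query shown for Splunk is an instance of stacking (least-frequency analysis), and the same idea can be sketched outside any particular tool. The version below assumes process events exported as dicts with `parent_process_name` and `process_name` keys, mirroring the field names in the SPL example.

```python
from collections import Counter

def rare_parent_child_pairs(process_events, max_count=5):
    """Return ((parent, child), count) for pairs seen fewer than max_count times."""
    counts = Counter(
        (event["parent_process_name"], event["process_name"])
        for event in process_events
    )
    rare = [(pair, n) for pair, n in counts.items() if n < max_count]
    return sorted(rare, key=lambda item: item[1])  # rarest first
```

Rarity is only a starting point for a hunt. Each rare pair still needs a human to decide whether it is a one-off admin task or something worth escalating.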
Human Hunters Find What Automation Misses
Automated detection systems are excellent at finding known-bad patterns against known signatures and rules. Sophisticated threat actors — particularly nation-state operators and ransomware groups that invest in their tooling — specifically test their techniques against common EDR and SIEM products before deployment. The threats that evade automated detection are only found by human hunters with threat intelligence context, creative query writing, and an attacker's mental model of what "hiding in plain sight" looks like. Dedicate at minimum one analyst-sprint per month to structured threat hunting, and use the outputs to build new detection rules that convert human discoveries into automated future coverage.