Network visibility is the foundation of all network security operations. Without knowing what devices exist, what traffic flows between them, and what events are occurring, detection and response are impossible. Most organizations discover significant visibility gaps when they begin a formal monitoring program.
The Visibility Gap
Enterprise networks routinely have large portions with no monitoring coverage — unmanaged IoT devices, shadow IT, unmonitored network segments, and encrypted traffic that bypasses inspection. These gaps are exactly where attackers establish persistence.
- Unmanaged devices: printers, cameras, HVAC controllers, IoT
- Cloud workloads: traffic between cloud services often unmonitored
- East-west traffic: server-to-server flows rarely inspected
- Encrypted DNS/HTTP: bypasses traditional network monitoring
- OT/ICS networks: critical infrastructure often poorly monitored
What to Monitor
Effective monitoring requires prioritizing data sources based on the threats most relevant to the organization. Full visibility is the goal; a phased approach ensures the highest-value sources are implemented first.
- Traffic flows: who is talking to whom, on what port, how much data
- Device health: CPU, memory, interface errors, link state changes
- Security events: firewall denies, auth failures, IDS alerts
- User activity: VPN connections, privileged access, data movement
- DNS: queries, answers, NXDOMAIN rates, new domains
Network Topology & Asset Discovery
Monitoring is only meaningful in the context of a known, up-to-date network topology. Asset inventory — knowing every device, its IP, OS, owner, and function — is a prerequisite to distinguishing normal from anomalous behavior.
- Active discovery: Nmap, Nessus, Qualys — scan known ranges
- Passive discovery: Zeek, Suricata observe devices from traffic
- DHCP logs: authoritative source for IP-to-MAC-to-hostname mapping
- CMDB integration: combine network scan data with configuration management
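Reconciling these sources is mostly set arithmetic. The sketch below, with illustrative field names and sample data (not any particular scanner's or DHCP server's output format), flags devices seen by scans or DHCP leases but missing from the CMDB:

```python
# Sketch: reconcile active-scan results and DHCP leases against the CMDB
# to surface unmanaged devices. Field names and inputs are illustrative.

def find_unmanaged(scan_hosts, dhcp_leases, cmdb_ips):
    """Return IPs observed on the network but absent from the CMDB."""
    seen = set(scan_hosts) | {lease["ip"] for lease in dhcp_leases}
    return sorted(seen - set(cmdb_ips))

scan = ["10.0.1.5", "10.0.1.9"]
leases = [
    {"ip": "10.0.1.9", "mac": "aa:bb:cc:dd:ee:ff", "hostname": "printer-3f"},
    {"ip": "10.0.1.12", "mac": "11:22:33:44:55:66", "hostname": "cam-lobby"},
]
cmdb = ["10.0.1.5"]

print(find_unmanaged(scan, leases, cmdb))  # ['10.0.1.12', '10.0.1.9']
```

In practice the same pattern runs the other way too: CMDB entries that no scan or lease has seen recently indicate stale inventory.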
Data Sources Overview
Network monitoring draws from multiple complementary data sources. No single source provides complete visibility — the combination of flow data, packet capture, and log aggregation is required for comprehensive coverage.
- NetFlow/IPFIX: flow metadata from routers, switches, firewalls
- PCAP: full packet capture at key network points
- Syslog: device events, firewall logs, authentication events
- SNMP: device health, interface counters, availability
- DNS logs: all queries and responses (passive DNS)
- EDR telemetry: endpoint process/network activity correlated with network
Flow-based monitoring captures metadata about network conversations — source, destination, port, protocol, byte and packet counts — without storing full packet content. This makes it scalable to high-speed links while retaining the most security-relevant information. Flow data is the foundation of network traffic analysis at scale.
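The core idea — collapsing per-packet data into per-conversation records keyed by the 5-tuple — can be sketched in a few lines (packet fields here are illustrative, not a real capture format):

```python
# Minimal sketch of flow aggregation: collapse per-packet records into
# per-conversation flow records keyed by the 5-tuple.
from collections import defaultdict

def aggregate_flows(packets):
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["size"]
    return dict(flows)

pkts = [
    {"src": "10.0.0.5", "dst": "203.0.113.7", "sport": 52100, "dport": 443,
     "proto": "tcp", "size": 1500},
    {"src": "10.0.0.5", "dst": "203.0.113.7", "sport": 52100, "dport": 443,
     "proto": "tcp", "size": 400},
]
flows = aggregate_flows(pkts)
key = ("10.0.0.5", "203.0.113.7", 52100, 443, "tcp")
print(flows[key])  # {'packets': 2, 'bytes': 1900}
```

Real exporters also track timestamps and expire flows on timeouts, but the storage win is the same: two packets, one record.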
NetFlow v5 / v9 vs. IPFIX
NetFlow (Cisco-originated) and IPFIX (the IETF standardization of NetFlow v9, sometimes referred to as "NetFlow v10") are the dominant flow export protocols. IPFIX is more flexible and extensible, supporting custom information elements and variable-length fields not possible in NetFlow v5/v9.
- NetFlow v5: fixed record format; flows keyed on a 7-tuple (src/dst IP, src/dst port, protocol, ToS, input interface)
- NetFlow v9: template-based, extensible, supports IPv6 and MPLS
- IPFIX (RFC 7011): IETF standard; vendor-neutral; rich extension support
- Sampling: 1:1 for security analysis; 1:1000 acceptable for capacity planning
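Because the NetFlow v5 wire format is fixed (24-byte header, 48-byte records, all big-endian), it can be decoded with nothing but the standard library. A minimal parser sketch, keeping only the security-relevant fields:

```python
# Sketch of parsing a NetFlow v5 export datagram with the standard library.
# Layout follows the published v5 format: 24-byte header, 48-byte records.
import socket
import struct

# version, count, sysuptime, unix_secs, unix_nsecs, flow_seq,
# engine_type, engine_id, sampling_interval
HDR = struct.Struct(">HHIIIIBBH")
# srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets, first, last,
# srcport, dstport, pad1, tcp_flags, prot, tos, src_as, dst_as,
# src_mask, dst_mask, pad2
REC = struct.Struct(">4s4s4sHHIIIIHHBBBBHHBBH")

def parse_v5(datagram):
    version, count, *_ = HDR.unpack_from(datagram, 0)
    assert version == 5, "not a NetFlow v5 datagram"
    flows = []
    for i in range(count):
        f = REC.unpack_from(datagram, HDR.size + i * REC.size)
        flows.append({
            "src": socket.inet_ntoa(f[0]),
            "dst": socket.inet_ntoa(f[1]),
            "packets": f[5],
            "bytes": f[6],
            "srcport": f[9],
            "dstport": f[10],
            "proto": f[13],
        })
    return flows

# Build a synthetic single-record datagram and parse it back.
hdr = HDR.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
rec = REC.pack(socket.inet_aton("10.0.0.5"), socket.inet_aton("8.8.8.8"),
               socket.inet_aton("0.0.0.0"), 1, 2, 10, 4200, 0, 0,
               53124, 53, 0, 0, 17, 0, 0, 0, 0, 0, 0)
print(parse_v5(hdr + rec)[0]["bytes"])  # 4200
```

v9 and IPFIX require template handling and are best left to a collector; the fixed v5 layout is why it persisted so long despite lacking IPv6 support.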
sFlow — Sampled Flow Data
sFlow (RFC 3176) takes a different approach: it samples actual packets (not just flow records) at a configurable ratio, providing both flow metadata and packet content samples. More scalable for very high-speed links; supported by virtually all switch vendors.
- Hardware-based sampling: minimal performance impact even at line rate
- Packet samples include Layer 2 headers: useful for switch-level monitoring
- Supported natively by Arista, Juniper, HP/Aruba, Dell EMC switches
- Less precise than NetFlow for exact byte counts (statistical sampling)
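The statistical trade-off is easy to quantify: totals are recovered by scaling sampled counts by the sampling ratio, and the relative error shrinks roughly as one over the square root of the number of samples taken:

```python
# Back-of-envelope math for sFlow's statistical sampling: scale observed
# sample counts by the sampling ratio; relative error ~ 1/sqrt(samples).
import math

def estimate_total(samples_seen, sampling_ratio):
    estimate = samples_seen * sampling_ratio
    rel_error = 1 / math.sqrt(samples_seen) if samples_seen else float("inf")
    return estimate, rel_error

est, err = estimate_total(10_000, 4096)   # 1:4096 sampling, 10k samples taken
print(f"~{est} packets, +/- {err:.0%}")   # ~40960000 packets, +/- 1%
```

This is why sampling is fine for top-talker and capacity questions but unreliable for exact byte counts on small flows: a flow that contributes only a handful of samples carries a large relative error.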
Flow Collectors & Analysis Tools
Flow collectors receive, store, and index flow records from network devices. Good flow analysis tools enable fast queries across millions of flow records to find top talkers, unusual connections, and baseline deviations.
- nfdump / nfcapd: open source, high-performance, command-line analysis
- ntopng: open source web UI; community and commercial editions
- Elastic Stack: store flows in Elasticsearch, visualize in Kibana
- SolarWinds NTA: commercial, strong reporting and capacity planning
- Kentik: SaaS flow analytics with DDoS detection
# nfdump query examples for security analysis
# Show top 10 talkers by bytes (last hour)
nfdump -R /var/netflow/ -t "$(date -d '1 hour ago' +%Y/%m/%d.%H:%M)-$(date +%Y/%m/%d.%H:%M)" \
-s srcip/bytes -n 10
# Find connections to a specific external IP (potential C2 contact)
nfdump -R /var/netflow/ -o extended 'dst ip 198.51.100.23'
# Find all DNS traffic NOT going to authorized resolvers (DNS tunneling indicator)
nfdump -R /var/netflow/ 'dst port 53 and not dst ip 10.0.0.53 and not dst ip 8.8.8.8' \
-s srcip/flows -n 20
# Detect potential port scanning: many unique dst ports from one source
nfdump -R /var/netflow/ -q -A srcip,dstport -o "fmt:%sa %dp" 'src net 10.0.1.0/24' \
| awk '{ports[$1]++} END {for (s in ports) print ports[s], s}' | sort -rn | head -n 10
# Large outbound transfers (data exfiltration indicator)
nfdump -R /var/netflow/ 'src net 10.0.0.0/8 and not dst net 10.0.0.0/8 and bytes > 100000000' \
-s srcip/bytes -n 20
A SIEM (Security Information and Event Management) system aggregates, normalizes, and correlates security events from across the organization. The value of a SIEM is not in collecting logs — it's in the detection rules and correlation logic that transforms raw events into actionable alerts.
SIEM Core Functions
Modern SIEMs go beyond log aggregation. Detection rules correlate events across multiple sources; UEBA identifies behavioral anomalies; SOAR integration automates response actions. The SIEM is the operational center of the security monitoring program.
- Log collection: agents, syslog, API integrations, cloud connectors
- Normalization: map vendor-specific field names to a common schema
- Correlation: multi-source rules ("if A + B within 5 min, alert on C")
- Enrichment: add context (geo-IP, ASN, threat intel, asset data)
- Retention: 90 days hot; 1–7 years cold (compliance-dependent)
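The "if A + B within 5 min" correlation shape from the list above can be sketched directly; the rule below (a hypothetical brute-force-then-success detection, with illustrative event fields) fires when a success follows several recent failures from the same source:

```python
# Illustrative correlation rule: N auth failures followed by a success
# from the same source within a time window.
from datetime import datetime, timedelta

def correlate(events, window=timedelta(minutes=5), threshold=3):
    alerts = []
    failures = {}  # src -> list of failure timestamps
    for ev in sorted(events, key=lambda e: e["ts"]):
        src = ev["src"]
        if ev["type"] == "auth_failure":
            failures.setdefault(src, []).append(ev["ts"])
        elif ev["type"] == "auth_success":
            recent = [t for t in failures.get(src, []) if ev["ts"] - t <= window]
            if len(recent) >= threshold:
                alerts.append({"src": src, "rule": "brute-force-then-success",
                               "ts": ev["ts"]})
    return alerts

t0 = datetime(2024, 5, 1, 9, 0)
evs = [{"src": "10.0.2.8", "type": "auth_failure", "ts": t0 + timedelta(seconds=s)}
       for s in (0, 20, 40)]
evs.append({"src": "10.0.2.8", "type": "auth_success", "ts": t0 + timedelta(minutes=2)})
print(correlate(evs))  # one brute-force-then-success alert for 10.0.2.8
```

Production correlation engines do the same thing with streaming state and expiry, but the logic is the same: keep a window of precursor events per entity and alert when the trigger event arrives.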
Sigma Rules
Sigma is a generic, vendor-neutral rule format for SIEM detection logic. A Sigma rule describes a detection pattern in YAML; the Sigma converter tool translates it into query syntax for any supported SIEM platform (Splunk, Elastic, Sentinel, QRadar, etc.).
- Large community repository: github.com/SigmaHQ/sigma (thousands of rules)
- MITRE ATT&CK tagged: rules map to specific techniques and tactics
- Write once, deploy everywhere — avoid SIEM vendor lock-in for detection logic
- pySigma: Python library for rule parsing, conversion, and backend development
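A Sigma rule is declarative YAML: named selections of field/value patterns plus a condition expression combining them. The toy matcher below shows the shape of that logic for one hypothetical rule expressed as a Python dict (real deployments convert rules with pySigma backends rather than hand-rolling evaluation):

```python
# Toy evaluation of a Sigma-style rule against event dicts. The rule and
# event fields are illustrative; the condition handling is hard-coded to
# the single "selection and not filter" form used here.
rule = {
    "title": "DNS to non-authorized resolver",
    "logsource": {"category": "dns"},
    "detection": {
        "selection": {"dst_port": 53},
        "filter": {"dst_ip": ["10.0.0.53", "8.8.8.8"]},
        "condition": "selection and not filter",
    },
}

def matches(clause, event):
    """True if every field in the clause matches the event (lists = OR)."""
    for field, expected in clause.items():
        allowed = expected if isinstance(expected, list) else [expected]
        if event.get(field) not in allowed:
            return False
    return True

def evaluate(rule, event):
    det = rule["detection"]
    return matches(det["selection"], event) and not matches(det["filter"], event)

print(evaluate(rule, {"dst_port": 53, "dst_ip": "203.0.113.9"}))  # True
print(evaluate(rule, {"dst_port": 53, "dst_ip": "10.0.0.53"}))    # False
```

The value of the format is exactly that this evaluation logic lives in the converter, not in the rule: the same YAML becomes an SPL search, a KQL query, or an Elastic DSL filter.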
Alert Fatigue Management
Alert fatigue — analysts ignoring alerts because of too many false positives — is the #1 operational failure mode for SIEM deployments. Effective SIEMs prioritize quality over quantity: fewer, higher-confidence alerts that analysts trust and act on.
- Start with 20–30 high-confidence rules; expand methodically
- Score and prioritize alerts; work high-severity first
- Review and suppress FP-generating rules within 72 hours
- Monthly review: disable any rule with >80% FP rate
- Deduplicate: don't alert on the same event multiple times
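Deduplication is usually implemented as suppression keyed on (rule, entity) within a window. A minimal sketch, with illustrative alert fields and timestamps in seconds:

```python
# Sketch of alert deduplication: suppress repeats of the same (rule, source)
# key within a fixed suppression window after the last kept alert.
def deduplicate(alerts, window=300):
    last_kept = {}
    kept = []
    for a in sorted(alerts, key=lambda x: x["ts"]):
        key = (a["rule"], a["src"])
        if key not in last_kept or a["ts"] - last_kept[key] > window:
            kept.append(a)
            last_kept[key] = a["ts"]
    return kept

raw = [{"rule": "port-scan", "src": "10.0.3.4", "ts": t} for t in (0, 60, 120, 400)]
print(len(deduplicate(raw)))  # 2: the alert at t=0, then t=400 after the window
```

Whether the window resets on every occurrence or only on kept alerts is a design choice; resetting on every occurrence can suppress a continuously noisy source forever, which may or may not be what the SOC wants.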
| SIEM Platform | Deployment | Cost Model | Strengths | Scale |
|---|---|---|---|---|
| Splunk Enterprise Security | On-prem or cloud (Splunk Cloud) | GB/day ingest or infrastructure | Powerful SPL query language; large app ecosystem; proven at scale | Enterprise (100GB+/day) |
| Microsoft Sentinel | Cloud (Azure) | Per GB ingest; some free connectors | Deep Microsoft/Azure integration; built-in UEBA; Copilot AI | Mid to large enterprise |
| Elastic Security (formerly Elastic SIEM) | Self-hosted or Elastic Cloud | Free basic tier; subscription for advanced features | Flexible; fast search; Kibana visualization; ECS schema | Any (scales with hardware) |
| Wazuh | Self-hosted (Docker/VM) | Free and open source | HIDS + SIEM combined; FIM; compliance modules (PCI, HIPAA) | SMB to mid-market |
| IBM QRadar | On-prem or QRadar SIEM SaaS | EPS (events per second) + FPS (flows) | Strong correlation engine; long enterprise track record | Large enterprise |
NDR vs. SIEM vs. EDR
These three categories form the detection triad. Each has a different vantage point — SIEM sees logs and events; EDR sees endpoint process activity; NDR sees actual network traffic. Gaps in one are typically covered by another, making all three complementary rather than redundant.
- SIEM: log-centric; depends on log sources being available and complete
- EDR: endpoint-centric; doesn't see network devices, OT, or IoT
- NDR: network-centric; doesn't require agents; sees all traffic
- NDR excels at: lateral movement, C2 beaconing, encrypted malware traffic
Behavioral Baselining in NDR
NDR platforms build behavioral models for every device and user on the network, learning what normal communication looks like. Deviations from the established baseline — a database server making outbound web requests, a workstation scanning internal subnets — generate alerts regardless of whether a signature matches.
- Device peer groups: compare similar devices to each other
- Time-of-day models: flag access patterns that deviate from normal hours
- Volumetric baselining: detect data movement exceeding normal patterns
- Protocol models: detect protocol misuse (plaintext HTTP on port 443, DNS tunneling)
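Volumetric baselining reduces to a simple statistical test per device. A z-score sketch against a device's own history (the threshold and sample data are illustrative; real NDR models are per-peer-group and time-of-day aware):

```python
# Volumetric baselining sketch: flag a device whose daily outbound byte
# count deviates from its own history by more than 3 standard deviations.
from statistics import mean, stdev

def is_volumetric_anomaly(history, today, z_threshold=3.0):
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

baseline = [1.1e9, 0.9e9, 1.0e9, 1.2e9, 0.8e9]  # daily outbound bytes
print(is_volumetric_anomaly(baseline, 9.5e9))   # True: ~9.5 GB out is anomalous
print(is_volumetric_anomaly(baseline, 1.05e9))  # False: within normal variation
```

The same test applied across a peer group (all domain controllers, all finance workstations) catches devices that are normal for the network but abnormal for their role.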
Encrypted Traffic Analysis (ETA)
NDR platforms analyze TLS traffic without decryption by examining observable metadata: certificate chains, cipher suites, handshake timing, flow sizes, and packet inter-arrival times. JA3/JA4 hashing identifies client software from TLS Client Hello parameters alone.
- JA4 (2023): more stable than JA3; resistant to randomization evasion
- Certificate analysis: self-signed, short validity, unusual SAN fields
- Flow statistics: C2 beaconing has regular timing and consistent byte counts
- ALPN field: identifies application protocol negotiated within TLS
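JA3 itself is simple: concatenate five ClientHello parameter lists in a fixed order (decimal values joined by `-`, fields joined by `,`) and MD5 the result. A sketch with illustrative parameter values, not a real capture:

```python
# JA3 fingerprint sketch: join TLS ClientHello parameters in the JA3 field
# order and MD5 the resulting string. Values below are illustrative.
from hashlib import md5

def ja3(version, ciphers, extensions, curves, point_formats):
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)          # e.g. "771,4865-4866-...,0-11-10,29-23,0"
    return md5(ja3_string.encode()).hexdigest()

fp = ja3(771, [4865, 4866, 4867], [0, 11, 10], [29, 23], [0])
print(fp)  # 32-character hex digest identifying the client software
```

Because the hash depends only on how the client builds its ClientHello, the same malware family produces the same fingerprint across destinations — and this is also why extension-order randomization (as in modern Chrome) breaks JA3 and motivated JA4's sorted, more evasion-resistant design.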
Lateral Movement Detection
Lateral movement — attackers pivoting from an initial foothold to additional systems — is the phase where NDR provides the most unique value. Network-level activity like port scanning, SMB enumeration, and pass-the-hash are visible in traffic even when endpoint agents are absent.
- Unusual SMB/RPC connections between workstations
- Internal port scanning patterns (rapid sequential connection attempts)
- Kerberoasting: anomalous Kerberos TGS requests for service accounts
- DCSync: unusual replication requests to domain controllers
- Credential-theft follow-on: pass-the-hash and other anomalous NTLM authentication patterns that typically appear in traffic after Mimikatz use on an endpoint
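The port-scanning indicator from the list above is a fan-out count: how many distinct destination/port pairs one source touches in a window. A heuristic sketch with an illustrative threshold:

```python
# Port-scan heuristic sketch: count distinct (destination, port) targets
# contacted per source; high fan-out in a short window suggests scanning.
from collections import defaultdict

def scan_suspects(flows, fanout_threshold=100):
    targets = defaultdict(set)
    for f in flows:
        targets[f["src"]].add((f["dst"], f["dport"]))
    return sorted(src for src, t in targets.items() if len(t) >= fanout_threshold)

flows = [{"src": "10.0.1.7", "dst": "10.0.1.20", "dport": p} for p in range(1, 201)]
flows += [{"src": "10.0.1.8", "dst": "10.0.1.20", "dport": 445}]
print(scan_suspects(flows))  # ['10.0.1.7']
```

Slow scans evade fixed windows by design, which is why NDR platforms also track fan-out over days and compare it against each device's peer group rather than a single static threshold.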
Visibility Without Analysis Is Just Noise
Organizations frequently invest heavily in data collection infrastructure — full packet capture, SIEM, flow collection — but underinvest in the analyst capability and detection logic needed to turn that data into actionable intelligence. A terabyte of unanalyzed PCAP provides no security value. Invest equally in collection and correlation.
Full Packet Capture Strategy
Full PCAP is invaluable for incident investigation but expensive to store at scale. A tiered strategy captures everything at chokepoints for a short retention window, with selective long-term storage for high-risk segments.
- Internet edge: 24–72 hour rolling PCAP (highest value, high volume)
- DMZ: 7–14 day rolling PCAP (critical for web server forensics)
- Data center east-west: selectively capture sensitive server segments
- Tools: Zeek + PCAP, Arkime (Moloch), Security Onion (integrated platform)
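Sizing a rolling capture window is straightforward arithmetic: sustained throughput times retention, plus indexing overhead. A back-of-envelope calculator (the 10% overhead factor is an illustrative assumption):

```python
# Back-of-envelope PCAP storage sizing: storage for a rolling capture
# window = sustained throughput * retention, plus indexing overhead.
def pcap_storage_tb(link_gbps, utilization, retention_hours, overhead=1.1):
    bytes_per_sec = link_gbps * 1e9 / 8 * utilization
    return bytes_per_sec * retention_hours * 3600 * overhead / 1e12

# A 10 Gbps internet edge at 40% average utilization, 72-hour rolling window
print(f"{pcap_storage_tb(10, 0.4, 72):.1f} TB")  # 142.6 TB
```

Numbers like this explain the tiered strategy: even a modest edge link needs triple-digit terabytes for a 72-hour window, so full capture everywhere is rarely economical.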
TAP vs. SPAN for Capture
Network TAPs (Test Access Points) provide a passive hardware copy of traffic — they do not affect the network path and cannot be disabled by software. SPAN (Switched Port Analyzer) ports are configured in switch software — simpler to deploy but can drop packets under load and can be inadvertently modified.
- TAP: passive, undetectable, no packet loss, preferred for critical segments
- SPAN: free (switch feature), flexible, subject to port oversubscription drops
- Aggregation TAP: combine multiple links onto one monitoring interface
- Virtual TAP: for VM/container environments (vSphere dvSwitch mirror port)
Retention Policies for Compliance
Log retention requirements vary by compliance framework. Retention policies must balance regulatory requirements against storage cost and query performance. Tiered storage (hot/warm/cold) manages cost while meeting retention mandates.
- PCI DSS: 12 months log retention; 3 months immediately available
- HIPAA: 6 years audit logs; 3 months readily accessible
- SOC 2: varies by auditor; typically 12 months minimum
- NIS2 (EU): 3 months for security events; 12 months for significant incidents
- Immutable storage: WORM or object storage with object lock for tamper prevention
Threat Hunting with Network Data
Proactive threat hunting uses network data to search for attacker activity that has evaded automated detection. Hunters form hypotheses based on threat intelligence and use flow data, DNS logs, and PCAP to test them — finding evidence of compromise before an alert fires.
- Hunt hypothesis: "APT group X uses HTTPS C2 with 5-minute beaconing intervals"
- Search: flow data for regular-interval connections to external IPs
- Pivot: examine TLS certificates, domain registration, ASN history
- Hunt infrastructure: Splunk, Elastic, or specialized tools (Gravwell)
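The beaconing hypothesis above translates directly into a flow-data query: C2 beacons produce near-constant inter-connection intervals, so the coefficient of variation (stdev/mean) of the gaps between a host's flows to a given destination is a useful regularity score. A sketch with illustrative timestamps in seconds:

```python
# Beacon-hunting sketch: score connection-timing regularity per
# (source, destination) pair. A low coefficient of variation over the
# inter-connection gaps indicates machine-like, beacon-style timing.
from statistics import mean, stdev

def beacon_score(timestamps):
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough connections to score
    return stdev(gaps) / mean(gaps)  # low value => regular beaconing

regular = [0, 300, 601, 899, 1200, 1501]   # ~5-minute beacon with jitter
human = [0, 12, 340, 355, 910, 2400]       # bursty interactive browsing
print(beacon_score(regular) < 0.05 < beacon_score(human))  # True
```

Real hunts run this over days of flow data per (source, destination) pair and then pivot on the low-scoring pairs: TLS certificate details, domain age, and ASN history of the destination decide whether the regularity is a software updater or a C2 channel.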