⏱ 9 min read 📊 Intermediate 🗓 Updated Jan 2025
👁️ Network Visibility Fundamentals

Network visibility is the foundation of all network security operations. Without knowing what devices exist, what traffic flows between them, and what events are occurring, detection and response are impossible. Most organizations discover significant visibility gaps when they begin a formal monitoring program.

The Visibility Gap

Enterprise networks routinely have large portions with no monitoring coverage — unmanaged IoT devices, shadow IT, unmonitored network segments, and encrypted traffic that bypasses inspection. These gaps are exactly where attackers establish persistence.

  • Unmanaged devices: printers, cameras, HVAC controllers, IoT
  • Cloud workloads: traffic between cloud services often unmonitored
  • East-west traffic: server-to-server flows rarely inspected
  • Encrypted DNS/HTTP: bypasses traditional network monitoring
  • OT/ICS networks: critical infrastructure often poorly monitored

What to Monitor

Effective monitoring requires prioritizing data sources based on the threats most relevant to the organization. Full visibility is the goal; a phased approach ensures the highest-value sources are implemented first.

  • Traffic flows: who is talking to whom, on what port, how much data
  • Device health: CPU, memory, interface errors, link state changes
  • Security events: firewall denies, auth failures, IDS alerts
  • User activity: VPN connections, privileged access, data movement
  • DNS: queries, answers, NXDOMAIN rates, new domains
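
DNS monitoring in particular lends itself to simple ratio checks. A minimal sketch of the NXDOMAIN-rate idea from the list above, using awk over a hypothetical query log (the four-column format is illustrative, not any specific resolver's output):

```shell
# Compute the NXDOMAIN rate from a DNS query log (hypothetical format:
# timestamp client qname rcode). A sustained rate above a few percent
# is a common DGA / typo-squat hunting lead.
cat > /tmp/dns_sample.log <<'EOF'
1700000001 10.0.1.5 www.example.com NOERROR
1700000002 10.0.1.5 kqzx7f2a.net NXDOMAIN
1700000003 10.0.1.8 mail.example.com NOERROR
1700000004 10.0.1.5 p0o9i8u7.org NXDOMAIN
1700000005 10.0.1.9 www.example.org NOERROR
EOF

nxrate=$(awk '$4=="NXDOMAIN"{nx++} END{printf "%.0f", 100*nx/NR}' /tmp/dns_sample.log)
echo "NXDOMAIN rate: ${nxrate}%"
```

The same one-liner runs unchanged against much larger logs; in practice the threshold is set per resolver from a week of baseline data.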

Network Topology & Asset Discovery

Monitoring is only meaningful in the context of a known, up-to-date network topology. Asset inventory — knowing every device, its IP, OS, owner, and function — is a prerequisite to distinguishing normal from anomalous behavior.

  • Active discovery: Nmap, Nessus, Qualys — scan known ranges
  • Passive discovery: Zeek, Suricata observe devices from traffic
  • DHCP logs: authoritative source for IP-to-MAC-to-hostname mapping
  • CMDB integration: combine network scan data with configuration management
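
Combining two of these sources pays off immediately: diffing active-scan results against the CMDB export reveals unmanaged devices. A sketch with `comm` (file names and addresses are illustrative):

```shell
# Hosts seen by the active scan but absent from the CMDB export are
# unmanaged devices. comm requires sorted input.
printf '%s\n' 10.0.1.10 10.0.1.11 10.0.1.12 10.0.1.99 | sort > /tmp/scan_hosts.txt
printf '%s\n' 10.0.1.10 10.0.1.11 10.0.1.12 | sort > /tmp/cmdb_hosts.txt

# comm -23: lines only in the first file (scanned but not inventoried)
unmanaged=$(comm -23 /tmp/scan_hosts.txt /tmp/cmdb_hosts.txt)
echo "Unmanaged: $unmanaged"
```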

Data Sources Overview

Network monitoring draws from multiple complementary data sources. No single source provides complete visibility — the combination of flow data, packet capture, and log aggregation is required for comprehensive coverage.

  • NetFlow/IPFIX: flow metadata from routers, switches, firewalls
  • PCAP: full packet capture at key network points
  • Syslog: device events, firewall logs, authentication events
  • SNMP: device health, interface counters, availability
  • DNS logs: all queries and responses (passive DNS)
  • EDR telemetry: endpoint process/network activity correlated with network

📊 Flow-Based Monitoring

Flow-based monitoring captures metadata about network conversations — source, destination, port, protocol, byte and packet counts — without storing full packet content. This makes it scalable to high-speed links while retaining the most security-relevant information. Flow data is the foundation of network traffic analysis at scale.

NetFlow v5 / v9 vs. IPFIX

NetFlow (Cisco-originated) and IPFIX (the IETF standardization of the protocol, sometimes called NetFlow v10) are the dominant flow export protocols. IPFIX is more flexible and extensible, supporting custom information elements and variable-length fields not possible in NetFlow v5/v9.

  • NetFlow v5: fixed record format keyed on src/dst IP, src/dst port, protocol, ToS, input interface; IPv4 only
  • NetFlow v9: template-based, extensible, supports IPv6 and MPLS
  • IPFIX (RFC 7011): IETF standard; vendor-neutral; rich extension support
  • Sampling: 1:1 for security analysis; 1:1000 acceptable for capacity planning

sFlow — Sampled Flow Data

sFlow (RFC 3176) takes a different approach: it samples actual packets (rather than maintaining flow records) at a configurable ratio, providing both flow metadata and packet content samples. It scales better on very high-speed links and is supported by virtually all switch vendors.

  • Hardware-based sampling: minimal performance impact even at line rate
  • Packet samples include Layer 2 headers: useful for switch-level monitoring
  • Supported natively by Arista, Juniper, HP/Aruba, Dell EMC switches
  • Less precise than NetFlow for exact byte counts (statistical sampling)
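
The sampling-precision trade-off can be quantified. A sketch using the rule of thumb published by sFlow.org (percent error at 95% confidence is roughly 196 * sqrt(1/c), where c is the number of samples observed for a traffic class):

```shell
# Estimate the relative error of an sFlow-derived counter using the
# sFlow.org rule of thumb: % error <= 196 * sqrt(1/c) at 95% confidence,
# where c is the number of samples seen for the class of interest.
samples=10000
err=$(awk -v c="$samples" 'BEGIN{printf "%.1f", 196*sqrt(1/c)}')
echo "~${err}% error with ${samples} samples"
```

At 10,000 samples the estimate is about 2%, which is why sFlow is fine for top-talker and capacity work but weaker for exact per-flow byte accounting.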

Flow Collectors & Analysis Tools

Flow collectors receive, store, and index flow records from network devices. Good flow analysis tools enable fast queries across millions of flow records to find top talkers, unusual connections, and baseline deviations.

  • nfdump / nfcapd: open source, high-performance, command-line analysis
  • ntopng: open source web UI; community and commercial editions
  • Elastic Stack: store flows in Elasticsearch, visualize in Kibana
  • SolarWinds NTA: commercial, strong reporting and capacity planning
  • Kentik: SaaS flow analytics with DDoS detection

# nfdump query examples for security analysis

# Show top 10 talkers by bytes (last hour)
nfdump -R /var/netflow/ -t "$(date -d '1 hour ago' +%Y/%m/%d.%H:%M)-$(date +%Y/%m/%d.%H:%M)" \
  -s srcip/bytes -n 10

# Find connections to a specific external IP (potential C2 contact)
nfdump -R /var/netflow/ 'dst ip 198.51.100.23' -o extended

# Find all DNS traffic NOT going to authorized resolvers (DNS tunneling indicator)
nfdump -R /var/netflow/ 'dst port 53 and not dst ip 10.0.0.53 and not dst ip 8.8.8.8' \
  -s srcip/flows -n 20

# Detect potential port scanning: many flows (one per probed dst port) from one source
nfdump -R /var/netflow/ 'src net 10.0.1.0/24' -s srcip/flows -n 10

# Large outbound transfers (data exfiltration indicator)
nfdump -R /var/netflow/ 'src net 10.0.0.0/8 and not dst net 10.0.0.0/8 and bytes > 100000000' \
  -s srcip/bytes -n 20

📋 SIEM Integration

A SIEM (Security Information and Event Management) system aggregates, normalizes, and correlates security events from across the organization. The value of a SIEM is not in collecting logs — it's in the detection rules and correlation logic that transform raw events into actionable alerts.

SIEM Core Functions

Modern SIEMs go beyond log aggregation. Detection rules correlate events across multiple sources; UEBA identifies behavioral anomalies; SOAR integration automates response actions. The SIEM is the operational center of the security monitoring program.

  • Log collection: agents, syslog, API integrations, cloud connectors
  • Normalization: map vendor-specific field names to a common schema
  • Correlation: multi-source rules ("if A + B within 5 min, alert on C")
  • Enrichment: add context (geo-IP, ASN, threat intel, asset data)
  • Retention: 90 days hot; 1–7 years cold (compliance-dependent)
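
Normalization is the least glamorous but most load-bearing of these steps. A minimal sketch mapping a simplified firewall deny line onto ECS-style field names (the input format is a stand-in, not exact vendor syntax):

```shell
# Map a vendor firewall deny line onto a common schema (ECS-style
# names). The input line format here is a simplified illustration.
line='DENY TCP 203.0.113.7:51515 -> 10.0.2.20:445'
normalized=$(echo "$line" | awk '{
  split($3, s, ":"); split($5, d, ":")
  printf "event.action=%s network.transport=%s source.ip=%s source.port=%s destination.ip=%s destination.port=%s",
         tolower($1), tolower($2), s[1], s[2], d[1], d[2]
}')
echo "$normalized"
```

Real pipelines do this with parsers (Logstash, Cribl, SIEM-native connectors), but the principle is identical: every source ends up speaking one schema so correlation rules can be written once.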

Sigma Rules

Sigma is a generic, vendor-neutral rule format for SIEM detection logic. A Sigma rule describes a detection pattern in YAML; the Sigma converter tool translates it into query syntax for any supported SIEM platform (Splunk, Elastic, Sentinel, QRadar, etc.).

  • Large community repository: github.com/SigmaHQ/sigma (8,000+ rules)
  • MITRE ATT&CK tagged: rules map to specific techniques and tactics
  • Write once, deploy everywhere — avoid SIEM vendor lock-in for detection logic
  • pySigma: Python library powering rule conversion and the sigma-cli tool
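
A minimal example of the format (the field names and the firewall logsource category here are illustrative; real rules should follow the SigmaHQ field conventions for the targeted log source):

```yaml
title: Outbound SMB to the Internet
status: experimental
description: Internal host connecting to tcp/445 outside private address space
logsource:
  category: firewall
detection:
  selection:
    dst_port: 445
  filter_internal:
    dst_ip|cidr: 10.0.0.0/8
  condition: selection and not filter_internal
level: high
tags:
  - attack.exfiltration
```

With sigma-cli and the appropriate backend plugin installed, a command along the lines of `sigma convert -t splunk rule.yml` emits the equivalent query for the target platform.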

Alert Fatigue Management

Alert fatigue — analysts ignoring alerts because of too many false positives — is the #1 operational failure mode for SIEM deployments. Effective SIEMs prioritize quality over quantity: fewer, higher-confidence alerts that analysts trust and act on.

  • Start with 20–30 high-confidence rules; expand methodically
  • Score and prioritize alerts; work high-severity first
  • Review and suppress FP-generating rules within 72 hours
  • Monthly review: disable any rule with >80% FP rate
  • Deduplicate: don't alert on the same event multiple times
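
The monthly FP review is easy to script. A sketch computing per-rule false-positive rates from a hypothetical triage export and flagging rules over the 80% threshold above:

```shell
# Per-rule FP rate from a triage export (hypothetical CSV format:
# rule_name,verdict). Rules with >80% FP rate are flagged for review.
cat > /tmp/triage.csv <<'EOF'
brute_force,FP
brute_force,FP
brute_force,TP
dns_tunnel,FP
dns_tunnel,FP
dns_tunnel,FP
dns_tunnel,FP
dns_tunnel,FP
EOF

flagged=$(awk -F, '{tot[$1]++; if($2=="FP") fp[$1]++}
  END{for(r in tot) if(fp[r]/tot[r] > 0.8) print r}' /tmp/triage.csv)
echo "Review/disable: $flagged"
```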
SIEM platform comparison (name | deployment | cost model | strengths | scale):

  • Splunk Enterprise Security | on-prem or cloud (Splunk Cloud) | GB/day ingest or infrastructure | powerful SPL query language; large app ecosystem; proven at scale | enterprise (100GB+/day)
  • Microsoft Sentinel | cloud (Azure) | per-GB ingest; some free connectors | deep Microsoft/Azure integration; built-in UEBA; Copilot AI | mid to large enterprise
  • Elastic SIEM (OpenSearch) | self-hosted or Elastic Cloud | free (OSS); commercial for advanced features | flexible; fast search; Kibana visualization; ECS schema | any (scales with hardware)
  • Wazuh | self-hosted (Docker/VM) | free and open source | HIDS + SIEM combined; FIM; compliance modules (PCI, HIPAA) | SMB to mid-market
  • IBM QRadar | on-prem or QRadar SIEM SaaS | EPS (events per second) + FPS (flows) | strong correlation engine; long enterprise track record | large enterprise

🧬 Network Detection & Response (NDR)

NDR vs. SIEM vs. EDR

These three categories form the detection triad. Each has a different vantage point — SIEM sees logs and events; EDR sees endpoint process activity; NDR sees actual network traffic. Gaps in one are typically covered by another, making all three complementary rather than redundant.

  • SIEM: log-centric; depends on log sources being available and complete
  • EDR: endpoint-centric; doesn't see network devices, OT, or IoT
  • NDR: network-centric; doesn't require agents; sees all traffic
  • NDR excels at: lateral movement, C2 beaconing, encrypted malware traffic

Behavioral Baselining in NDR

NDR platforms build behavioral models for every device and user on the network, learning what normal communication looks like. Deviations from the established baseline — a database server making outbound web requests, a workstation scanning internal subnets — generate alerts regardless of whether a signature matches.

  • Device peer groups: compare similar devices to each other
  • Time-of-day models: flag access patterns that deviate from normal hours
  • Volumetric baselining: detect data movement exceeding normal patterns
  • Protocol models: detect protocol misuse (HTTP on port 443, DNS tunneling)
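
Volumetric baselining reduces to simple statistics. A sketch flagging a device whose latest daily outbound total exceeds its historical mean by more than three standard deviations (values are illustrative daily totals in MB):

```shell
# Flag a device whose latest daily outbound byte count exceeds the
# historical mean by more than 3 standard deviations.
history="1020 980 1050 1010 990 1000"
latest=5200

anom=$(awk -v hist="$history" -v x="$latest" 'BEGIN{
  n=split(hist, v, " ")
  for(i=1;i<=n;i++) sum+=v[i]
  mean=sum/n
  for(i=1;i<=n;i++) ss+=(v[i]-mean)^2
  sd=sqrt(ss/n)
  if (x > mean + 3*sd) print "ANOMALY"; else print "normal"
}')
echo "$anom"
```

Commercial NDR platforms do the same thing per device, per peer group, and per protocol, with longer histories and seasonality handling.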

Encrypted Traffic Analysis (ETA)

NDR platforms analyze TLS traffic without decryption by examining observable metadata: certificate chains, cipher suites, handshake timing, flow sizes, and packet inter-arrival times. JA3/JA4 hashing identifies client software from TLS Client Hello parameters alone.

  • JA4 (2023): more stable than JA3; resistant to randomization evasion
  • Certificate analysis: self-signed, short validity, unusual SAN fields
  • Flow statistics: C2 beaconing has regular timing and consistent byte counts
  • ALPN field: identifies application protocol negotiated within TLS
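
Beacon regularity is measurable directly from flow timestamps. A sketch computing the coefficient of variation of inter-arrival times for one src/dst pair; near-zero CV is characteristic of automated check-ins (timestamps are illustrative):

```shell
# Coefficient of variation (stddev / mean) of connection inter-arrival
# times. Human browsing is bursty (high CV); C2 beacons are metronomic.
timestamps="0 300 600 901 1199 1500"   # seconds; roughly 5-minute spacing

cv=$(awk -v ts="$timestamps" 'BEGIN{
  n=split(ts, t, " ")
  for(i=2;i<=n;i++){d[i-1]=t[i]-t[i-1]; sum+=d[i-1]}
  m=sum/(n-1)
  for(i=1;i<n;i++) ss+=(d[i]-m)^2
  printf "%.3f", sqrt(ss/(n-1))/m
}')
echo "inter-arrival CV: $cv"
```

Real beacon hunters also account for deliberate jitter, which widens but does not hide the distribution.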

Lateral Movement Detection

Lateral movement — attackers pivoting from an initial foothold to additional systems — is the phase where NDR provides the most unique value. Network-level activity such as port scanning, SMB enumeration, and pass-the-hash is visible in traffic even when endpoint agents are absent.

  • Unusual SMB/RPC connections between workstations
  • Internal port scanning patterns (rapid sequential connection attempts)
  • Kerberoasting: anomalous Kerberos TGS requests for service accounts
  • DCSync: unusual replication requests to domain controllers
  • Mimikatz patterns: LSASS access indicators in network-adjacent traffic
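
Several of these patterns fall out of a simple conn-log query. A sketch counting workstation-to-workstation SMB flows (the three-column log format and the 10.0.1.0/24 workstation subnet are assumptions for illustration):

```shell
# Count SMB (tcp/445) flows between workstation-subnet hosts from a
# conn log (hypothetical format: src dst dst_port). Workstations
# rarely have a legitimate reason to speak SMB to each other.
cat > /tmp/conn.log <<'EOF'
10.0.1.21 10.0.0.5  445
10.0.1.21 10.0.1.30 445
10.0.1.21 10.0.1.31 445
10.0.1.22 10.0.0.5  445
EOF

pairs=$(awk '$3==445 && $1 ~ /^10\.0\.1\./ && $2 ~ /^10\.0\.1\./ {c++} END{print c+0}' /tmp/conn.log)
echo "workstation-to-workstation SMB flows: $pairs"
```
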

⚙️ Network Monitoring Best Practices

Visibility Without Analysis Is Just Noise

Organizations frequently invest heavily in data collection infrastructure — full packet capture, SIEM, flow collection — but underinvest in the analyst capability and detection logic needed to turn that data into actionable intelligence. A terabyte of unanalyzed PCAP provides no security value. Invest equally in collection and correlation.

Full Packet Capture Strategy

Full PCAP is invaluable for incident investigation but expensive to store at scale. A tiered strategy captures everything at chokepoints for a short retention window, with selective long-term storage for high-risk segments.

  • Internet edge: 24–72 hour rolling PCAP (highest value, high volume)
  • DMZ: 7–14 day rolling PCAP (critical for web server forensics)
  • Data center east-west: selectively capture sensitive server segments
  • Tools: Zeek + PCAP, Arkime (Moloch), Security Onion (integrated platform)
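
Retention windows translate directly into storage budgets. A quick sizing sketch (numbers are illustrative; tcpdump's -G/-W options implement the rolling window itself):

```shell
# Storage needed for a rolling PCAP window at a given sustained rate.
# Gbit/s / 8 = GB/s; multiply by seconds in the retention window.
rate_gbps=2          # sustained utilization, not link speed
hours=72
need_gb=$(( rate_gbps * 3600 * hours / 8 ))
echo "~${need_gb} GB for ${hours}h at ${rate_gbps} Gbit/s"
```

At 2 Gbit/s sustained, a 72-hour edge window needs roughly 65 TB, which is why full capture is reserved for chokepoints and short windows.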

TAP vs. SPAN for Capture

Network TAPs (Test Access Points) provide a passive hardware copy of traffic — they do not affect the network path and cannot be disabled by software. SPAN (Switched Port Analyzer) ports are configured in switch software — simpler to deploy but can drop packets under load and can be inadvertently modified.

  • TAP: passive, undetectable, no packet loss, preferred for critical segments
  • SPAN: free (switch feature), flexible, subject to port oversubscription drops
  • Aggregation TAP: combine multiple links onto one monitoring interface
  • Virtual TAP: for VM/container environments (vSphere dvSwitch mirror port)

Retention Policies for Compliance

Log retention requirements vary by compliance framework. Retention policies must balance regulatory requirements against storage cost and query performance. Tiered storage (hot/warm/cold) manages cost while meeting retention mandates.

  • PCI DSS: 12 months log retention; 3 months immediately available
  • HIPAA: 6 years audit logs; 3 months readily accessible
  • SOC 2: varies by auditor; typically 12 months minimum
  • NIS2 (EU): 3 months for security events; 12 months for significant incidents
  • Immutable storage: WORM or object storage with object lock for tamper prevention

Threat Hunting with Network Data

Proactive threat hunting uses network data to search for attacker activity that has evaded automated detection. Hunters form hypotheses based on threat intelligence and use flow data, DNS logs, and PCAP to test them — finding evidence of compromise before an alert fires.

  • Hunt hypothesis: "APT group X uses HTTPS C2 with 5-minute beaconing intervals"
  • Search: flow data for regular-interval connections to external IPs
  • Pivot: examine TLS certificates, domain registration, ASN history
  • Hunt infrastructure: Splunk, Elastic, or specialized tools (Gravwell)