Network visibility is the foundation of all network security operations. Without knowing what devices exist, what traffic flows between them, and what events are occurring, detection and response are impossible. Most organizations discover significant visibility gaps when they begin a formal monitoring program.
The Visibility Gap
Enterprise networks routinely have large portions with no monitoring coverage — unmanaged IoT devices, shadow IT, unmonitored network segments, and encrypted traffic that bypasses inspection. These gaps are exactly where attackers establish persistence.
- Unmanaged devices: printers, cameras, HVAC controllers, IoT
- Cloud workloads: traffic between cloud services often unmonitored
- East-west traffic: server-to-server flows rarely inspected
- Encrypted DNS/HTTP: bypasses traditional network monitoring
- OT/ICS networks: critical infrastructure often poorly monitored
What to Monitor
Effective monitoring requires prioritizing data sources based on the threats most relevant to the organization. Full visibility is the goal; a phased approach ensures the highest-value sources are implemented first.
- Traffic flows: who is talking to whom, on what port, how much data
- Device health: CPU, memory, interface errors, link state changes
- Security events: firewall denies, auth failures, IDS alerts
- User activity: VPN connections, privileged access, data movement
- DNS: queries, answers, NXDOMAIN rates, new domains
Network Topology & Asset Discovery
Monitoring is only meaningful in the context of a known, up-to-date network topology. Asset inventory — knowing every device, its IP, OS, owner, and function — is a prerequisite to distinguishing normal from anomalous behavior.
- Active discovery: Nmap, Nessus, Qualys — scan known ranges
- Passive discovery: Zeek, Suricata observe devices from traffic
- DHCP logs: authoritative source for IP-to-MAC-to-hostname mapping
- CMDB integration: combine network scan data with configuration management
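Reconciling these sources is mostly set arithmetic. The sketch below, with illustrative field names and sample data (not any particular scanner's or DHCP server's output format), flags devices seen by scans or DHCP leases but missing from the CMDB:

```python
# Sketch: reconcile active-scan results and DHCP leases against the CMDB
# to surface unmanaged devices. Field names and inputs are illustrative.

def find_unmanaged(scan_hosts, dhcp_leases, cmdb_ips):
    """Return IPs observed on the network but absent from the CMDB."""
    seen = set(scan_hosts) | {lease["ip"] for lease in dhcp_leases}
    return sorted(seen - set(cmdb_ips))

scan = ["10.0.1.5", "10.0.1.9"]
leases = [
    {"ip": "10.0.1.9", "mac": "aa:bb:cc:dd:ee:ff", "hostname": "printer-3f"},
    {"ip": "10.0.1.12", "mac": "11:22:33:44:55:66", "hostname": "cam-lobby"},
]
cmdb = ["10.0.1.5"]

print(find_unmanaged(scan, leases, cmdb))  # ['10.0.1.12', '10.0.1.9']
```

In practice the same pattern runs the other way too: CMDB entries that no scan or lease has seen recently indicate stale inventory.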
Data Sources Overview
Network monitoring draws from multiple complementary data sources. No single source provides complete visibility — the combination of flow data, packet capture, and log aggregation is required for comprehensive coverage.
- NetFlow/IPFIX: flow metadata from routers, switches, firewalls
- PCAP: full packet capture at key network points
- Syslog: device events, firewall logs, authentication events
- SNMP: device health, interface counters, availability
- DNS logs: all queries and responses (passive DNS)
- EDR telemetry: endpoint process/network activity correlated with network
Flow-based monitoring captures metadata about network conversations — source, destination, port, protocol, byte and packet counts — without storing full packet content. This makes it scalable to high-speed links while retaining the most security-relevant information. Flow data is the foundation of network traffic analysis at scale.
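The core idea — collapsing per-packet data into per-conversation records keyed by the 5-tuple — can be sketched in a few lines (packet fields here are illustrative, not a real capture format):

```python
# Minimal sketch of flow aggregation: collapse per-packet records into
# per-conversation flow records keyed by the 5-tuple.
from collections import defaultdict

def aggregate_flows(packets):
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["size"]
    return dict(flows)

pkts = [
    {"src": "10.0.0.5", "dst": "203.0.113.7", "sport": 52100, "dport": 443,
     "proto": "tcp", "size": 1500},
    {"src": "10.0.0.5", "dst": "203.0.113.7", "sport": 52100, "dport": 443,
     "proto": "tcp", "size": 400},
]
flows = aggregate_flows(pkts)
key = ("10.0.0.5", "203.0.113.7", 52100, 443, "tcp")
print(flows[key])  # {'packets': 2, 'bytes': 1900}
```

Real exporters also track timestamps and expire flows on timeouts, but the storage win is the same: two packets, one record.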
NetFlow v5 / v9 vs. IPFIX
NetFlow (Cisco-originated) and IPFIX (the IETF standardization of NetFlow v9, sometimes referred to as "NetFlow v10") are the dominant flow export protocols. IPFIX is more flexible and extensible, supporting custom information elements and variable-length fields not possible in NetFlow v5/v9.
- NetFlow v5: fixed record format; flows keyed on a 7-tuple (src/dst IP, src/dst port, protocol, ToS, input interface)
- NetFlow v9: template-based, extensible, supports IPv6 and MPLS
- IPFIX (RFC 7011): IETF standard; vendor-neutral; rich extension support
- Sampling: 1:1 for security analysis; 1:1000 acceptable for capacity planning
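Because the NetFlow v5 wire format is fixed (24-byte header, 48-byte records, all big-endian), it can be decoded with nothing but the standard library. A minimal parser sketch, keeping only the security-relevant fields:

```python
# Sketch of parsing a NetFlow v5 export datagram with the standard library.
# Layout follows the published v5 format: 24-byte header, 48-byte records.
import socket
import struct

# version, count, sysuptime, unix_secs, unix_nsecs, flow_seq,
# engine_type, engine_id, sampling_interval
HDR = struct.Struct(">HHIIIIBBH")
# srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets, first, last,
# srcport, dstport, pad1, tcp_flags, prot, tos, src_as, dst_as,
# src_mask, dst_mask, pad2
REC = struct.Struct(">4s4s4sHHIIIIHHBBBBHHBBH")

def parse_v5(datagram):
    version, count, *_ = HDR.unpack_from(datagram, 0)
    assert version == 5, "not a NetFlow v5 datagram"
    flows = []
    for i in range(count):
        f = REC.unpack_from(datagram, HDR.size + i * REC.size)
        flows.append({
            "src": socket.inet_ntoa(f[0]),
            "dst": socket.inet_ntoa(f[1]),
            "packets": f[5],
            "bytes": f[6],
            "srcport": f[9],
            "dstport": f[10],
            "proto": f[13],
        })
    return flows

# Build a synthetic single-record datagram and parse it back.
hdr = HDR.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
rec = REC.pack(socket.inet_aton("10.0.0.5"), socket.inet_aton("8.8.8.8"),
               socket.inet_aton("0.0.0.0"), 1, 2, 10, 4200, 0, 0,
               53124, 53, 0, 0, 17, 0, 0, 0, 0, 0, 0)
print(parse_v5(hdr + rec)[0]["bytes"])  # 4200
```

v9 and IPFIX require template handling and are best left to a collector; the fixed v5 layout is why it persisted so long despite lacking IPv6 support.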
sFlow — Sampled Flow Data
sFlow (RFC 3176) takes a different approach: it samples actual packets (not just flow records) at a configurable ratio, providing both flow metadata and packet content samples. More scalable for very high-speed links; supported by virtually all switch vendors.
- Hardware-based sampling: minimal performance impact even at line rate
- Packet samples include Layer 2 headers: useful for switch-level monitoring
- Supported natively by Arista, Juniper, HP/Aruba, Dell EMC switches
- Less precise than NetFlow for exact byte counts (statistical sampling)
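The statistical trade-off is easy to quantify: totals are recovered by scaling sampled counts by the sampling ratio, and the relative error shrinks roughly as one over the square root of the number of samples taken:

```python
# Back-of-envelope math for sFlow's statistical sampling: scale observed
# sample counts by the sampling ratio; relative error ~ 1/sqrt(samples).
import math

def estimate_total(samples_seen, sampling_ratio):
    estimate = samples_seen * sampling_ratio
    rel_error = 1 / math.sqrt(samples_seen) if samples_seen else float("inf")
    return estimate, rel_error

est, err = estimate_total(10_000, 4096)   # 1:4096 sampling, 10k samples taken
print(f"~{est} packets, +/- {err:.0%}")   # ~40960000 packets, +/- 1%
```

This is why sampling is fine for top-talker and capacity questions but unreliable for exact byte counts on small flows: a flow that contributes only a handful of samples carries a large relative error.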
Flow Collectors & Analysis Tools
Flow collectors receive, store, and index flow records from network devices. Good flow analysis tools enable fast queries across millions of flow records to find top talkers, unusual connections, and baseline deviations.
- nfdump / nfcapd: open source, high-performance, command-line analysis
- ntopng: open source web UI; community and commercial editions
- Elastic Stack: store flows in Elasticsearch, visualize in Kibana
- SolarWinds NTA: commercial, strong reporting and capacity planning
- Kentik: SaaS flow analytics with DDoS detection
# nfdump query examples for security analysis
# Show top 10 talkers by bytes (last hour)
nfdump -R /var/netflow/ -t "$(date -d '1 hour ago' +%Y/%m/%d.%H:%M)-$(date +%Y/%m/%d.%H:%M)" \
-s srcip/bytes -n 10
# Find connections to a specific external IP (potential C2 contact)
nfdump -R /var/netflow/ -o extended 'dst ip 198.51.100.23'
# Find all DNS traffic NOT going to authorized resolvers (DNS tunneling indicator)
nfdump -R /var/netflow/ 'dst port 53 and not dst ip 10.0.0.53 and not dst ip 8.8.8.8' \
-s srcip/flows -n 20
# Detect potential port scanning: many unique dst ports from one source
nfdump -R /var/netflow/ -q -A srcip,dstport -o "fmt:%sa %dp" 'src net 10.0.1.0/24' \
| awk '{ports[$1]++} END {for (s in ports) print ports[s], s}' | sort -rn | head -n 10
# Large outbound transfers (data exfiltration indicator)
nfdump -R /var/netflow/ 'src net 10.0.0.0/8 and not dst net 10.0.0.0/8 and bytes > 100000000' \
-s srcip/bytes -n 20
A SIEM (Security Information and Event Management) system aggregates, normalizes, and correlates security events from across the organization. The value of a SIEM is not in collecting logs — it's in the detection rules and correlation logic that transforms raw events into actionable alerts.
SIEM Core Functions
Modern SIEMs go beyond log aggregation. Detection rules correlate events across multiple sources; UEBA identifies behavioral anomalies; SOAR integration automates response actions. The SIEM is the operational center of the security monitoring program.
- Log collection: agents, syslog, API integrations, cloud connectors
- Normalization: map vendor-specific field names to a common schema
- Correlation: multi-source rules ("if A + B within 5 min, alert on C")
- Enrichment: add context (geo-IP, ASN, threat intel, asset data)
- Retention: 90 days hot; 1–7 years cold (compliance-dependent)
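The "if A + B within 5 min" correlation shape from the list above can be sketched directly; the rule below (a hypothetical brute-force-then-success detection, with illustrative event fields) fires when a success follows several recent failures from the same source:

```python
# Illustrative correlation rule: N auth failures followed by a success
# from the same source within a time window.
from datetime import datetime, timedelta

def correlate(events, window=timedelta(minutes=5), threshold=3):
    alerts = []
    failures = {}  # src -> list of failure timestamps
    for ev in sorted(events, key=lambda e: e["ts"]):
        src = ev["src"]
        if ev["type"] == "auth_failure":
            failures.setdefault(src, []).append(ev["ts"])
        elif ev["type"] == "auth_success":
            recent = [t for t in failures.get(src, []) if ev["ts"] - t <= window]
            if len(recent) >= threshold:
                alerts.append({"src": src, "rule": "brute-force-then-success",
                               "ts": ev["ts"]})
    return alerts

t0 = datetime(2024, 5, 1, 9, 0)
evs = [{"src": "10.0.2.8", "type": "auth_failure", "ts": t0 + timedelta(seconds=s)}
       for s in (0, 20, 40)]
evs.append({"src": "10.0.2.8", "type": "auth_success", "ts": t0 + timedelta(minutes=2)})
print(correlate(evs))  # one brute-force-then-success alert for 10.0.2.8
```

Production correlation engines do the same thing with streaming state and expiry, but the logic is the same: keep a window of precursor events per entity and alert when the trigger event arrives.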
Sigma Rules
Sigma is a generic, vendor-neutral rule format for SIEM detection logic. A Sigma rule describes a detection pattern in YAML; the Sigma converter tool translates it into query syntax for any supported SIEM platform (Splunk, Elastic, Sentinel, QRadar, etc.).
- Large community repository: github.com/SigmaHQ/sigma (thousands of rules)
- MITRE ATT&CK tagged: rules map to specific techniques and tactics
- Write once, deploy everywhere — avoid SIEM vendor lock-in for detection logic
- pySigma: Python library for rule parsing, conversion, and backend development
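A Sigma rule is declarative YAML: named selections of field/value patterns plus a condition expression combining them. The toy matcher below shows the shape of that logic for one hypothetical rule expressed as a Python dict (real deployments convert rules with pySigma backends rather than hand-rolling evaluation):

```python
# Toy evaluation of a Sigma-style rule against event dicts. The rule and
# event fields are illustrative; the condition handling is hard-coded to
# the single "selection and not filter" form used here.
rule = {
    "title": "DNS to non-authorized resolver",
    "logsource": {"category": "dns"},
    "detection": {
        "selection": {"dst_port": 53},
        "filter": {"dst_ip": ["10.0.0.53", "8.8.8.8"]},
        "condition": "selection and not filter",
    },
}

def matches(clause, event):
    """True if every field in the clause matches the event (lists = OR)."""
    for field, expected in clause.items():
        allowed = expected if isinstance(expected, list) else [expected]
        if event.get(field) not in allowed:
            return False
    return True

def evaluate(rule, event):
    det = rule["detection"]
    return matches(det["selection"], event) and not matches(det["filter"], event)

print(evaluate(rule, {"dst_port": 53, "dst_ip": "203.0.113.9"}))  # True
print(evaluate(rule, {"dst_port": 53, "dst_ip": "10.0.0.53"}))    # False
```

The value of the format is exactly that this evaluation logic lives in the converter, not in the rule: the same YAML becomes an SPL search, a KQL query, or an Elastic DSL filter.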
Alert Fatigue Management
Alert fatigue — analysts ignoring alerts because of too many false positives — is the #1 operational failure mode for SIEM deployments. Effective SIEMs prioritize quality over quantity: fewer, higher-confidence alerts that analysts trust and act on.
- Start with 20–30 high-confidence rules; expand methodically
- Score and prioritize alerts; work high-severity first
- Review and suppress FP-generating rules within 72 hours
- Monthly review: disable any rule with >80% FP rate
- Deduplicate: don't alert on the same event multiple times
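Deduplication is usually implemented as suppression keyed on (rule, entity) within a window. A minimal sketch, with illustrative alert fields and timestamps in seconds:

```python
# Sketch of alert deduplication: suppress repeats of the same (rule, source)
# key within a fixed suppression window after the last kept alert.
def deduplicate(alerts, window=300):
    last_kept = {}
    kept = []
    for a in sorted(alerts, key=lambda x: x["ts"]):
        key = (a["rule"], a["src"])
        if key not in last_kept or a["ts"] - last_kept[key] > window:
            kept.append(a)
            last_kept[key] = a["ts"]
    return kept

raw = [{"rule": "port-scan", "src": "10.0.3.4", "ts": t} for t in (0, 60, 120, 400)]
print(len(deduplicate(raw)))  # 2: the alert at t=0, then t=400 after the window
```

Whether the window resets on every occurrence or only on kept alerts is a design choice; resetting on every occurrence can suppress a continuously noisy source forever, which may or may not be what the SOC wants.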
| SIEM Platform | Deployment | Cost Model | Strengths | Scale |
|---|---|---|---|---|
| Splunk Enterprise Security | On-prem or cloud (Splunk Cloud) | GB/day ingest or infrastructure | Powerful SPL query language; large app ecosystem; proven at scale | Enterprise (100GB+/day) |
| Microsoft Sentinel | Cloud (Azure) | Per GB ingest; some free connectors | Deep Microsoft/Azure integration; built-in UEBA; Copilot AI | Mid to large enterprise |
| Elastic Security (formerly Elastic SIEM) | Self-hosted or Elastic Cloud | Free basic tier; subscription for advanced features | Flexible; fast search; Kibana visualization; ECS schema | Any (scales with hardware) |
| Wazuh | Self-hosted (Docker/VM) | Free and open source | HIDS + SIEM combined; FIM; compliance modules (PCI, HIPAA) | SMB to mid-market |
| IBM QRadar | On-prem or QRadar SIEM SaaS | EPS (events per second) + FPS (flows) | Strong correlation engine; long enterprise track record | Large enterprise |
NDR vs. SIEM vs. EDR
These three categories form the detection triad. Each has a different vantage point — SIEM sees logs and events; EDR sees endpoint process activity; NDR sees actual network traffic. Gaps in one are typically covered by another, making all three complementary rather than redundant.
- SIEM: log-centric; depends on log sources being available and complete
- EDR: endpoint-centric; doesn't see network devices, OT, or IoT
- NDR: network-centric; doesn't require agents; sees all traffic
- NDR excels at: lateral movement, C2 beaconing, encrypted malware traffic
Behavioral Baselining in NDR
NDR platforms build behavioral models for every device and user on the network, learning what normal communication looks like. Deviations from the established baseline — a database server making outbound web requests, a workstation scanning internal subnets — generate alerts regardless of whether a signature matches.
- Device peer groups: compare similar devices to each other
- Time-of-day models: flag access patterns that deviate from normal hours
- Volumetric baselining: detect data movement exceeding normal patterns
- Protocol models: detect protocol misuse (plaintext HTTP on port 443, DNS tunneling)
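Volumetric baselining reduces to a simple statistical test per device. A z-score sketch against a device's own history (the threshold and sample data are illustrative; real NDR models are per-peer-group and time-of-day aware):

```python
# Volumetric baselining sketch: flag a device whose daily outbound byte
# count deviates from its own history by more than 3 standard deviations.
from statistics import mean, stdev

def is_volumetric_anomaly(history, today, z_threshold=3.0):
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

baseline = [1.1e9, 0.9e9, 1.0e9, 1.2e9, 0.8e9]  # daily outbound bytes
print(is_volumetric_anomaly(baseline, 9.5e9))   # True: ~9.5 GB out is anomalous
print(is_volumetric_anomaly(baseline, 1.05e9))  # False: within normal variation
```

The same test applied across a peer group (all domain controllers, all finance workstations) catches devices that are normal for the network but abnormal for their role.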
Encrypted Traffic Analysis (ETA)
NDR platforms analyze TLS traffic without decryption by examining observable metadata: certificate chains, cipher suites, handshake timing, flow sizes, and packet inter-arrival times. JA3/JA4 hashing identifies client software from TLS Client Hello parameters alone.
- JA4 (2023): more stable than JA3; resistant to randomization evasion
- Certificate analysis: self-signed, short validity, unusual SAN fields
- Flow statistics: C2 beaconing has regular timing and consistent byte counts
- ALPN field: identifies application protocol negotiated within TLS
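JA3 itself is simple: concatenate five ClientHello parameter lists in a fixed order (decimal values joined by `-`, fields joined by `,`) and MD5 the result. A sketch with illustrative parameter values, not a real capture:

```python
# JA3 fingerprint sketch: join TLS ClientHello parameters in the JA3 field
# order and MD5 the resulting string. Values below are illustrative.
from hashlib import md5

def ja3(version, ciphers, extensions, curves, point_formats):
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)          # e.g. "771,4865-4866-...,0-11-10,29-23,0"
    return md5(ja3_string.encode()).hexdigest()

fp = ja3(771, [4865, 4866, 4867], [0, 11, 10], [29, 23], [0])
print(fp)  # 32-character hex digest identifying the client software
```

Because the hash depends only on how the client builds its ClientHello, the same malware family produces the same fingerprint across destinations — and this is also why extension-order randomization (as in modern Chrome) breaks JA3 and motivated JA4's sorted, more evasion-resistant design.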
Lateral Movement Detection
Lateral movement — attackers pivoting from an initial foothold to additional systems — is the phase where NDR provides the most unique value. Network-level activity like port scanning, SMB enumeration, and pass-the-hash are visible in traffic even when endpoint agents are absent.
- Unusual SMB/RPC connections between workstations
- Internal port scanning patterns (rapid sequential connection attempts)
- Kerberoasting: anomalous Kerberos TGS requests for service accounts
- DCSync: unusual replication requests to domain controllers
- Credential-theft follow-on: pass-the-hash and other anomalous NTLM authentication patterns that typically appear in traffic after Mimikatz use on an endpoint
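The port-scanning indicator from the list above is a fan-out count: how many distinct destination/port pairs one source touches in a window. A heuristic sketch with an illustrative threshold:

```python
# Port-scan heuristic sketch: count distinct (destination, port) targets
# contacted per source; high fan-out in a short window suggests scanning.
from collections import defaultdict

def scan_suspects(flows, fanout_threshold=100):
    targets = defaultdict(set)
    for f in flows:
        targets[f["src"]].add((f["dst"], f["dport"]))
    return sorted(src for src, t in targets.items() if len(t) >= fanout_threshold)

flows = [{"src": "10.0.1.7", "dst": "10.0.1.20", "dport": p} for p in range(1, 201)]
flows += [{"src": "10.0.1.8", "dst": "10.0.1.20", "dport": 445}]
print(scan_suspects(flows))  # ['10.0.1.7']
```

Slow scans evade fixed windows by design, which is why NDR platforms also track fan-out over days and compare it against each device's peer group rather than a single static threshold.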
Visibility Without Analysis Is Just Noise
Organizations frequently invest heavily in data collection infrastructure — full packet capture, SIEM, flow collection — but underinvest in the analyst capability and detection logic needed to turn that data into actionable intelligence. A terabyte of unanalyzed PCAP provides no security value. Invest equally in collection and correlation.
Full Packet Capture Strategy
Full PCAP is invaluable for incident investigation but expensive to store at scale. A tiered strategy captures everything at chokepoints for a short retention window, with selective long-term storage for high-risk segments.
- Internet edge: 24–72 hour rolling PCAP (highest value, high volume)
- DMZ: 7–14 day rolling PCAP (critical for web server forensics)
- Data center east-west: selectively capture sensitive server segments
- Tools: Zeek + PCAP, Arkime (Moloch), Security Onion (integrated platform)
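Sizing a rolling capture window is straightforward arithmetic: sustained throughput times retention, plus indexing overhead. A back-of-envelope calculator (the 10% overhead factor is an illustrative assumption):

```python
# Back-of-envelope PCAP storage sizing: storage for a rolling capture
# window = sustained throughput * retention, plus indexing overhead.
def pcap_storage_tb(link_gbps, utilization, retention_hours, overhead=1.1):
    bytes_per_sec = link_gbps * 1e9 / 8 * utilization
    return bytes_per_sec * retention_hours * 3600 * overhead / 1e12

# A 10 Gbps internet edge at 40% average utilization, 72-hour rolling window
print(f"{pcap_storage_tb(10, 0.4, 72):.1f} TB")  # 142.6 TB
```

Numbers like this explain the tiered strategy: even a modest edge link needs triple-digit terabytes for a 72-hour window, so full capture everywhere is rarely economical.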
TAP vs. SPAN for Capture
Network TAPs (Test Access Points) provide a passive hardware copy of traffic — they do not affect the network path and cannot be disabled by software. SPAN (Switched Port Analyzer) ports are configured in switch software — simpler to deploy but can drop packets under load and can be inadvertently modified.
- TAP: passive, undetectable, no packet loss, preferred for critical segments
- SPAN: free (switch feature), flexible, subject to port oversubscription drops
- Aggregation TAP: combine multiple links onto one monitoring interface
- Virtual TAP: for VM/container environments (vSphere dvSwitch mirror port)
Retention Policies for Compliance
Log retention requirements vary by compliance framework. Retention policies must balance regulatory requirements against storage cost and query performance. Tiered storage (hot/warm/cold) manages cost while meeting retention mandates.
- PCI DSS: 12 months log retention; 3 months immediately available
- HIPAA: 6 years audit logs; 3 months readily accessible
- SOC 2: varies by auditor; typically 12 months minimum
- NIS2 (EU): 3 months for security events; 12 months for significant incidents
- Immutable storage: WORM or object storage with object lock for tamper prevention
Threat Hunting with Network Data
Proactive threat hunting uses network data to search for attacker activity that has evaded automated detection. Hunters form hypotheses based on threat intelligence and use flow data, DNS logs, and PCAP to test them — finding evidence of compromise before an alert fires.
- Hunt hypothesis: "APT group X uses HTTPS C2 with 5-minute beaconing intervals"
- Search: flow data for regular-interval connections to external IPs
- Pivot: examine TLS certificates, domain registration, ASN history
- Hunt infrastructure: Splunk, Elastic, or specialized tools (Gravwell)
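The beaconing hypothesis above translates directly into a flow-data query: C2 beacons produce near-constant inter-connection intervals, so the coefficient of variation (stdev/mean) of the gaps between a host's flows to a given destination is a useful regularity score. A sketch with illustrative timestamps in seconds:

```python
# Beacon-hunting sketch: score connection-timing regularity per
# (source, destination) pair. A low coefficient of variation over the
# inter-connection gaps indicates machine-like, beacon-style timing.
from statistics import mean, stdev

def beacon_score(timestamps):
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough connections to score
    return stdev(gaps) / mean(gaps)  # low value => regular beaconing

regular = [0, 300, 601, 899, 1200, 1501]   # ~5-minute beacon with jitter
human = [0, 12, 340, 355, 910, 2400]       # bursty interactive browsing
print(beacon_score(regular) < 0.05 < beacon_score(human))  # True
```

Real hunts run this over days of flow data per (source, destination) pair and then pivot on the low-scoring pairs: TLS certificate details, domain age, and ASN history of the destination decide whether the regularity is a software updater or a C2 channel.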