1. The Secrets Sprawl Problem
What Are Secrets?
Secrets are any credential or key that grants privileged access to systems, data, or services. Unlike user credentials, secrets are often shared by systems and processes, and their compromise is frequently silent — no failed login, no alert, just quiet unauthorized access.
- API Keys: Stripe payment keys, Twilio SMS credentials, SendGrid email tokens, third-party service access tokens
- Database Credentials: MySQL/PostgreSQL passwords, MongoDB connection strings, Redis AUTH tokens — direct access to all application data
- TLS/SSL Private Keys: Compromise enables man-in-the-middle attacks against encrypted traffic to your systems
- OAuth Tokens & Client Secrets: Application-to-application authentication credentials for OAuth 2.0 flows
- SSH Private Keys: Long-lived server access credentials — a leaked private key with broad server access is equivalent to a skeleton key
- JWT Signing Keys: If leaked, enables forging of any JWT token your application will accept — full authentication bypass
- Cloud Provider Credentials: AWS access key/secret, GCP service account JSON, Azure client secret. They grant access to every cloud resource the attached permissions allow, which is often far more than intended.
- Encryption Keys (KEKs/DEKs): Keys used to encrypt other keys or sensitive data — the master secret that unlocks everything else
Where Secrets End Up
Secrets spread throughout an organization's infrastructure in predictable ways, each representing a distinct attack surface and detection/remediation challenge.
- Source Code Repositories: GitGuardian detected 12.8 million secrets in public GitHub repositories in 2023, up 28% year over year. Private repos are not immune; departing employees keep their local clones, full history included.
- CI/CD Environment Variables: Pipeline secrets in Jenkins, GitHub Actions, GitLab CI — often broadly readable by all repository contributors, logged in build output
- Container Image Layers: Secrets baked into Docker layers during build persist even if removed in a later RUN command; `docker history` reveals them. A container registry can be queried by anyone with pull access.
- Configuration Files: .env files, application.properties, appsettings.json committed to repositories. Developers forget which files are gitignored.
- Chat & Ticketing Systems: Slack messages, Jira tickets, Confluence pages — secrets shared for debugging and never removed. Discovery requires searching collaboration platforms.
- Log Files: Applications logging request parameters or error contexts that include credentials. Log aggregation systems then retain and expose them.
| Where Secrets End Up | Risk Level | Detection Method | Remediation |
|---|---|---|---|
| Public Git repository | Critical | GitHub Secret Scanning, GitGuardian, truffleHog | Revoke immediately, rotate, git history scrub |
| Private Git repository | High | Pre-commit hooks, CI scanning, gitleaks | Revoke, rotate, git filter-repo to remove history |
| Container image layers | High | Trivy, Snyk Container, docker history | Rebuild image, rotate secret, push new image |
| CI/CD env variables | High | Audit pipeline variable permissions, log scanning | Migrate to vault/OIDC, rotate existing values |
| Config files on servers | Medium | File system scanning, configuration audit | Migrate to env vars or vault, secure file permissions |
| Slack / Confluence | Medium | Nightfall AI, search for key patterns | Delete messages/pages, revoke and rotate secret |
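For the log-file exposure described above, one mitigation is to scrub credential-looking values before a record is ever written. The following is a minimal sketch using only the standard library; the two patterns, the placeholder strings, and the names `redact` and `RedactingFilter` are illustrative assumptions, not a complete rule set.

```python
import logging
import re

# Illustrative patterns only; a real deployment needs a broader, tested rule set.
KEY_VALUE_PARAM = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)=([^\s&]+)"
)
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact(text: str) -> str:
    """Replace credential-looking substrings with fixed placeholders."""
    text = KEY_VALUE_PARAM.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return AWS_ACCESS_KEY_ID.sub("[REDACTED_AWS_KEY]", text)


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before any handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True       # never drop the record, only rewrite it
```

Attaching the filter to a handler (`handler.addFilter(RedactingFilter())`) applies it to everything that handler writes. Redaction is a safety net; the more reliable fix is not to log request parameters or credentials in the first place.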
2. Secrets Vaults
HashiCorp Vault
HashiCorp Vault is the most feature-rich secrets management platform, popular in cloud-native and DevOps environments. Its dynamic secrets capability fundamentally changes how applications access credentials.
- KV Secrets Engine: Key-value store for static secrets. Version 2 provides secret versioning and point-in-time recovery.
- Dynamic Secrets: Vault generates credentials on demand and revokes them automatically when the lease expires. Database engine, AWS engine, PKI engine — never store long-lived credentials.
- Database Engine: Vault creates temporary DB users with a configurable TTL (e.g., 1 hour). Leaked credential is useless after TTL. Dramatically reduces blast radius of credential theft.
- PKI Engine: Vault acts as a private CA. Issues short-lived TLS certificates (e.g., 24-hour) that auto-rotate. Eliminates certificate management overhead and long-lived cert risk.
- Vault Agent: Sidecar process that authenticates to Vault, fetches secrets, and writes them to a local tmpfs or environment — application doesn't need Vault SDK integration.
- Audit Log: Every Vault operation — read, write, authentication — is logged. Enables detection of unauthorized secret access.
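As a small illustration of the KV version 2 engine's versioning, the sketch below reads either the latest or a specific version of a static secret. It assumes `client` is an already authenticated `hvac.Client` and that the engine is mounted at `secret`; the helper name `read_kv_secret` and the example path are hypothetical.

```python
def read_kv_secret(client, path, version=None, mount_point="secret"):
    """Return (payload, version) for a KV v2 secret.

    `client` is expected to be an authenticated hvac.Client.
    version=None reads the latest version; an integer reads that version.
    """
    response = client.secrets.kv.v2.read_secret_version(
        path=path, version=version, mount_point=mount_point
    )
    # KV v2 nests the payload under data.data; data.metadata holds version info.
    data = response["data"]
    return data["data"], data["metadata"]["version"]


# Usage (requires a reachable Vault; not executed here):
#   payload, version = read_kv_secret(client, "myapp/config")
#   previous, _ = read_kv_secret(client, "myapp/config", version=version - 1)
```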
AWS Secrets Manager & SSM
AWS-native secrets management for organizations running primarily on AWS. Deep integration with RDS, Lambda, ECS, and IAM makes it the natural choice for AWS workloads.
- AWS Secrets Manager: Stores and rotates secrets. Built-in rotation for RDS/Aurora/Redshift via Lambda — no manual password rotation needed. Cross-account access via resource policies.
- Automatic Rotation: Lambda function changes the database password and updates the secret on a schedule. Zero-downtime rotation for supported databases with connection pool awareness.
- Resource Policy: Secrets can be accessed cross-account without sharing credentials — ideal for multi-account architectures. Combine with IAM policies for defense-in-depth.
- AWS SSM Parameter Store: Cheaper and simpler than Secrets Manager. SecureString parameters encrypted with KMS. No automatic rotation. Better for configuration values and non-critical secrets. Free tier for standard parameters.
- Use Secrets Manager for anything requiring rotation; SSM Parameter Store for static configuration. Both can be referenced from ECS task definitions and read from Lambda functions or EC2 instances through their IAM roles.
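An application reading from Secrets Manager usually wants to avoid an API call on every request while still picking up rotated values. The sketch below caches a JSON secret for a short TTL; it assumes `client` is a `boto3.client("secretsmanager")` and that the secret is stored as a JSON string. The class name `CachedSecret` and the 300-second default are illustrative choices.

```python
import json
import time


class CachedSecret:
    """Fetch a JSON secret from AWS Secrets Manager and cache it briefly."""

    def __init__(self, client, secret_id, ttl_seconds=300, clock=time.monotonic):
        self._client = client        # e.g. boto3.client("secretsmanager")
        self._secret_id = secret_id
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        """Return the parsed secret, refetching once the TTL has elapsed."""
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            response = self._client.get_secret_value(SecretId=self._secret_id)
            self._value = json.loads(response["SecretString"])
            self._fetched_at = now
        return self._value
```

A short TTL bounds how long the application keeps using an old password after rotation; the right value depends on how the rotation function handles the overlap between old and new credentials.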
| Solution | Dynamic Secrets | Auto-Rotation | Cloud Native | Self-Hosted | Cost Tier |
|---|---|---|---|---|---|
| HashiCorp Vault OSS | Yes (DB, AWS, PKI, SSH) | Yes (lease-based) | Partial (HCP Vault available) | Yes | Free + infra cost |
| HashiCorp Vault Enterprise | Yes (full) | Yes | Yes (HCP) | Yes | Enterprise ($$$) |
| AWS Secrets Manager | No | Yes (RDS, custom Lambda) | Yes (AWS) | No | $0.40/secret/month |
| AWS SSM Parameter Store | No | No | Yes (AWS) | No | Free (standard) / low cost |
| Azure Key Vault | No | Yes (certificates, some secrets) | Yes (Azure) | No | Pay-per-operation |
| Infisical (open source) | No | Partial | Yes (Infisical Cloud) | Yes | Free OSS / SaaS tiers |
```python
# HashiCorp Vault: dynamic database credentials workflow (Python)
import boto3
import hvac
import psycopg2

# Authenticate to Vault using AWS IAM auth (no static Vault token needed).
# hvac signs the login request with the workload's own AWS credentials,
# resolved here from the instance or task role via boto3.
aws_creds = boto3.Session().get_credentials().get_frozen_credentials()
client = hvac.Client(url='https://vault.internal:8200')
client.auth.aws.iam_login(
    access_key=aws_creds.access_key,
    secret_key=aws_creds.secret_key,
    session_token=aws_creds.token,
    role='my-app-role',
)

# Request dynamic PostgreSQL credentials.
# Vault generates a new DB user whose lifetime is the role's TTL (e.g., 1 hour).
db_creds = client.secrets.database.generate_credentials(name='my-postgres-role')
username = db_creds['data']['username']  # e.g., v-app-xyz123
password = db_creds['data']['password']
lease_id = db_creds['lease_id']          # used to renew or revoke

# Connect using the dynamic credentials
conn = psycopg2.connect(
    host='postgres.internal',
    dbname='appdb',
    user=username,
    password=password,
)
try:
    pass  # application queries go here
finally:
    conn.close()
    # When done: revoke the lease immediately (don't wait for TTL)
    client.sys.revoke_lease(lease_id=lease_id)

# The DB user no longer exists after revocation.
# An attacker who captures these credentials has at most the lease TTL
# (about 1 hour here) to use them.
```
3. CI/CD Secrets Management
OIDC Federation: No More Long-Lived Keys
OpenID Connect (OIDC) federation allows CI/CD systems to authenticate to cloud providers using short-lived identity tokens rather than long-lived API keys. This eliminates the most dangerous form of secrets sprawl.
- How it works: GitHub Actions (or GitLab CI, CircleCI) requests a signed OIDC JWT from the identity provider. The cloud provider (AWS, Azure, GCP) trusts this JWT and issues a temporary credential for the specific workflow run.
- GitHub Actions → AWS: Configure AWS IAM Identity Provider with GitHub as trusted OIDC provider. Create IAM role with conditions on repository and branch. Workflow uses `aws-actions/configure-aws-credentials@v4`; no long-lived AWS keys are stored anywhere.
- GitHub Actions → Azure: Federated Identity Credential on Azure AD App Registration. Workflow authenticates with `azure/login@v2` using client-id, tenant-id, subscription-id; no client secret needed.
- GitHub Actions → GCP: Workload Identity Federation. Workflow authenticates via `google-github-actions/auth@v2` with service account impersonation; no key files downloaded or stored.
- Credentials are scoped to the specific workflow execution: they expire after a short session lifetime, so they have little value if captured from logs afterwards.
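The "conditions on repository and branch" in the AWS case live in the IAM role's trust policy. The sketch below builds such a policy as a Python dictionary; the account ID, repository, and branch are placeholders, and the function name is hypothetical. It assumes the GitHub OIDC provider has already been registered in the account with the audience `sts.amazonaws.com`.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Build an IAM trust policy for one GitHub repository and branch.

    `repo` is "owner/name". Only workflow runs whose OIDC token carries a
    matching `sub` claim can assume the role.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Placeholder values for illustration only.
policy = github_oidc_trust_policy("123456789012", "my-org/my-repo", "main")
print(json.dumps(policy, indent=2))
```

Pinning the `sub` claim with `StringEquals` is the important design choice: a trust policy that checks only the audience would let any repository on GitHub assume the role.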
Platform-Specific Secrets
For secrets that cannot use OIDC federation (third-party APIs, internal services), CI/CD platforms provide encrypted secret stores with access controls.
- GitHub Actions: Repository secrets (all branches) and Environment secrets (branch-protected deployment environments). Secrets are masked in logs. Organization secrets can be shared across repos with allow-lists.
- GitLab CI: Masked and protected variables. External secrets integration with HashiCorp Vault via JWT auth — pipelines authenticate to Vault with their pipeline identity, never storing Vault tokens.
- Jenkins: Credentials plugin stores secrets in encrypted form. Vault plugin enables Jenkins agents to authenticate to Vault using AppRole and retrieve secrets at job runtime.
- Kubernetes — External Secrets Operator: Kubernetes operator that syncs secrets from Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager into native Kubernetes Secrets. Applications use native K8s secrets; the source of truth is the external vault.
- EKS IRSA: IAM Roles for Service Accounts — pods authenticate to AWS services using a projected service account token, no static AWS credentials in pods
Use OIDC Federation for All Major Cloud Providers
OIDC federated identity tokens remove the need for long-lived API keys in CI/CD pipelines for AWS, Azure, and GCP workflows. A long-lived AWS access key stored as a GitHub repository secret can be used, and exfiltrated, by any workflow that someone with write access to the repository can run, and it stays valid until someone rotates it. An OIDC-configured IAM role grants credentials that expire quickly (AWS STS sessions can be as short as 15 minutes) and only to workflows that match the role's repository and branch conditions. If you have any CI/CD pipelines using static cloud provider credentials, migrating to OIDC federation is usually a small project that removes some of your highest-risk secrets.
4. Certificate & PKI Management
Certificate Lifecycle Management
X.509 certificates have a complete lifecycle that, if unmanaged, results in either security incidents (compromised certificates not revoked) or outages (expired certificates).
- Generation: Private key and CSR (Certificate Signing Request) generated. Private key must be protected — HSM for high-value certificates, at minimum encrypted storage.
- Signing: CSR submitted to CA (Certificate Authority) — public (Let's Encrypt, DigiCert) for public-facing services, internal CA for internal systems.
- Distribution: Signed certificate deployed to servers, load balancers, and applications. Must be tracked: what certificate is deployed where.
- Renewal: Certificate must be renewed before expiration. Manual renewal is the primary cause of certificate-related outages. Automate with ACME protocol or cert-manager.
- Revocation: If private key is compromised, certificate must be revoked via CRL (Certificate Revocation List) or OCSP (Online Certificate Status Protocol). Revocation is often neglected.
- Certificate inventory: you cannot renew or monitor what you don't know exists. Tools like Venafi, Keyfactor, or even a simple spreadsheet maintained rigorously reduce outage risk.
Automated Certificate Issuance
Manual certificate management doesn't scale and is error-prone. Automation via ACME protocol and platform tools eliminates the human bottleneck.
- ACME Protocol: Automated Certificate Management Environment. Let's Encrypt and ZeroSSL use ACME to issue DV (domain-validated) certificates automatically, with renewal handled by the client.
- cert-manager (Kubernetes): Kubernetes operator that automatically issues and renews TLS certificates from Let's Encrypt, Vault PKI, or other ACME-compatible CAs. Certificates are rotated before expiry automatically.
- HashiCorp Vault PKI Engine: Internal CA that issues certificates on demand with configurable TTLs. Short-lived certificates (24-hour) for internal mTLS — rotation is automatic and continuous.
- Caddy Web Server: Obtains and renews ACME certificates automatically with zero configuration. Caddy was the first web server to make automatic HTTPS the default.
- Short-Lived Certificates: 24-hour internal certs (vs 1-year public certs) dramatically reduce the window of exposure from key compromise. Forced rotation also ensures automation is always working.
Certificate Expiry Monitoring
Certificate expiry is one of the most preventable causes of production outages. An expired certificate causes immediate service disruption and is entirely avoidable with monitoring.
- Prometheus + Blackbox Exporter: Probe endpoints and expose certificate expiry as a metric. Alert at 30 days, 14 days, and 7 days before expiry.
- Datadog / New Relic: Built-in TLS certificate monitoring with alerting on expiry thresholds
- SSL Labs API / testssl.sh: Scan all external endpoints for certificate validity, chain issues, and weak cipher suites on a schedule
- Cert Inventory Audits: Quarterly review of all certificates in the environment — public-facing, internal, client certificates, code signing. Flag anything expiring within 90 days for immediate renewal.
- The most common root cause of certificate outages: the certificate was on a server no one actively managed, or the renewal reminder went to a former employee's email
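For endpoints not yet covered by a monitoring platform, the 30/14/7-day thresholds above can be checked with the standard library alone. This is a minimal sketch: the function names and the severity labels are illustrative, and `fetch_not_after` needs network access to the endpoint being checked.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the notAfter string of the certificate served at host:port.

    The default context verifies the chain, so an already expired
    certificate raises ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to days left."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires_at - now) / 86400


def alert_level(days_remaining):
    """Map remaining days onto the 30/14/7-day alert thresholds."""
    if days_remaining <= 0:
        return "expired"
    if days_remaining <= 7:
        return "critical"
    if days_remaining <= 14:
        return "warning"
    if days_remaining <= 30:
        return "notice"
    return "ok"


# Usage (requires network access; not executed here):
#   level = alert_level(days_until_expiry(fetch_not_after("example.com")))
```

A script like this only covers endpoints someone remembered to list, which is why it complements a certificate inventory and does not replace one.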
5. Secrets Detection & Remediation
Pre-commit Detection Tools
The best time to catch a leaked secret is before it ever reaches a repository. Pre-commit hooks run locally during git commit and reject the commit if secrets are detected.
- gitleaks: Fast, TOML-configurable secrets scanner. Detects 150+ secret types. Can run as pre-commit hook, in CI, and as full repo scanner. Regex-based with entropy analysis for generic keys.
- detect-secrets (Yelp): Generates a .secrets.baseline file of known false positives. Subsequent runs only alert on new secrets. Good for teams with many legitimate high-entropy strings.
- truffleHog v3: Verifies detected secrets by actually calling the relevant API to confirm if the key is still active. Reduces false positive noise significantly. Also scans full git history.
- Pre-commit framework: use the `pre-commit` Python tool to manage hooks across the team. Hooks are defined in .pre-commit-config.yaml and installed with `pre-commit install`.
- Pre-commit hooks can be bypassed with `git commit --no-verify`; complement them with CI-level scanning that cannot be bypassed.
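To make "regex-based with entropy analysis" concrete, the sketch below combines one provider-specific pattern with a Shannon entropy check on generic assignments. It is a simplified illustration of the general technique, not the rule set of gitleaks or any other tool; the patterns and the 3.5-bit threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Provider-specific rule: AWS access key IDs have a fixed, recognizable shape.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Generic rule: a quoted value of 16+ characters assigned to a suspicious name.
GENERIC_ASSIGNMENT = re.compile(
    r"(?i)\b(?:secret|token|password|api[_-]?key)\w*\s*[:=]\s*['\"]([^'\"]{16,})['\"]"
)


def shannon_entropy(value: str) -> float:
    """Bits of entropy per character; random keys score higher than words."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line: str, entropy_threshold: float = 3.5) -> list:
    """Return the names of the rules that a single line triggers."""
    findings = []
    if AWS_KEY_ID.search(line):
        findings.append("aws-access-key-id")
    match = GENERIC_ASSIGNMENT.search(line)
    if match and shannon_entropy(match.group(1)) >= entropy_threshold:
        findings.append("high-entropy-assignment")
    return findings
```

The entropy check is what keeps the generic rule usable: a placeholder such as a run of repeated characters matches the regex but scores near zero, so it is not reported. A pre-commit hook would run this over the staged diff and exit non-zero on any finding.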
Repository Scanning
Pre-commit hooks only catch new secrets going forward. Historical repository scanning finds secrets that were committed before detection was implemented.
- GitHub Secret Scanning: Native GitHub feature that scans repositories for known secret formats (200+ service providers). Push Protection blocks pushes containing recognized secrets before they reach the repository. Free for public repositories; private repositories require a GitHub Advanced Security license.
- GitGuardian: SaaS secrets detection for Git repositories, GitHub, GitLab, Bitbucket. Real-time scanning with developer notification and remediation guidance. Detects secrets in CI/CD configurations, not just source code.
- truffleHog full history scan: `trufflehog git https://github.com/org/repo --only-verified` scans the entire commit history and verifies each finding against the relevant service.
- For large repositories with long history: parallel scanning using truffleHog's `--since-commit` flag to progressively scan history sections.
- Scheduled quarterly full-history re-scans: new detection signatures catch secrets that previous scans missed.
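The reason history scanning matters is that a secret deleted in a later commit is still present in the earlier patch. The sketch below shows the idea on the text output of `git log -p`: it reports every commit that added an AWS access key ID, whether or not the key was removed afterwards. It is a toy illustration with a single pattern, not a substitute for the tools listed above; the function name is hypothetical.

```python
import re

AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
COMMIT_LINE = re.compile(r"^commit ([0-9a-f]{7,64})")


def scan_git_log(patch_text: str) -> list:
    """Return (commit, key) pairs for keys found on added lines of a patch log."""
    findings = []
    current_commit = None
    for line in patch_text.splitlines():
        header = COMMIT_LINE.match(line)
        if header:
            current_commit = header.group(1)
            continue
        # Added lines start with "+"; "+++" is the file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            for key in AWS_KEY_ID.findall(line):
                findings.append((current_commit, key))
    return findings


# Usage inside a repository (not executed here):
#   import subprocess
#   log = subprocess.run(["git", "log", "-p", "--all"],
#                        capture_output=True, text=True, check=True).stdout
#   for commit, key in scan_git_log(log):
#       print(commit, key)
```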
Incident Response for Leaked Secrets
When a secret is found in a repository — or suspected of exposure — the response must be immediate and systematic. Assume the secret is already in attacker hands.
- Step 1: Revoke immediately. Before analyzing impact, before notifying anyone, before removing it from git history, revoke the secret. Every second counts. If an attacker found it, they're using it now.
- Step 2: Audit usage logs. Check the service's access logs (AWS CloudTrail, GitHub audit log, Stripe dashboard) for any usage of the compromised credential. Determine if unauthorized access occurred.
- Step 3: Rotate everywhere. Identify all places the compromised secret was deployed (servers, CI/CD, config files, other repositories) and update all of them with the new credential.
- Step 4: Remove from git history. Use `git filter-repo` (preferred over BFG Repo Cleaner) to scrub the secret from all commits. Coordinate a force-push and notify all contributors to re-clone.
- Step 5: Post-mortem. Why was the secret in code? What process failed? Implement pre-commit hooks, vault migration, and policy changes to prevent recurrence.
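The ordering of the first two steps can be encoded directly in a runbook script. The sketch below handles the case of a leaked AWS access key: it deactivates the key first and only then queries CloudTrail for activity. It assumes `iam_client` and `cloudtrail_client` are `boto3` clients with sufficient permissions; the function name and return shape are illustrative.

```python
def contain_leaked_aws_key(iam_client, cloudtrail_client, user_name, access_key_id):
    """Deactivate a leaked AWS access key, then list events recorded for it.

    iam_client:        boto3.client("iam")
    cloudtrail_client: boto3.client("cloudtrail")
    """
    # Step 1: revoke first. Deactivation takes effect without deleting the
    # key, so the key ID stays available for the investigation.
    iam_client.update_access_key(
        UserName=user_name,
        AccessKeyId=access_key_id,
        Status="Inactive",
    )

    # Step 2: audit usage. lookup_events is paginated and covers recent
    # management events only; this sketch reads just the first page.
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "AccessKeyId", "AttributeValue": access_key_id}
        ]
    )
    return [
        (event["EventTime"], event["EventName"], event.get("EventSource"))
        for event in response["Events"]
    ]
```

Rotation, history scrubbing, and the post-mortem still follow; the point of the sketch is that nothing in the script runs before the key is disabled.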
Assume Every Leaked Secret is Already Compromised
Treat every leaked secret as fully compromised from the moment it was accessible. Public repository secrets are indexed by automated scanners within minutes of publication, so assume the attacker found it the moment it was committed. Private repository secrets are accessible to every contributor who has ever had access. The correct response is immediate revocation, not investigation first. After revoking, investigate. After investigating, rotate everywhere. After rotating, post-mortem to prevent the next occurrence. Removing the secret from git history and collecting audit evidence can start as soon as the secret is revoked, but neither is a substitute for revocation.