SEV-1 · 9 min read

Postmortem: rotating a service-mesh CA without dropping a connection

This is a dummy post. Replace with a real write-up — the structure below is the template every SEV-1 (deep dive) entry follows.

Impact

None — and that was the whole point. This is a postmortem for an incident that didn't happen: rotating the certificate authority underneath a production service mesh carrying banking traffic, with zero dropped connections.

Background

Our Istio mesh anchors workload identity to AWS Private CA. Certificates are short-lived and rotate automatically — but the CA itself has a lifetime too, and "the CA expires" is not an alert you want to meet unprepared.
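
One way to avoid meeting that alert unprepared is to check the CA's remaining lifetime on a schedule. Below is a minimal sketch, assuming the CA certificate has been exported to a local PEM file. The function name, the path in the usage comment, and the 90-day threshold are illustrative assumptions, not part of the rotation described here.

```shell
#!/bin/sh
# ca_expires_within PEM_FILE DAYS
# Succeeds (exit 0) when the certificate in PEM_FILE expires within DAYS days.
# An unreadable or missing file also counts as "expiring", so the check fails loud.
ca_expires_within() {
  # openssl's -checkend exits 0 when the cert is still valid after N seconds,
  # so the result is negated to mean "expires within the window".
  ! openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null 2>&1
}

# Hypothetical usage: the path and the 90-day threshold are assumptions.
# if ca_expires_within /etc/mesh/ca-cert.pem 90; then
#   echo "CA expires within 90 days: schedule the rotation" >&2
# fi
```

Treating a missing file as "expiring" is a deliberate choice in this sketch: a check that cannot read the CA should page someone rather than pass silently.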

What we did

  1. Staged a new intermediate alongside the old one, so istiod could issue from either.
  2. Distributed the combined trust bundle first. Every workload must trust both roots before any workload presents a cert that chains to the new one. This ordering is the entire game.
  3. Flipped issuance to the new intermediate and watched cert age drain down as workloads renewed naturally.
  4. Removed the old root from the bundle only after the last old-issued cert expired — verified from SPIFFE identities in access logs, not from hope.
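
# The dashboard that mattered: how many live certs still chain to the old root.
# Per pod: print the issuer of the workload cert. The ROOTCA entry has no tlsCertificate, so select "default".
istioctl proxy-config secret "$POD" -o json \
  | jq -r '.dynamicActiveSecrets[] | select(.name == "default") | .secret.tlsCertificate.certificateChain.inlineBytes' \
  | base64 -d | openssl x509 -noout -issuer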
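
The per-pod command above prints one issuer line. To turn that into a number worth gating step 4 on, the lines from every pod can be tallied. This is a rough sketch, not the dashboard we ran. The helper names `count_issuers` and `safe_to_remove_old`, the `$NS` variable, and the `old-ca` match string are all hypothetical, and the loop in the comments assumes every pod in the namespace has a sidecar.

```shell
#!/bin/sh
# count_issuers: read one "issuer=..." line per workload on stdin and
# print "COUNT ISSUER" for each distinct issuer, highest count first.
count_issuers() {
  awk '{ n[$0]++ } END { for (k in n) printf "%d %s\n", n[k], k }' | sort -rn
}

# safe_to_remove_old OLD_ISSUER_SUBSTRING
# Succeeds only when no line on stdin mentions the old issuer.
safe_to_remove_old() {
  ! grep -q -F -- "$1"
}

# Hypothetical sweep ($NS and the "old-ca" string are assumptions):
# for pod in $(kubectl get pods -n "$NS" -o name); do
#   istioctl proxy-config secret "${pod#pod/}" -n "$NS" -o json \
#     | jq -r '.dynamicActiveSecrets[] | select(.name == "default")
#              | .secret.tlsCertificate.certificateChain.inlineBytes' \
#     | base64 -d | openssl x509 -noout -issuer
# done > issuers.txt
# count_issuers < issuers.txt
# safe_to_remove_old 'old-ca' < issuers.txt && echo "no old-issued certs seen"
```

A sweep like this only sees pods running at the moment it executes, which is why it complements the access-log check in step 4 and does not replace it.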

What we learned

Action items