Writing up a production outage report today, I was making the case that workloads should fail their readiness probes when they can’t reach their dependencies — databases, caches, anything required to do useful work. The collaborating Claude session put it better than I had:
A probe that doesn’t fail when its workload can’t reach its database isn’t a probe — it’s wallpaper.
That’s the whole thing. A readiness probe answers one question: is this pod ready to serve traffic? If the answer depends on a database connection and you’re not checking for that, you’re not answering the question — you’re decorating the pod spec.
Why the shallow probe fails you during an outage
When a required dependency goes down, Kubernetes keeps routing traffic to your pods if their probes stay green. Every request hits a pod that can’t do the work and the user sees the error. You’re now relying on application-layer retries or circuit breakers to save you — neither of which you should need if the probe was doing its job.
A probe that reflects real dependency health lets Kubernetes do what it’s designed for: stop routing traffic to pods that aren’t ready. When the dependency recovers, the probe passes again, traffic resumes, no manual intervention.
What the difference looks like
Shallow probe — passes as long as the HTTP server is up:
readinessProbe:
httpGet:
path: /healthz
port: 8080Probe that actually answers the question — /ready checks the database connection:
readinessProbe:
httpGet:
path: /ready # verifies DB connection, returns non-2xx if unreachable
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3The specific timing values matter less than the endpoint — tune those to your startup time. The /ready handler should try to verify every dependency the pod needs to serve a request. Any of them unreachable, return a non-2xx. That’s it.
The wallpaper version looks like a probe, passes CI, and appears in every health check dashboard. It just doesn’t help when things go wrong — which is the only time probes matter.