The Paradox of the Idle Machine
Most engineers optimize for the servers doing obvious work: the database handling thousands of queries per second, the application servers churning through API requests, the GPU clusters grinding through model inference. These machines show up in cost reports. They show up in performance dashboards. They get attention.
The servers that quietly route traffic, check whether other machines are alive, and hold configuration state tend to sit near zero CPU utilization. They don’t appear in capacity planning discussions because they barely register on graphs. Engineers who inherit unfamiliar infrastructure sometimes wonder whether those machines can be retired.
They cannot. They are often the single point on which everything else depends.
What “Doing Nothing” Actually Means
A load balancer running at 2% CPU is not idle. It is making routing decisions on every inbound request, tracking the health of backend servers, terminating TLS connections, and maintaining session state. The reason it looks quiet is that it is very good at its job. HAProxy, for instance, is designed to handle hundreds of thousands of connections per second on modest hardware without breaking a sweat. The low utilization is proof of correct design, not evidence that the machine is unnecessary.
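To see what that quiet work looks like, here is a toy sketch in Go: a round-robin router over a health table. It is nothing like HAProxy’s internals, and the backend addresses are placeholders, but it makes the point that every request still passes through a routing decision and a health lookup even when the CPU graph never moves.

```go
// Toy round-robin router over a health table: a sketch of the per-request
// decision a "quiet" load balancer makes, not how HAProxy is implemented.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

type backend struct {
	proxy   *httputil.ReverseProxy
	healthy atomic.Bool // flipped by a health checker
}

var (
	backends []*backend
	next     uint64
)

// pick walks the pool once, round-robin, and returns the first healthy backend.
func pick() *backend {
	for range backends {
		b := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
		if b.healthy.Load() {
			return b
		}
	}
	return nil
}

func main() {
	// Placeholder backend addresses.
	for _, raw := range []string{"http://10.0.0.11:8080", "http://10.0.0.12:8080"} {
		u, _ := url.Parse(raw)
		b := &backend{proxy: httputil.NewSingleHostReverseProxy(u)}
		b.healthy.Store(true)
		backends = append(backends, b)
	}

	log.Fatal(http.ListenAndServe(":8000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b := pick()
		if b == nil {
			http.Error(w, "no healthy backends", http.StatusServiceUnavailable)
			return
		}
		// A routing decision happens on every request, even at 2% CPU.
		b.proxy.ServeHTTP(w, r)
	})))
}
```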
The same logic applies to configuration servers. Systems like etcd, which underpins Kubernetes, store small amounts of data and handle relatively few reads. But those reads are synchronization points for entire clusters. When etcd becomes unavailable, Kubernetes cannot schedule pods, reconcile desired state, or respond to failures. The cluster doesn’t degrade gracefully. It freezes.
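The freeze is easy to picture in code. The sketch below is a reconcile loop reading desired state through the go.etcd.io/etcd/client/v3 package; the key, endpoint, and intervals are made up for illustration, but the shape of the dependency is the point: no read from the store, no decisions made.

```go
// Sketch of a reconcile loop that depends on a config store. The key name,
// endpoint, and intervals are illustrative, not from any real deployment.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 2 * time.Second,
	})
	if err != nil {
		log.Fatalf("config store unreachable: %v", err)
	}
	defer cli.Close()

	for {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		resp, err := cli.Get(ctx, "/desired/replicas")
		cancel()
		if err != nil || len(resp.Kvs) == 0 {
			// No desired state means no decisions: the loop doesn't degrade
			// into partial behavior, it simply stops reconciling.
			log.Printf("skipping reconcile, desired state unavailable: %v", err)
			time.Sleep(5 * time.Second)
			continue
		}
		desired := string(resp.Kvs[0].Value)
		log.Printf("reconciling toward desired replicas = %s", desired)
		time.Sleep(5 * time.Second)
	}
}
```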
Health checkers are even more invisible until they fail. A monitoring agent pinging your services every 30 seconds burns almost no resources. But it is the mechanism that triggers failover, pages an engineer at 2 AM, and tells your load balancer to stop sending traffic to a backend that has gone down. Remove it, and your system loses its ability to detect its own failures. It will keep routing requests to dead servers until a human notices complaints.
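A bare-bones version of such a checker fits in a few dozen lines. The endpoints, interval, and failure threshold below are placeholders and the “page” is just a log line, but the shape is the point: an almost idle loop that is the only thing standing between a dead backend and your users noticing first.

```go
// Minimal health checker: probes each endpoint on an interval and "pages"
// after consecutive failures. Endpoints and thresholds are placeholders.
package main

import (
	"log"
	"net/http"
	"time"
)

const (
	interval  = 30 * time.Second
	threshold = 3 // consecutive failures before alerting
)

func main() {
	endpoints := []string{
		"http://10.0.0.11:8080/healthz",
		"http://10.0.0.12:8080/healthz",
	}
	failures := make(map[string]int)
	client := &http.Client{Timeout: 5 * time.Second}

	for {
		for _, ep := range endpoints {
			resp, err := client.Get(ep)
			if err == nil && resp.StatusCode == http.StatusOK {
				failures[ep] = 0
			} else {
				failures[ep]++
			}
			if resp != nil {
				resp.Body.Close()
			}
			if failures[ep] == threshold {
				// In a real system this would trigger failover and page on-call.
				log.Printf("ALERT: %s failed %d consecutive checks", ep, threshold)
			}
		}
		time.Sleep(interval)
	}
}
```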
The Problem with Low-Utilization Servers in Cost Reviews
Cloud cost optimization has become a serious discipline. Tools like AWS Cost Explorer, Datadog’s cloud cost management features, and third-party platforms like Infracost exist precisely to help companies find waste. The logic is sound: idle resources are money burned.
The risk is applying that logic without understanding what “idle” means in context. A server sitting at 3% average CPU could be waste. It could also be a bastion host that controls SSH access to your entire production environment, or a VPN endpoint that routes traffic for your engineering team. The CPU number tells you nothing about criticality.
This conflation has caused real outages. The pattern is familiar to operations teams: a cost-cutting initiative targets underutilized instances, someone decommissions a low-utilization box without fully tracing its dependencies, and a downstream system quietly breaks. The breakage often doesn’t surface immediately, which makes the root cause harder to find. By the time the incident postmortem happens, the connection between the deleted server and the failure requires careful archaeology to reconstruct.
The fundamental issue is that utilization gets treated as a proxy for value, when the real measure is blast radius. A server that processes very little but whose failure would cascade through ten other systems is not a low-value server. It’s a high-risk one.
Dependency Mapping Is the Missing Step
The right way to evaluate whether a server can be decommissioned is not to look at its CPU chart. It’s to ask what would break if it disappeared at 3 PM on a Tuesday.
This is harder than it sounds. Large organizations often have poor visibility into their own dependency graphs. Services have accumulated over years, documentation is out of date, and institutional knowledge about why something exists frequently leaves when the engineer who built it does. A server can be load-bearing in ways that aren’t obvious from its resource metrics or even its name.
A few practices help. Chaos engineering, as practiced at companies like Netflix with their Simian Army tools, builds dependency knowledge by intentionally introducing failures under controlled conditions. You learn what actually matters by breaking things before production breaks them for you. It is expensive to set up, but it surfaces dependencies that no amount of documentation can match.
Less dramatic is systematic dependency documentation at the infrastructure level. Tools like Backstage (originally built at Spotify and later open-sourced) are designed partly to solve this problem by giving engineering teams a service catalog that maps ownership and dependencies explicitly. The overhead of maintaining it is real, but it’s also the overhead of knowing what you actually have.
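Once dependencies are written down anywhere, even in a flat file, the blast-radius question becomes a graph traversal. The sketch below uses a small hand-written map with hypothetical service names (not Backstage’s data model or API) and walks the reverse edges to list everything that transitively depends on a decommissioning candidate.

```go
// Blast-radius sketch: given "service -> things it depends on", find every
// service that would be affected if one node disappeared. The graph here is
// hypothetical; in practice it would come from a service catalog export.
package main

import "fmt"

func blastRadius(deps map[string][]string, target string) []string {
	// Invert the edges: for each dependency, who depends on it?
	dependents := make(map[string][]string)
	for svc, ds := range deps {
		for _, d := range ds {
			dependents[d] = append(dependents[d], svc)
		}
	}
	// Breadth-first walk over the reverse edges, starting at the target.
	seen := map[string]bool{target: true}
	queue := []string{target}
	var affected []string
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		for _, up := range dependents[node] {
			if !seen[up] {
				seen[up] = true
				affected = append(affected, up)
				queue = append(queue, up)
			}
		}
	}
	return affected
}

func main() {
	deps := map[string][]string{
		"checkout-api": {"internal-dns"},
		"internal-dns": {},
		"billing-jobs": {"checkout-api"},
		"status-page":  {"checkout-api", "internal-dns"},
	}
	fmt.Println(blastRadius(deps, "internal-dns"))
	// A 1% CPU box can still take three services with it.
}
```

Run against a real catalog export, the same traversal turns “this box is at 1% CPU” into “these are the twelve services that go down with it.”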
Giving Silent Servers Their Due
The business argument for paying attention to low-utilization critical infrastructure is straightforward: downtime is expensive, and the machines most likely to cause cascading downtime are the ones nobody is watching.
High-utilization servers get watched. They appear in SLO dashboards, they show up when capacity planners run their models, and they get upgraded during infrastructure reviews. The border router sitting at 1% utilization, the internal DNS resolver, the certificate authority server that nobody has touched in two years: these are the machines that fail quietly and expensively.
Criticality and utilization are independent variables. Treating them as correlated is a category error that shows up in outage postmortems with uncomfortable frequency. The fix is not complicated. It requires taking the time to ask, for every low-utilization server you’re considering cutting: what is the blast radius if this disappears? The answer to that question, not the CPU chart, is what determines whether the machine is worth its cost.