Performance optimization has a seductive logic: find the slow code, make it faster, ship the improvement. Profilers, benchmarks, caching layers, thread pools. The toolbox is deep and the craft is real. But the engineers who consistently build the fastest systems share a habit that precedes all of that. They ask, first, whether the work needs to happen at all.

This is not a philosophical point. It is a practical one with measurable consequences. Every cycle spent computing a result that was already computed, or that nobody needed, or that could have been avoided by a better design decision upstream, is a cycle that compounds across millions of requests. The following principles are ordered roughly by how early in the design process they should appear, which is roughly the inverse of when most teams actually consider them.

1. Caching Is Not the First Resort. It Is the Last.

Teams reach for caching early because it feels like a technical solution to a technical problem. Slow database query? Cache the result. Expensive API call? Cache the response. The fix is fast to implement and the benchmark numbers improve immediately.

The problem is that caching is a patch over a question you haven’t asked: why is this being computed repeatedly? Sometimes the answer is legitimate. Read-heavy workloads with stable data are textbook caching candidates. But often the answer reveals something more uncomfortable: the data model is wrong, the query is hitting the wrong table, or the same work is being done in three places because nobody mapped the call graph. Caching in those cases doesn’t solve the problem. It makes the problem invisible and harder to find later.

Cache invalidation is famously difficult for a reason. Every cache you add is a consistency contract you now have to honor across every future change. The fastest cache is the one you never needed.
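Before adding a cache, it can help to find out who is actually asking for the result. The sketch below is one way to do that in Python, not a prescribed method: a diagnostic decorator that counts calls per calling function. The names `count_callers`, `load_account`, and the three `render_*`/`check_*` functions are hypothetical stand-ins, and `inspect.stack()` is slow enough that this belongs in a debugging session rather than production.

```python
import functools
import inspect
from collections import Counter

# (function name, caller name) -> number of calls observed
call_sites = Counter()


def count_callers(func):
    """Diagnostic wrapper: record which function invoked `func` each time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stack()[0] is this wrapper; stack()[1] is whoever called it.
        caller = inspect.stack()[1].function
        call_sites[(func.__name__, caller)] += 1
        return func(*args, **kwargs)
    return wrapper


@count_callers
def load_account(user_id):
    # Hypothetical stand-in for an expensive database query.
    return {"id": user_id, "status": "active"}


def render_header(user_id):
    return load_account(user_id)["status"]


def render_sidebar(user_id):
    return load_account(user_id)["id"]


def check_permissions(user_id):
    return load_account(user_id)["status"] == "active"


def handle_request(user_id):
    # Three components, each fetching the same account independently.
    render_header(user_id)
    render_sidebar(user_id)
    check_permissions(user_id)
```

After one call to `handle_request`, `call_sites` holds three entries for `load_account`, one per caller. That is a map of the duplication, and it usually suggests a structural fix (fetch once, pass the result down) before it suggests a cache.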

2. Precomputation Only Wins If You Guess Right

The mirror image of caching is precomputation: calculating results before they are requested. Done well, this is genuinely powerful. Static site generators, search index builds, recommendation batch jobs. The work happens once; the serving is trivial.

Done carelessly, precomputation is computation theater. You pay the full cost upfront for results that may never be requested, may be stale by the time they are, or may cover the wrong distribution of user behavior entirely. Many teams have built elaborate offline pipelines generating personalized outputs for users who churned months ago.

The question to ask is not “can we precompute this” but “how confident are we about what will be requested, and how tolerant are we of staleness.” If the answer to either is “not very,” the precomputation is probably burning resources to avoid a problem that better on-demand design would handle more cleanly.
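The confidence question can be turned into rough arithmetic. The following is a back-of-envelope sketch under simplifying assumptions: it compares compute cost only, within one refresh window, and ignores the latency benefit that precomputation also buys. The `Workload` class, its field names, and the numbers in the usage note are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Costs for one refresh window, in milliseconds of compute."""
    n_items: int                   # results the batch job would generate
    precompute_ms_per_item: int    # batch cost per generated result
    expected_requests: int         # requests expected before results go stale
    on_demand_ms_per_request: int  # cost of computing one result at request time

    def precompute_cost_ms(self) -> int:
        # Paid in full whether or not anyone asks for the results.
        return self.n_items * self.precompute_ms_per_item

    def on_demand_cost_ms(self) -> int:
        # Paid only for requests that actually arrive.
        return self.expected_requests * self.on_demand_ms_per_request

    def break_even_requests(self) -> float:
        # Request volume at which the two strategies cost the same.
        return self.precompute_cost_ms() / self.on_demand_ms_per_request
```

With one million items at 20 ms each, the batch costs 20,000,000 ms per window. If only 100,000 requests are expected at 50 ms each, serving on demand costs 5,000,000 ms, and precomputation does not pay for itself until traffic reaches 400,000 requests per window. The less certain the request forecast, the less that break-even figure can be trusted.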

[Figure: iceberg diagram representing the hidden computational cost of unexamined code paths below the surface. Caption: Profilers show you the tip. The bulk of avoidable work sits below the waterline.]

3. The Most Expensive Request Is the Redundant One

In distributed systems, the same data gets fetched by the same service multiple times within the same request cycle more often than anyone wants to admit. Two downstream calls that both need the current user’s account status. Three components that each independently check a feature flag. The aggregate cost is real, but no single piece of code looks obviously wrong.

The fix is rarely a performance optimization in the traditional sense. It is a design fix: a request-scoped context object, a single resolved dependency passed down the call stack, a clearer ownership model for who fetches what. These changes do not require profiling to discover. They require reading the code carefully and asking whether the same question is being asked more than once.

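Redundancy of this kind rarely shows up as a hotspot, because each individual call looks cheap and reasonable. It becomes obvious once you log every outbound call within a single request, and it fades from view again as soon as that logging is removed. The redundancy hides in the silence between log lines.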
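A request-scoped context can be as small as a frozen dataclass built once at the request boundary. This is a minimal sketch of the idea, assuming the fetchers are injected as plain callables. `RequestContext`, `build_context`, the flag name `new_cart`, and the three consumer functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class RequestContext:
    """Values resolved once per request and passed down the call stack."""
    user_id: int
    account_status: str
    enabled_flags: FrozenSet[str]


def build_context(
    user_id: int,
    fetch_status: Callable[[int], str],
    fetch_flags: Callable[[int], Iterable[str]],
) -> RequestContext:
    # The only place in the request that talks to the downstream services.
    return RequestContext(
        user_id=user_id,
        account_status=fetch_status(user_id),
        enabled_flags=frozenset(fetch_flags(user_id)),
    )


# Consumers read from the context instead of fetching for themselves.
def can_checkout(ctx: RequestContext) -> bool:
    return ctx.account_status == "active"


def show_new_cart(ctx: RequestContext) -> bool:
    return "new_cart" in ctx.enabled_flags


def banner_text(ctx: RequestContext) -> str:
    if ctx.account_status == "active":
        return "Welcome back"
    return "Reactivate your account"


def handle_request(user_id, fetch_status, fetch_flags) -> dict:
    ctx = build_context(user_id, fetch_status, fetch_flags)
    return {
        "checkout": can_checkout(ctx),
        "new_cart": show_new_cart(ctx),
        "banner": banner_text(ctx),
    }
```

Two components need the account status here, yet `fetch_status` runs once per request. The ownership question is answered by construction: `build_context` fetches, everything else reads.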

4. Lazy Evaluation Is a Form of Respect

Eager evaluation feels thorough. Load all the data you might need, populate all the fields, resolve all the dependencies. When something is requested, it is ready. The cost is that most of what you prepared is never used.

Lazy evaluation is the discipline of not doing work until it is actually demanded. In practice this means database queries that fetch only the columns a view will render, object graphs that do not resolve relationships until they are accessed, and initialization paths that defer expensive setup until the feature is first invoked. The result is a system that pays only for what it delivers.

The objection is usually complexity: lazy systems can be harder to reason about, and deferred failures are harder to debug than eager ones. This is true and worth taking seriously. But the answer is better abstractions, not abandoning the principle. Languages and frameworks that make laziness ergonomic, like Haskell’s core evaluation model or Python generators, demonstrate that the tradeoff is manageable with the right tools.
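Python's standard library covers two common shapes of laziness: generators for streams, and `functools.cached_property` for expensive attributes. The sketch below shows both. The `Report` class, its `load_history` callable, and the helper functions are hypothetical names chosen for illustration.

```python
from functools import cached_property
from typing import Callable, Iterable, Iterator, List, Optional


def parse_lines(lines: Iterable[str]) -> Iterator[int]:
    """Generator: parses one line at a time, only when the consumer asks."""
    for line in lines:
        yield int(line)


def first_over(values: Iterable[int], threshold: int) -> Optional[int]:
    """Return the first value above `threshold`, leaving the rest unread."""
    for value in values:
        if value > threshold:
            return value
    return None


class Report:
    """Defers an expensive load until the attribute is first accessed."""

    def __init__(self, load_history: Callable[[], List[int]]):
        self._load_history = load_history

    @cached_property
    def history(self) -> List[int]:
        # Runs on first access only; the result is stored on the instance.
        return self._load_history()
```

Calling `first_over(parse_lines(source), 4)` stops reading `source` as soon as a match is found, and a `Report` whose `history` is never read never pays for the load. The deferred-failure concern still applies: if `load_history` raises, it raises at first access, not at construction.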

5. The Work You Decline to Do Is Permanently Free

Feature requests carry hidden computational costs that product teams rarely see. Every new filter option in a search UI is a potential query path to optimize. Every new notification type is a fan-out operation at scale. Every new dashboard widget is a database read on every page load. The performance cost is diffuse, spread across future engineering time and infrastructure spend, and almost never appears in the feature estimate.

The engineers who push back on low-value features are not being obstructionist. They are recognizing that declined work compounds in your favor the same way accepted work compounds against you. A feature that serves three percent of users and requires maintaining a complex aggregation pipeline forever is a bad deal that profiling will never surface, because profiling measures what runs, not what was never built.

Your to-do list and your codebase share this flaw: both are optimized for intake, not for the more valuable discipline of rejection.

6. Correctness Constraints Are Performance Constraints in Disguise

Some of the most significant performance improvements come not from running the same computation faster but from proving that a weaker computation is sufficient. You do not need a strongly consistent read if the data is append-only and readers can tolerate a slightly stale view. You do not need a distributed lock if operations are idempotent and safe to apply in any order. You do not need to look up a session in a central store on every request if the token is signed and short-lived; checking the signature and expiry locally is enough.

Each of these is a correctness analysis before it is a performance optimization. By examining what the code actually needs to guarantee, rather than what it defensively guarantees, you often discover that expensive machinery was protecting against failures that cannot occur in your specific context. The savings are not marginal. Eliminating a network round-trip entirely is an order-of-magnitude improvement over making that round-trip faster.
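The signed-token case is the easiest of the three to sketch. The code below is an illustration built on the standard library's `hmac` module, not a recommended token format; a real system would more likely use a vetted library and a standard such as JWT. The secret, the claim names `sub` and `exp`, and both function names are assumptions. The tradeoff it makes is explicit: a token cannot be revoked before it expires, which is why the lifetime has to be short.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical; load from config in practice


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(user_id, ttl_seconds, now=None, secret=SECRET) -> str:
    """Create a token of the form <payload>.<signature>."""
    now = time.time() if now is None else now
    claims = {"sub": user_id, "exp": int(now) + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(token, now=None, secret=SECRET):
    """Return the user id if the token is authentic and unexpired, else None.

    Pure local computation: no session store, no network round-trip.
    """
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong shape or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    if now >= claims["exp"]:
        return None
    return claims["sub"]
```

A request handler that calls `verify_token` does one HMAC computation where it would otherwise make a round-trip to the session store. The correctness argument carries the weight: the signature proves the server issued the claims, and the expiry bounds how long a stolen or stale token remains useful.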

The common thread across all six of these principles is that they require thinking before measuring. Profilers are essential tools, but they can only show you the cost of what runs. The code that never runs, because you asked hard questions before writing it, does not appear in any flame graph. Its contribution to performance is invisible and permanent.