The Fastest Databases Lose Data on Purpose

The Write That Never Happened

When Redis processes a write operation, the data lands in memory. That’s it. By default, Redis does not immediately confirm that your data has been saved to disk. If the server loses power in the next few seconds, that write is gone, as if it never happened. Redis will tell you the operation succeeded. The disk will tell a different story.

This is not a bug. It is the explicit, documented, deliberately chosen behavior of one of the most widely deployed data stores on the planet. And the engineers who chose it were right.

The story of fast databases is really a story about the hidden cost of certainty, and why the most performance-sensitive systems in the world have concluded that knowing something is permanently saved is a luxury they cannot always afford.

What Durability Actually Costs

Durability, in database terminology, is the D in ACID. It means that once a database confirms a write, that data survives crashes. The mechanism for achieving this is almost always the same: you write to a durable storage medium before you acknowledge success.

The problem is that disks, even fast NVMe SSDs, impose a fundamental physics tax. A single fsync call, which forces the operating system to flush write buffers to durable storage, can take anywhere from a fraction of a millisecond to several milliseconds depending on hardware. For a database processing thousands of transactions per second, that wait is catastrophic.

PostgreSQL and MySQL, both ACID-compliant by default, perform this flush on every committed transaction. This is why a carefully benchmarked PostgreSQL instance on commodity hardware might sustain several thousand transactions per second under write load, while Redis, operating purely in memory, can handle hundreds of thousands of operations per second on the same machine. The gap is not about code quality or algorithmic cleverness. It is almost entirely about the cost of durability.

The engineers who built these systems understood something that is easy to miss: durability is not a binary property. It exists on a spectrum, and the right point on that spectrum depends entirely on what you are storing and what losing it actually means.

Diagram illustrating how a write-ahead log separates the acknowledgment of a write from the slower process of updating data on disk — The write-ahead log's core insight: sequential disk writes are fast enough to acknowledge quickly, even when random page updates are not.

The Spectrum Nobody Talks About

Most discussions of database durability collapse into a false binary, durable or not durable, ACID or eventually consistent. The reality is more granular and more interesting.

Redis, for instance, offers three durability modes. In its default configuration, it persists data to disk periodically, meaning you might lose the last few seconds of writes on a crash. Configure it with AOF (Append Only File) logging set to everysec, and you lose at most one second of data. Set AOF to always, and every write is flushed to disk before acknowledgment, and your performance drops dramatically. The same software, three different positions on the tradeoff curve.

Kafka, the distributed messaging system used by many large-scale data pipelines, makes similar distinctions at the producer level. You can configure a producer to consider a write successful when the data is accepted by the leader broker alone, when a subset of replicas have acknowledged it, or when all replicas have confirmed it. Each setting trades durability for latency.

Cassandra, designed for massive write throughput, defaults to a consistency level that can serve reads and acknowledge writes even when some nodes are unavailable. The CAP theorem, formalized by Eric Brewer in 2000, puts a name to this constraint: a distributed system can guarantee consistency and availability simultaneously only when there are no network partitions, and network partitions always happen eventually. Cassandra, by design, chooses availability over strict consistency when forced to pick.

The insight that makes this useful is not which choice is correct. It is that different data has different stakes.

When Losing Data Is Acceptable

Consider what Redis is most commonly used for: session data, caches, rate limiting counters, leaderboards, real-time analytics. Think about what happens if you lose a few seconds of that data in a crash. A user gets logged out and has to sign in again. A cache misses and falls back to the database. A leaderboard is momentarily stale. A rate limiter briefly forgets how many requests someone made.

None of these outcomes are good. Some are mildly annoying. None are catastrophic. The cost of a crash is bounded and recoverable. Against that, you get response times measured in microseconds and throughput that no disk-bound database can approach.

Contrast that with a financial ledger. If a bank’s database loses a committed transaction because it crashed before flushing to disk, that is not a minor inconvenience. It is a regulatory violation, a potential financial loss, and a fundamental breach of the contract the bank has with its customers. No performance gain justifies that outcome. PostgreSQL’s strict durability guarantees exist precisely because some data cannot be lost at any price.

This is why the engineers who built these systems made their choices deliberately. Memcached, which predates Redis and inspired parts of its design, makes its position explicit in its name: it is a memory cache. It has never pretended to be a durable store. The right question is never “is this database durable?” but “is this database durable enough for this specific use case?”

The Write-Ahead Log and Why It Changed Everything

For systems that need durability but cannot afford the naïve approach of writing data to its final location before acknowledging a transaction, the write-ahead log (WAL) is the central invention.

The insight behind WAL is that sequential writes to disk are dramatically faster than random writes. Instead of updating data in place and then fsyncing the entire modified page, you write a compact record of the intended change to a log that is always appended to sequentially. You fsync the log, not the data file. Then you acknowledge the transaction. The actual data pages get updated lazily in the background.

PostgreSQL has used WAL since version 7.1, released in 2001. SQLite, the most widely deployed database engine in the world (it runs on every iPhone and Android device), added WAL mode in 2010. The design allows SQLite to support concurrent reads during writes, and it improves write throughput significantly over the default journal mode.

WAL is a compromise position. You still pay the fsync cost, but only for small, sequential log entries rather than large, random data pages. It is slower than pure memory operations. It is dramatically faster than naive disk writes. And it maintains full durability guarantees.

The distributed systems world has its own version of this thinking. DynamoDB, Amazon’s managed NoSQL database, replicates writes to multiple availability zones before acknowledging success. The fsync problem is partially sidestepped by treating replication as a form of durability: if three copies of your data exist on three independent machines, the probability that all three fail simultaneously is low enough to be practically irrelevant for most workloads.

The Hidden Assumption in “Eventual Consistency”

Eventual consistency, the model used by Cassandra, DynamoDB in some configurations, and many other distributed systems, is frequently mischaracterized as “sometimes wrong.” The more precise description is “temporarily inconsistent but convergent.”

If you write a value to an eventually consistent store and then immediately read it from a different node, you might get the old value. But if you wait long enough, and no further writes happen, all nodes will converge on the same value. The system is not broken. It is optimizing for availability and partition tolerance at the cost of immediate consistency.

For the right workloads, this is entirely reasonable. A social media post’s like count does not need to be exact at every moment. An e-commerce product’s view count does not need to be perfectly synchronized across every datacenter. The user experience degrades negligibly when these numbers are off by a few for a few hundred milliseconds.

Where eventual consistency becomes genuinely dangerous is when engineers apply it to data that actually requires strict consistency, inventory counts in a flash sale, financial balances, unique constraint enforcement, and then discover the failure mode under load. This is not a problem with the database model. It is a problem with the application of it. The race conditions that emerge from this kind of mismatch are among the hardest bugs to reproduce and diagnose in production.

What This Means

The fastest databases lose data deliberately because the engineers who built them correctly identified that not all data has the same value, and durability has a real, measurable cost that should only be paid when the stakes justify it.

The practical implications for anyone building systems:

Match durability to stakes. Session caches and ephemeral counters belong in Redis with relaxed persistence settings. Financial records and inventory state belong in a system with full ACID guarantees. The choice should be deliberate, not a default.

Understand your failure modes before you need them. “Eventual consistency” is not a feature to adopt because it sounds scalable. It is a specific guarantee with specific failure modes. Know what happens to your application when two nodes disagree, because they will.

The fsync is not your enemy. For data that genuinely needs durability, paying the cost is correct. The mistake is paying it for data that doesn’t need it, or skipping it for data that does.

Replication is not the same as durability. Three in-memory copies of data across three servers are faster to lose than one copy on disk. Replication addresses availability and read throughput. Durability addresses what survives a crash. They overlap, but they are not equivalent.

The deepest lesson here is one that applies well beyond databases. Every guarantee in a computing system has a cost. The engineers who built the systems we rely on made hard choices about which guarantees were worth paying for. Understanding those choices, rather than accepting defaults, is what separates systems that work from systems that work until they don’t.

The Fastest Databases Lose Data on Purpose

The Write That Never Happened

What Durability Actually Costs

The Spectrum Nobody Talks About

When Losing Data Is Acceptable

The Write-Ahead Log and Why It Changed Everything

The Hidden Assumption in “Eventual Consistency”

What This Means

You might also like

Deleting a Feature Is the Hardest Call in Engineering

Deleting a Database Column Is Riskier Than It Looks

Interpreted Languages Are Fast Enough. Compiled Ones Aren't Magic.

The Bugs That Never Get Filed Are the Worst Ones

Why Deleting a Database Column Is Surprisingly Risky

Your Program Spends Most of Its Time Waiting

Stay ahead of the curve.

The Write That Never Happened

What Durability Actually Costs

The Spectrum Nobody Talks About

When Losing Data Is Acceptable

The Write-Ahead Log and Why It Changed Everything

The Hidden Assumption in “Eventual Consistency”

What This Means

Don't miss the signal.

You might also like

Deleting a Feature Is the Hardest Call in Engineering

Deleting a Database Column Is Riskier Than It Looks

Interpreted Languages Are Fast Enough. Compiled Ones Aren't Magic.

The Bugs That Never Get Filed Are the Worst Ones

Why Deleting a Database Column Is Surprisingly Risky

Your Program Spends Most of Its Time Waiting

Stay ahead of the curve.