A Protocol Born on Napkins

In 1989, two engineers named Kirk Lougheed and Yakov Rekhter sketched out a routing protocol during lunch at an IETF meeting. They literally used napkins. The result, the Border Gateway Protocol, became the mechanism by which every autonomous network on the internet tells every other autonomous network how to reach it. BGP carries the routing information for roughly 900,000 distinct network prefixes today. It runs on routers inside every major ISP and cloud provider, and in many large enterprise networks. And it was designed, explicitly, as a temporary fix.

Temporary fixes have a way of becoming permanent fixtures. BGP replaced a protocol called EGP that everyone agreed was inadequate. The replacement was meant to buy time while something better was designed. Something better was never designed. BGP is now so deeply embedded in internet infrastructure that the realistic cost of replacing it exceeds any conceivable benefit, which means we are all, perpetually, living inside that lunch break.

What BGP Actually Does

The internet is not one network. It is a collection of independently operated networks, called autonomous systems, each assigned a unique number by regional internet registries. Your ISP is an autonomous system. Google is several autonomous systems. AWS operates dozens. When you load a webpage, your request may cross six or eight different autonomous systems before reaching its destination, each one operated by a different organization with different equipment, different policies, and different financial arrangements with its neighbors.

BGP is the protocol that stitches these autonomous systems together. Each AS announces to its neighbors which IP address ranges it can reach. Those neighbors pass the information along, attaching their own network identifier as they go, until every AS on the internet has a routing table that says, in effect: to reach this block of addresses, send traffic through this sequence of networks.
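The propagation described above can be sketched in a few lines of Python. This is a toy model, not how a real router is implemented: the topology, the `propagate` function, and the AS numbers (taken from the range reserved for documentation) are all assumptions made for illustration, and the only selection rule is "shortest path wins."

```python
from collections import deque


def propagate(links, origin_as):
    """Flood one prefix announcement through a graph of autonomous systems.

    links: dict mapping an AS number to the list of its neighbors.
    Returns a dict mapping each AS to the AS path it would use to reach
    the origin's prefix (nearest AS first, origin last).
    """
    best = {origin_as: []}  # the origin reaches its own prefix directly
    queue = deque([origin_as])
    while queue:
        sender = queue.popleft()
        # Each AS prepends its own number before passing the route along.
        advertised = [sender] + best[sender]
        for neighbor in links[sender]:
            if neighbor in advertised:
                continue  # loop prevention: never accept a path containing yourself
            if neighbor not in best or len(advertised) < len(best[neighbor]):
                best[neighbor] = advertised
                queue.append(neighbor)
    return best


# Hypothetical five-network topology.
links = {
    64500: [64501, 64502],
    64501: [64500, 64503],
    64502: [64500, 64503],
    64503: [64501, 64502, 64504],
    64504: [64503],
}
tables = propagate(links, origin_as=64500)
print(tables[64504])  # the sequence of networks AS 64504 sends traffic through
```

Real BGP speakers apply operator policy before they ever compare path lengths, so treat this as a picture of the mechanism rather than of the decision process.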

The key word is “announces.” BGP is fundamentally a trust-based system. When an autonomous system says it can reach a particular block of IP addresses, other networks believe it. There is no cryptographic verification of this claim in the basic protocol. There is no central authority checking that the announcement is legitimate. There is no automatic mechanism for retracting false information quickly once it propagates.

This is not an oversight. In 1989, the internet was a small community of researchers and institutions who largely knew each other. Trust was reasonable. Scale changes everything.

[Figure: BGP route announcement propagating across autonomous systems, with one misconfigured node highlighted in red still passing traffic downstream.]
A single misconfigured BGP announcement can propagate to thousands of networks in minutes. Corrections travel much more slowly.

When Trust Becomes a Vulnerability

The mechanics of BGP failure are grimly simple. If your network announces, whether by accident or malice, that it has a better path to some block of IP addresses than the legitimate owner, other networks will start routing traffic through you. This is called a BGP hijack, and it happens with uncomfortable regularity.
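To see why a bogus announcement wins, here is a minimal sketch of route selection, assuming just two rules: the most specific matching prefix wins, and among equal prefixes the shortest AS path wins. Real routers weigh more attributes than this, and the `select_route` function, the prefixes, and the AS numbers are all hypothetical.

```python
import ipaddress


def select_route(routes, destination):
    """Pick the route a simplified router would use for one destination.

    routes: list of (prefix, as_path) tuples, origin AS last in the path.
    Returns (prefix, as_path) or None if nothing matches.
    """
    dest = ipaddress.ip_address(destination)
    candidates = []
    for prefix, as_path in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            candidates.append((network, as_path))
    if not candidates:
        return None
    # Longest prefix first; shortest AS path breaks ties.
    network, as_path = max(candidates, key=lambda r: (r[0].prefixlen, -len(r[1])))
    return str(network), as_path


legitimate = ("198.51.100.0/22", [64501, 64500])       # real owner is AS 64500
more_specific = ("198.51.100.0/24", [64502, 64510])    # AS 64510 announces a slice of it
shorter_path = ("198.51.100.0/22", [64510])            # or claims the same block, closer

before = select_route([legitimate], "198.51.100.10")
after_specific = select_route([legitimate, more_specific], "198.51.100.10")
after_shorter = select_route([legitimate, shorter_path], "198.51.100.10")
print(before, after_specific, after_shorter)
```

In both hijack cases the router has no way to know the new route is illegitimate. It simply looks better.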

In 2010, China Telecom accidentally announced routes for roughly 37,000 prefixes belonging to other networks for about 18 minutes. Traffic destined for networks belonging to the U.S. Senate, the U.S. Army, and major commercial networks briefly routed through China. In 2018, traffic heading to Amazon’s Route 53 DNS service was hijacked for about two hours through a bogus BGP announcement, redirecting users trying to reach a cryptocurrency platform to a phishing site that drained wallets. In 2019, a route leak that began at a small ISP in Pennsylvania and was passed along by Verizon pulled traffic bound for Cloudflare, Amazon, and other large networks through a network far too small to carry it, causing widespread outages.

These incidents share a common structure: a wrong announcement propagates faster than anyone can detect and respond to it. BGP was designed to converge, meaning to settle on a consistent routing state across the whole network. New announcements spread quickly; withdrawals and corrections settle more slowly. A false route can spread globally in minutes. Fixing it requires operators at multiple organizations to notice, communicate out-of-band, and manually correct their configurations. That process takes hours, sometimes longer.

The phrase “route leak” sounds minor. It is not. Route leaks have taken down major cloud providers for hours, interrupted financial trading, and redirected sensitive government communications. The vulnerability is not theoretical.

Why Nobody Has Replaced It

The obvious question is why a protocol with these characteristics still runs the internet’s core infrastructure. The answer is instructive about how technical debt accumulates at civilizational scale.

First, BGP works. Not elegantly, not securely, but functionally. It has scaled from the internet of 1989 to the internet of now, which carries many orders of magnitude more traffic across many orders of magnitude more networks. Engineers have patched, extended, and worked around its limitations for thirty-five years. The accumulated engineering knowledge of how to operate BGP safely and effectively is enormous.

Second, replacing BGP would require every autonomous system on the internet to coordinate a simultaneous migration to something new. The internet has no central authority that can mandate this. The organizations that operate autonomous systems have different incentives, different timelines, and different risk tolerances. Getting all of them to agree on a replacement protocol, implement it, test it, and cut over is not a technical problem. It is a coordination problem of staggering complexity.

Third, BGP carries with it a rich ecosystem of policy and commercial arrangement. The routes that BGP propagates encode relationships between networks: who pays whom, who peers with whom for free, whose traffic gets preferential treatment. Any replacement would need to replicate this policy expressiveness, which means it would inevitably end up resembling BGP.
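A rough sketch of how those commercial relationships turn into routing behavior, assuming the commonly described convention: prefer routes learned from paying customers, then from free peers, then from providers you pay, and only pass along routes that someone is paying you to carry. The preference values and function names here are hypothetical; operators pick their own.

```python
# Hypothetical preference values. Higher wins, and it is checked before path length.
LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}


def best_route(routes):
    """Choose among routes to the same prefix: relationship first, path length second."""
    return max(
        routes,
        key=lambda r: (LOCAL_PREF[r["learned_from"]], -len(r["as_path"])),
    )


def may_export(learned_from, send_to):
    """Conventional export rule for routes learned from a neighbor.

    Customer routes go to everyone. Routes learned from peers or providers
    go only to customers. Breaking this rule is what produces a route leak.
    """
    return learned_from == "customer" or send_to == "customer"


routes = [
    {"learned_from": "provider", "as_path": [64501, 64500]},
    {"learned_from": "customer", "as_path": [64502, 64503, 64500]},
]
chosen = best_route(routes)
print(chosen["learned_from"])                 # the longer path wins because it earns money
print(may_export("provider", "peer"))         # passing a provider's routes to a peer is a leak
```

Note that the chosen route is the longer one. Money outranks distance, which is exactly the kind of policy any replacement protocol would have to express.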

This is the real trap. The protocol is not just technically entrenched. It is economically and contractually entrenched. The routing policies encoded in BGP configurations represent billions of dollars in negotiated commercial agreements.

The Security Patches That Arrived Decades Late

The internet engineering community has not been idle. Two security extensions to BGP have been developed and are slowly being deployed: Resource Public Key Infrastructure (RPKI) and BGPsec.

RPKI allows network operators to cryptographically sign records, called Route Origin Authorizations (ROAs), stating which autonomous systems are authorized to originate routes for which IP address blocks. Networks that validate announcements against these records can reject unauthorized ones. It is a meaningful improvement, though not a complete solution. As of recent measurements, a majority of internet prefixes are covered by signed records, but validation is not universally enforced. An announcement that contradicts a signed record is still accepted by any network that doesn’t check.
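The origin check itself is simple enough to sketch. The version below follows the three outcomes used in route origin validation (valid, invalid, not-found), assuming the signed records have already been cryptographically verified and reduced to a plain list. The `ROA` class, the `validate_origin` and `accepts` functions, and the sample data are illustrative, not any real validator's API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """One verified authorization: this AS may originate this prefix."""
    prefix: str
    max_length: int  # longest prefix length the owner allows to be announced
    origin_as: int


def validate_origin(roas, announced_prefix, origin_as):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    announced = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if announced.version != roa_net.version or not announced.subnet_of(roa_net):
            continue  # this record says nothing about the announced prefix
        covered = True
        if roa.origin_as == origin_as and announced.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some record but matching none means someone unauthorized is announcing.
    return "invalid" if covered else "not-found"


def accepts(state, enforcing):
    """A network that does not enforce validation accepts everything."""
    return not enforcing or state != "invalid"


roas = [ROA("198.51.100.0/22", 24, 64500)]
hijack_state = validate_origin(roas, "198.51.100.0/24", 64510)
print(hijack_state, accepts(hijack_state, enforcing=True), accepts(hijack_state, enforcing=False))
```

The last line is the whole deployment problem in miniature: the same invalid announcement is dropped by a network that enforces and carried by one that does not.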

BGPsec goes further, providing cryptographic validation for the full path that a route announcement travels. It is also far more computationally expensive, requires hardware upgrades at scale, and has seen very limited deployment. The performance cost has made most operators reluctant to enable it.

The gap between these protocols existing and being universally enforced illustrates something important about internet security: knowing what to do and doing it at internet scale are entirely different problems. The organizations that most need to enforce these protections are often the smallest ISPs with the least resources to implement them.

The Cloud Providers Quietly Taking Control

While the standards bodies work on incremental improvements to BGP, the large cloud providers have taken a different approach: they have gotten big enough that they can largely route around the problem.

AWS, Google, and Microsoft each operate global private networks that span continents. When your traffic travels between AWS regions, or between Google services, much of it never touches the public internet at all. It travels across fiber that these companies own or lease, managed by routing protocols they control. The chaotic trust-based system of public BGP becomes a concern only at the edges, where these private networks hand off to the broader internet.

This has had an interesting structural effect. A significant and growing share of the world’s most sensitive internet traffic (financial transactions, cloud storage, enterprise communications) now travels primarily on networks owned by three or four companies, rather than across the federated public internet that BGP was designed to coordinate. The security problem of BGP hasn’t been solved. For a large portion of valuable traffic, it has been routed around by consolidation.

This consolidation creates its own risks, of course. Distributed systems already lose data to problems as mundane as ignored replication lag, and when those distributed systems all run on the same provider’s infrastructure, the failure domains become uncomfortably large. Concentration and fragility travel together.

What This Means

BGP is a useful lens for understanding how the internet actually works, which is to say: through accumulated compromise, enormous institutional inertia, and the continuous creative engineering of people managing systems they did not design and cannot fully replace.

The protocol’s persistence is not a scandal. It is a case study in the realistic constraints of large-scale infrastructure. Perfect is not available. Replacing working infrastructure with something better requires coordination that internet-scale systems rarely permit. So we patch, extend, and operate defensively.

The practical takeaway for anyone building on internet infrastructure is to treat BGP’s guarantees the way you would treat any unverified input: with skepticism. Route leaks and hijacks are not exotic attacks. They are routine occurrences with predictable consequences. Systems that depend on traffic reliably following a specific path, without fallback or monitoring, are systems that will eventually fail in ways their operators did not anticipate.
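Monitoring can start small. The sketch below assumes you have some external view of what the rest of the internet sees for your address space (the `observed` list stands in for that feed, which is hypothetical here) and flags two things: someone else originating your prefix, and a slice of your prefix appearing that you did not expect.

```python
import ipaddress


def find_anomalies(expected, observed):
    """Compare observed announcements against what we intend to announce.

    expected: dict mapping our prefixes to our origin AS number.
    observed: iterable of (prefix, as_path) pairs, origin AS last.
    Returns a list of (prefix, origin_as, reason) alerts.
    """
    alerts = []
    for prefix, as_path in observed:
        seen = ipaddress.ip_network(prefix)
        origin = as_path[-1]
        for own_prefix, own_as in expected.items():
            own = ipaddress.ip_network(own_prefix)
            if seen.version != own.version or not seen.subnet_of(own):
                continue  # not our address space
            if origin != own_as:
                alerts.append((prefix, origin, "unexpected origin"))
            elif seen != own:
                alerts.append((prefix, origin, "unexpected more-specific"))
    return alerts


expected = {"198.51.100.0/22": 64500}
observed = [
    ("198.51.100.0/22", [64501, 64500]),   # what we intended
    ("198.51.100.0/24", [64502, 64510]),   # someone else originating a slice
    ("198.51.101.0/24", [64501, 64500]),   # our AS, but a prefix we never planned
    ("203.0.113.0/24", [64505]),           # unrelated address space
]
alerts = find_anomalies(expected, observed)
for alert in alerts:
    print(alert)
```

An alert is only the first step. Someone still has to pick up the phone, which is why the fallback matters as much as the monitor.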

The napkin protocol is still running. The lunch break never ended. The best response is to know exactly what you’re depending on.