A/B testing started as an honest idea. You show half your users one version of a feature and the other half a different version, measure which performs better, and ship the winner. It’s the scientific method applied to software. The problem is that “better” requires a definition, and the definitions tech companies choose reveal exactly what they’re optimizing for.
The answer, almost always, is not your wellbeing.
What Gets Measured Gets Optimized
Every A/B test has a target metric. The team running the experiment picks something measurable (usually clicks, time on page, conversion rate, or subscription upgrades) and declares whichever variant moves that number the winner. This sounds reasonable until you notice what’s missing from that list: whether users actually got what they came for.
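For concreteness, here is roughly what that decision rule looks like. This is a minimal sketch with invented variant names, counts, and a standard two-proportion significance test, not any company’s actual pipeline, but the shape is representative: the code sees the target metric and nothing else.

```python
# Minimal sketch of declaring an A/B "winner" on a single target metric.
# All counts are hypothetical. Note what the decision rule can see:
# the metric, and only the metric.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test on conversion rate; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

# Hypothetical outcome: variant B nudges the metric up.
z, p = two_proportion_z(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
if p < 0.05:
    print(f"Ship variant B (z={z:.2f}, p={p:.3f})")  # metric won; user intent unmeasured
```

Nothing in that code asks whether users got what they came for, because nothing in the experiment recorded it.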
Facebook’s own internal research (exposed during the 2021 whistleblower disclosures) documented that engagement-optimizing algorithms were surfacing content that provoked anger and anxiety more effectively than content users reported finding valuable. The company knew this. The experiments kept running. Engagement went up. User satisfaction, by their own internal measures, went down. The metric won.
This is the fundamental trap of A/B testing at scale. Once you define success as a number, the system learns to hit the number, not to achieve the underlying goal the number was supposed to represent. Researchers call this Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.
The Specific Tricks That Work
What does psychological manipulation look like in practice? It’s often subtler than dark patterns like hidden unsubscribe buttons (though those exist too). The more sophisticated version is interface design that exploits known cognitive biases, tested at scale until the most effective variant wins.
Anchoring is a reliable one. Show users a high price first, then a lower price, and they perceive the lower price as a deal even if it’s above market rate. Amazon has run variants of this for years with crossed-out “list prices” next to sale prices. Studies in behavioral economics consistently confirm the effect, and A/B testing lets companies find the exact price differential that maximizes conversion without triggering enough user skepticism to cause churn.
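In code, the optimization is almost embarrassingly simple. The sketch below is hypothetical throughout (invented prices and conversion counts), but it captures the selection step: run several anchor levels against the same sale price, then keep whichever one converts best.

```python
# Hypothetical anchor-price search: test several crossed-out "list
# prices" against the same sale price, keep the best converter.
# Every number below is invented for illustration.
SALE_PRICE = 59.99
ANCHORS = [None, 69.99, 89.99, 119.99]  # None = no anchor shown

# Observed (impressions, purchases) per variant from the experiment.
results = {
    None:   (10_000, 420),
    69.99:  (10_000, 455),
    89.99:  (10_000, 510),
    119.99: (10_000, 470),  # anchor too implausible; skepticism kicks in
}

def conversion_rate(anchor) -> float:
    impressions, purchases = results[anchor]
    return purchases / impressions

winner = max(ANCHORS, key=conversion_rate)
print(f"Show list price {winner} next to {SALE_PRICE}: "
      f"{conversion_rate(winner):.1%} conversion")
```

The argmax lands on the anchor high enough to make the sale price feel like a deal, but not so high that users stop believing it, which is exactly the differential described above.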
Scarcity cues work similarly. “Only 3 left in stock” and “12 people are viewing this right now” are not organic information; they’re tested interventions. Booking.com became particularly well-known for these tactics. The company runs thousands of simultaneous tests, and scarcity messaging variants that create urgency have repeatedly outperformed neutral variants in conversion rate. That’s not a coincidence of good design; it’s the output of a system that selects for psychological pressure.
Default settings are another high-leverage area. Users rarely change defaults, which makes the default position enormously powerful. Companies A/B test default states on everything from privacy settings to auto-renewal checkboxes. The variant that gets selected is the one that produces the outcome the company wants, which is rarely the outcome that maximizes user control. This is worth sitting with: the “default” experience you receive has been experimentally optimized to produce a specific behavior from you, usually a behavior that benefits the platform.
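A back-of-envelope calculation shows why this lever is so powerful. Assume (as an illustration, not a measured figure) that only 5% of users ever open the settings menu:

```python
# Why defaults dominate: if few users ever touch a setting, the default
# value determines the aggregate outcome almost by itself.
# The 5% touch rate is an assumed illustration, not a measured figure.
def opted_in_share(default_on: bool, touch_rate: float = 0.05) -> float:
    """Share of users who end up opted in, assuming users who do open
    the setting split evenly between on and off."""
    untouched = 1 - touch_rate
    return untouched * (1.0 if default_on else 0.0) + touch_rate * 0.5

print(f"Default ON : {opted_in_share(True):.1%} opted in")   # ~97.5%
print(f"Default OFF: {opted_in_share(False):.1%} opted in")  # ~2.5%
```

Flipping one checkbox in the shipped configuration moves the aggregate outcome from a few percent to nearly everyone, which is why the default state is worth testing so carefully from the company’s perspective.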
Personalization Makes It Worse
Basic A/B testing assigns users randomly to variants. Modern experimentation goes further by targeting specific user segments with the variants most likely to work on them. You’re not just shown a manipulative dark pattern; you’re shown the manipulative dark pattern that the model predicts will work on you specifically, based on your behavioral history.
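A crude sketch of that targeting step follows. Everything in it is invented (the nudge names, the features, the linear weights standing in for a trained response model), but the structure is the point: score every manipulation for every user, then serve the argmax.

```python
# Hypothetical sketch of personalized variant targeting: a response
# model scores each nudge for each user; the user gets the argmax.
# Nudge names, features, and weights are all invented for illustration.
from typing import Dict

NUDGES = ["scarcity_banner", "countdown_timer", "social_proof", "plain_page"]

# Stand-in for a trained model: predicted conversion lift per nudge,
# as a linear function of behavioral features.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "scarcity_banner": {"past_urgency_clicks": 0.8, "price_sensitivity": 0.1},
    "countdown_timer": {"past_urgency_clicks": 0.6, "session_speed": 0.4},
    "social_proof":    {"follows_reviews": 0.9},
    "plain_page":      {},  # control: no predicted lift
}

def pick_nudge(user: Dict[str, float]) -> str:
    def score(nudge: str) -> float:
        return sum(w * user.get(f, 0.0) for f, w in WEIGHTS[nudge].items())
    return max(NUDGES, key=score)

# Two users, two different "optimal" manipulations.
print(pick_nudge({"past_urgency_clicks": 0.9, "price_sensitivity": 0.2}))  # scarcity_banner
print(pick_nudge({"follows_reviews": 0.8}))                                # social_proof
```

Note that the neutral control page effectively never wins: with non-negative predicted lifts, some nudge always scores at least as high. That is the formal version of the point above. The system is built to find the manipulation that works on you, not to ask whether one should be used at all.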
This is where the practice of running experiments on users without meaningful consent becomes genuinely troubling. The 2014 Facebook emotional contagion study (published in PNAS, since criticized heavily on ethical grounds) demonstrated that the company could shift users’ emotional states by manipulating their feeds, and was willing to do so without disclosure. That study used 689,000 users as unwitting participants. The infrastructure to run that kind of experiment continuously, on hundreds of millions of users, already exists. It runs every day.
Personalized experimentation also creates a troubling epistemic problem for users. If you and a friend are each seeing algorithmically customized versions of the same product, your experiences are genuinely different. You can’t compare notes in any meaningful way. You can’t collectively notice that the checkout flow was designed to confuse, because your checkout flow might look different from mine. This fragmentation is not an accident; it’s an emergent property of personalized optimization, and it conveniently makes coordinated user resistance much harder.
The Consent Problem Nobody Wants to Solve
There’s an important distinction between A/B testing that genuinely improves a product and A/B testing that identifies the most effective way to override a user’s stated preferences. The former is valuable. The latter is manipulation with a methodology. The industry has blurred this line deliberately.
The legal argument companies make is that A/B testing is covered under broad terms of service agreements that users accept. This is technically true and morally hollow. Nobody reads those agreements, nobody understands that accepting them means consenting to continuous behavioral experimentation, and “you clicked agree” is not a meaningful standard for informed consent when the agreement is 40 pages of legalese. The EU’s GDPR has pushed back on this in limited ways, particularly around consent for personalization, but enforcement is uneven and companies operating primarily in U.S. markets face essentially no constraints.
The honest version of A/B testing would involve telling users what’s being tested, why, and what metrics define success. Some companies do publish aggregate results from experiments, particularly when the outcomes are flattering. None of them publish the full ledger: the variants that were tried, the psychological mechanisms they targeted, and the tests that were abandoned because they produced user complaints rather than revenue.
What You Can Actually Do
The standard advice here would be to read privacy policies and adjust settings. That’s not wrong, but it underestimates the problem. The most effective manipulation happens at the interface level, in real-time, tailored to your behavior. No settings menu protects against that.
What actually helps is developing a working model of how these systems operate. Recognizing scarcity cues as tested interventions rather than genuine information changes how you respond to them. Understanding that default settings represent the company’s preferred outcome, not a neutral recommendation, prompts you to actively choose rather than passively accept. Knowing that the “personalized” experience you’re seeing has been optimized to produce a specific behavior from you makes that behavior harder to elicit.
Tech companies will continue running these experiments because they work. The only meaningful counterweight is users who understand what’s being done to them. That’s a modest form of resistance, but it’s a real one.