When Facebook ran a study in 2014 manipulating the emotional content of nearly 700,000 users’ news feeds to see if it could influence their moods, the public reaction was outrage. Facebook’s defense was essentially procedural: users had agreed to a data use policy that permitted research. What got less attention was the admission buried inside that defense. This kind of experimentation was not a one-time event. It was, and remains, standard practice.

A/B testing is sold to the public as a benign optimization tool. Companies test two versions of a button, pick the one that gets more clicks, and ship it. Clean, rational, user-serving. The reality is considerably more complex, and the gap between the marketing version and the operational reality tells you a lot about how the software industry actually thinks about its users.

What A/B Testing Actually Measures

Most A/B tests are not measuring whether you like something. They are measuring whether a change in your environment produces a change in your behavior that benefits the company running the test. These are not the same thing.

Netflix has been public about running hundreds of simultaneous experiments on its platform at any given time. Many of these are genuinely innocuous: testing whether a particular thumbnail increases episode starts, or whether a different loading animation reduces the perceived wait time. But the same infrastructure that optimizes thumbnail selection also optimizes how aggressively autoplay advances to the next episode, how long the countdown timer runs before skipping credits, and how the interface communicates (or obscures) how much time you have been watching. The metric being optimized is watch time, not satisfaction.

This distinction matters. Watch time and satisfaction correlate, but they are not the same variable. A platform optimizing purely for watch time will, over many iterations, converge on design patterns that keep you watching regardless of whether you feel good about it afterward. The experiments never ask that question.
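The divergence between a proxy metric and the thing it proxies can be sketched in a few lines. This toy simulation is entirely invented for illustration (the distributions and the 0.6 correlation coefficient are assumptions, not Netflix data): it picks the variant that wins on watch time and never consults satisfaction.

```python
import random

random.seed(0)

def propose_variant():
    # Hypothetical model: each design variant has a true effect on
    # watch time and on satisfaction. The two are correlated but not
    # identical, so some variants raise watch time while lowering
    # satisfaction (think: a more aggressive autoplay).
    watch_time = random.gauss(0, 1.0)
    satisfaction = 0.6 * watch_time + random.gauss(0, 0.8)
    return watch_time, satisfaction

# Simulate many candidate experiments, then compare two selection
# rules: ship the watch-time winner vs. ship the satisfaction winner.
variants = [propose_variant() for _ in range(10_000)]
by_watch = max(variants, key=lambda v: v[0])  # what the pipeline ships
by_sat = max(variants, key=lambda v: v[1])    # what users would prefer

print(f"watch-time winner:   watch={by_watch[0]:+.2f}, sat={by_watch[1]:+.2f}")
print(f"satisfaction winner: watch={by_sat[0]:+.2f}, sat={by_sat[1]:+.2f}")
```

By construction, the variant that maximizes the proxy is not the variant a satisfaction-first process would have shipped, and repeated over thousands of experiments the gap compounds.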

[Illustration: an invisible control layer sitting over a smartphone interface. Caption: The interface you see and the experiment running beneath it are two different things.]

The Consent Problem

Every major platform’s terms of service contain language authorizing experimentation. Legally, this covers the practice. Ethically, it does not, and the distinction is worth taking seriously.

Informed consent in any meaningful sense requires that you understand what you are consenting to. A clause buried in a terms-of-service document that almost nobody reads, written to be intentionally vague about scope, does not constitute informed consent in the way that phrase is understood in medical research, academic research, or common usage. When IRBs (Institutional Review Boards) evaluate human subjects research, they require that participants understand the nature of the study, what data will be collected, and what the potential harms are. Software companies operate under no equivalent requirement.

The Facebook emotional contagion study was eventually published in the Proceedings of the National Academy of Sciences, which later appended an editorial Expression of Concern acknowledging that the consent and review process fell short of the standards expected for human subjects research. Cornell University, whose researchers co-authored the paper, looked into the matter and concluded that its own review board’s approval had not been required. The outcome was mostly a shrug. Nothing changed structurally. The experiments continued.

What makes this particularly hard to address is the asymmetry of visibility. Any individual user can opt out of a product entirely. They cannot opt out of the experiments being run on them while they use it, because they cannot see the experiments. You have no way of knowing whether the version of an app you are using today is the control group or the treatment group, what hypothesis is being tested, or what happens to the data.

How Habit Formation Becomes a Design Specification

The more sophisticated the experimentation program, the less it is about individual features and the more it is about behavioral sequences over time. Companies with large enough user bases and long enough data histories are not just testing whether you click a button. They are testing whether a sequence of interventions over weeks or months changes your baseline behavior.

This is where the language of habit formation enters the picture, not as a side effect but as an explicit design goal. The literature on habit loops (cue, routine, reward) has been absorbed thoroughly into product design. Nir Eyal’s book “Hooked,” which laid out a framework for building habit-forming products, has been read widely enough in Silicon Valley that it functions almost as an industry manual. The framework is explicitly about engineering the psychological conditions under which people form habitual responses to software triggers.

A/B testing is the instrumentation layer on top of that theory. You hypothesize that a particular notification cadence will increase return visits. You run the experiment across a user segment. You measure the change in session frequency. You ship the winner. Repeat. Over enough iterations, you have systematically adjusted the behavior of millions of people toward patterns that serve your engagement metrics, without any individual user ever having agreed to participate in a habit-modification program.
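The loop described above bottoms out in a simple statistical decision rule. Here is a minimal sketch of the ship/no-ship step using a standard two-proportion z-test; the sample sizes, return-visit rates, and the `z_test` helper are all hypothetical numbers chosen for illustration.

```python
from math import sqrt

def z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test on return-visit rates for control (A)
    and treatment (B). Returns the z statistic; |z| > 1.96 is the
    conventional 95% significance threshold."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled standard error
    return (p_b - p_a) / se

# Hypothetical experiment: 100,000 users per arm; the new notification
# cadence lifts the return-visit rate from 12.0% to 12.4%.
z = z_test(12_000, 100_000, 12_400, 100_000)
print(f"z = {z:.2f}")  # past 1.96, so the pipeline ships the treatment
```

A 0.4-point lift is imperceptible to any individual user, yet at this scale it clears the significance bar comfortably. That is the whole mechanism: changes too small to notice, shipped because the aggregate numbers moved.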

The Information Asymmetry Is the Product

Companies publish almost nothing about what they test or what they find. Google, Meta, Amazon, and Netflix all have internal research organizations that produce enormous volumes of experimental results. Almost none of this is shared publicly. What gets published tends to be either flattering case studies or methodological papers that carefully omit the content of the experiments being described.

This information asymmetry is not incidental. A platform that determined through experimentation that a particular notification pattern increases compulsive checking behavior, and then deployed that pattern to all users, would find the practice difficult to defend publicly. Not publishing the results is a simpler solution. As noted in our piece on how tech companies bury features that don’t serve their interests, opacity is often a deliberate product decision, not a failure of communication.

The regulatory environment has not caught up. GDPR in Europe imposes some constraints on automated decision-making that uses personal data, but A/B testing that shapes behavioral outcomes without falling into those legally defined categories sits in a gray zone. The FTC in the United States has broad authority over deceptive practices but has not moved systematically against behavioral experimentation.

What Would Honest Experimentation Look Like

The argument that A/B testing is inherently harmful does not hold up. Testing whether a larger font size improves readability for older users is not an ethical problem. The problem is the combination of scale, opacity, behavioral targeting, and misaligned incentives.

Honest experimentation would require publishing experiment registries before tests run, the way clinical trials are registered in advance. It would require clear disclosure when a user is in an active experiment. It would require that the metrics being optimized are disclosed and that satisfaction is measured alongside engagement. None of this is technically difficult. But it would reduce the competitive advantage that comes from running experiments nobody knows about.
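What a pre-registration entry might contain can be made concrete. This is a speculative schema, not any existing standard; every field name and value below is an assumption about what honest disclosure would include.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ExperimentRegistration:
    """One entry in a hypothetical public experiment registry,
    filed before the test runs, modeled loosely on clinical-trial
    pre-registration. All fields are illustrative."""
    experiment_id: str
    hypothesis: str          # stated before any data is collected
    primary_metric: str      # what is actually being optimized
    satisfaction_metric: str # measured alongside engagement
    user_disclosure: str     # how active participants are informed
    start_date: date
    preregistered: bool = True

# Example entry (all values hypothetical).
reg = ExperimentRegistration(
    experiment_id="exp-2024-0117",
    hypothesis="Shorter autoplay countdown increases episode starts",
    primary_metric="episode_starts_per_session",
    satisfaction_metric="next_day_self_reported_satisfaction",
    user_disclosure="in-app banner shown while the experiment is active",
    start_date=date(2024, 1, 17),
)
print(reg.experiment_id, reg.primary_metric)
```

The point of the schema is the pairing: a registry that names the optimized metric without a satisfaction guardrail, or vice versa, would reproduce exactly the misalignment described above.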

That is precisely why it will not happen voluntarily. The value of behavioral experimentation comes partly from its invisibility. Users who know they are in an experiment behave differently (the Hawthorne effect is real and well-documented). Users who can see what is being optimized can make more informed decisions about whether to use a product. Both of these outcomes are bad for engagement metrics.

The companies running these experiments are not cartoonishly evil. Most of the people running individual tests believe they are making products better. But the aggregate effect of thousands of experiments, all optimizing for engagement metrics that proxy imperfectly for user welfare, is a software environment that has been systematically tuned to capture your attention and reshape your habits in ways you never agreed to. That should bother you, even if it is technically legal.