More information should produce better answers. That’s the intuition most people bring to working with AI models, and it’s wrong often enough to be worth examining seriously.

You’ve probably experienced this. You write a careful, detailed prompt. You include background, constraints, examples, edge cases. The model responds with something confused, contradictory, or suspiciously vague. You strip the prompt back to two sentences and suddenly get exactly what you needed. The experience feels backwards, like the model punished you for trying harder.

It’s not punishing you. But something real is happening, and understanding it will make you meaningfully better at working with these systems.

The Attention Problem Is Quadratic, Not Linear

Transformer-based language models work by computing attention, which is roughly a measure of how much each token in the input should influence every other token. The key thing to understand is that the cost of this computation doesn’t scale linearly with context length. In standard self-attention, it scales with the square of the sequence length.

Double your prompt length and you’re not adding twice the computation or twice the complexity. You’re roughly quadrupling the number of token-to-token relationships the model needs to evaluate. The model has to figure out which parts of a long context are actually relevant to your question, and that process is imperfect. When you pack a prompt with dense information, you’re asking the model to do serious work just to identify what matters before it even starts answering.
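To make the arithmetic concrete, here is a minimal sketch. It assumes plain full self-attention, where every token is scored against every token; the function name and the token counts are illustrative, not taken from any particular model. Causal masking roughly halves the raw count but leaves the ratio essentially the same.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of (query, key) token pairs scored by full self-attention."""
    return num_tokens * num_tokens


short_prompt = 500    # tokens (illustrative)
long_prompt = 1_000   # the same prompt, doubled in length

ratio = attention_pairs(long_prompt) / attention_pairs(short_prompt)

print(attention_pairs(short_prompt))  # 250000
print(attention_pairs(long_prompt))   # 1000000
print(ratio)                          # 4.0
```

Twice the tokens, four times the pairs. The count says nothing about answer quality on its own, but it gives a sense of how quickly the space of possible relationships grows.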

This is part of why researchers and practitioners have observed that performance on retrieval tasks often degrades when the relevant information is buried in a long context. The model can see everything, but seeing isn’t the same as attending correctly.

Lost in the Middle Is a Real Phenomenon

Researchers at Stanford, UC Berkeley, and Samaya AI published a paper in 2023 specifically studying how language models use long context. Their finding, which has been widely replicated in practice, is that models tend to perform best when relevant information appears at the very beginning or very end of a prompt. When the critical information lands in the middle of a long context, performance drops substantially.

They called this the “lost in the middle” problem, and it has direct practical implications. If you write a prompt that opens with three paragraphs of background, then states your actual question, then closes with five examples, your question is sitting in the least attended-to part of the input. The model has seen your question. It just didn’t weight it correctly.

This isn’t a bug that will necessarily be engineered away. It appears to be a property of how current models learn to use attention across long sequences, and while newer architectures and training techniques are improving it, you should assume it’s affecting your results right now.
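If you want to check whether position matters for your own model and task, one rough approach is to build the same prompt several times with the key fact placed at different depths. The sketch below only constructs the prompts; sending them to a model is left out because it depends on your API. The function name, the filler passages, and the example fact are all hypothetical.

```python
def build_probe_prompt(filler: list[str], key_fact: str, depth: float, question: str) -> str:
    """Insert key_fact into filler at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    passages = filler[:index] + [key_fact] + filler[index:]
    context = "\n\n".join(passages)
    # The question always comes last, so only the fact's position varies.
    return f"{context}\n\nQuestion: {question}"


# Hypothetical filler and fact, purely for illustration.
filler = [f"Background passage {i}." for i in range(10)]
key_fact = "The deployment password rotates every 30 days."
question = "How often does the deployment password rotate?"

# One prompt each with the fact at the start, middle, and end of the context.
probes = {
    depth: build_probe_prompt(filler, key_fact, depth, question)
    for depth in (0.0, 0.5, 1.0)
}
```

Running each probe through the model a few times and comparing how often the answer is correct gives you a crude picture of how sensitive your setup is to placement.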

[Figure: attention intensity across a text sequence. Attention in transformer models tends to concentrate at the beginning and end of a sequence; context in the middle is often weighted less than you’d expect.]

Contradictions Compound as Context Grows

Here’s a failure mode that’s easy to create accidentally. You want the model to write something formal but approachable. You add a style guide. Then you add examples. Then you add a note from a stakeholder about the tone they’re hoping for. Then you add a constraint about brevity.

Now you have four different sources making claims about tone and style, and at least some of them conflict with each other. The model has to reconcile them or pick a winner, and it often does this poorly, producing something that satisfies none of your sources well rather than satisfying one of them cleanly.

Longer prompts accumulate these micro-contradictions. A word you used in one section carries connotations that conflict with a constraint you stated two sections later. An example you included demonstrates a pattern that contradicts the rule you explicitly stated. The model doesn’t flag these conflicts and ask for clarification. It attempts to synthesize everything, and synthesis under contradiction produces mush.

This is related to a broader point worth understanding: the issue often isn’t what you added, it’s what your prompt is implicitly communicating. Every piece of context you include is making an implicit claim about what matters, and those claims can undermine each other.

Irrelevant Context Actively Misleads

You might assume that information irrelevant to your question is neutral, that it just sits there doing nothing. It doesn’t. Irrelevant context can actively steer the model toward worse answers by activating associations that don’t belong in your output.

If you’re asking for a technical explanation and you include a bunch of narrative background about the history of a company, you’ve introduced narrative patterns into the model’s context. Those patterns can bleed into the response, making a technical explanation feel like it’s trying to tell a story. The model isn’t stupid. It’s doing exactly what it was trained to do, which is produce text that’s coherent with everything it’s been given. You just gave it something you didn’t intend to influence the output.

The practical rule here is that every piece of context you include is making a bid for the model’s attention. Include only things that should win that bid.

The Confidence Trap

There’s a more subtle problem with long prompts that has nothing to do with the model’s architecture. It has to do with you.

When you write a detailed, thorough prompt, you feel like you’ve done the work. You’ve specified everything. You’ve anticipated edge cases. That feeling of thoroughness makes it harder to notice when the output is mediocre, because you’re partly evaluating your prompt rather than evaluating the result. The model gave you something okay-ish and you approve it because you spent twenty minutes crafting the input.

Brevity forces precision. When you’re limited to two or three sentences, you have to identify what you actually want, not just what you know about the problem. That act of identification, the forcing function of constraint, often produces better prompts than an hour of elaboration.

This is the same principle that makes good written communication hard. The document you share is not the document people read, and the prompt you write is not the prompt the model processes. Both are filtered through an interpreter that isn’t you.

How to Actually Fix This

None of this means you should always write short prompts. It means you should be deliberate about what context earns its place.

Lead with your actual request. Don’t bury the question in background. State what you want first, then provide context. The attention patterns of these models reward this structure.

Audit for contradictions before you submit. Read your prompt specifically looking for places where two pieces of context make competing claims about format, tone, scope, or approach. Resolve them explicitly before the model has to guess.

Ask what each piece of context is doing. For every paragraph or constraint you’re about to include, ask whether it changes what you want the model to do or just feels relevant. If it’s the latter, cut it.

Use structure to signal hierarchy. When you do need to include background, use headers or explicit labels to distinguish between your core request, necessary context, and optional constraints. This doesn’t perfectly solve the lost-in-the-middle problem, but it helps the model parse what to prioritize.

Test the stripped-down version first. Write the minimal prompt, see what you get, then add context specifically to address what was missing or wrong. Don’t pre-load everything you know. Respond to actual failures in the output rather than anticipated ones.
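Two of the suggestions above, leading with the request and using labels to signal hierarchy, can be sketched as a small prompt builder. This is one possible layout, not a standard: the section labels (REQUEST, CONTEXT, CONSTRAINTS) and the function name are assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def build_prompt(
    request: str,
    context: Optional[Sequence[str]] = None,
    constraints: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a prompt with the request first and each section labeled."""
    sections = [f"REQUEST:\n{request.strip()}"]
    # Optional sections are added only when they contain something.
    if context:
        sections.append("CONTEXT:\n" + "\n".join(f"- {item}" for item in context))
    if constraints:
        sections.append("CONSTRAINTS:\n" + "\n".join(f"- {item}" for item in constraints))
    return "\n\n".join(sections)


prompt = build_prompt(
    request="Summarize the attached incident report in three sentences.",
    context=["Audience: on-call engineers"],
    constraints=["Plain language, no acronyms"],
)
print(prompt)
```

Because context and constraints are optional, the minimal version is just the request. That fits the last suggestion: start there, then add one item at a time in response to what the output actually got wrong.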

The underlying discipline here is that working well with AI models is an editing skill, not a writing skill. The goal isn’t to include everything; it’s to include the right things in the right order with nothing that shouldn’t be there.

What This Means

Language models are genuinely impressive at handling complex requests, but they’re not immune to being overwhelmed by their own inputs. The attention mechanisms that make them powerful also create specific failure modes around long context: degraded retrieval when information lands in the middle, compounding errors from contradictory instructions, and activation of irrelevant patterns from context that shouldn’t have been included.

Your instinct to be thorough isn’t wrong in general. It goes wrong when applied indiscriminately to prompts. In this context, thoroughness can mean including every constraint you can think of rather than only the constraints that actually matter. It can mean front-loading background rather than leading with the request. It can mean resolving your own uncertainty by dumping it on the model instead of doing the hard thinking yourself.

The models will keep getting better at handling long context. But the habit of asking yourself what each piece of information is earning will make you better at working with every version of these systems, not just the current ones.