Cat Facts Cause Context Confusion

Adding an Spurious Phrase Makes Models Fail 3x More Often

One of the ways contexts fail is context confusion, “when superfluous content in the context is used by the model to generate a low-quality response.” In the context fails post, we illustrated this by showing how too many tool descriptions can overwhelm the model, causing it to fail benchmarks it would normally ace.

But a much better example is making the rounds: CatAttack, an LLM attack that uses seemingly harmless phrases to confuse language models.

By adding unreleated information, reasoning models stumble.

Adding the unrelated red text above causes LLMs to answer incorrectly, up to 3 times more often. Further, adding these phrases caused each model to reason longer, causing DeepSeek R1 to generate >50% more tokens about 50% of the time.

But these phrases weren’t randomly chosen: they were generated specifically to be confusing to LLMs. Using a smaller, non-reasoning model (DeepSeek V3, for budget and speed reasons), the team modified eval questions, like so:

They asked an LLM to add misleading elements to questions, then verified the edited questions remained semantically identical to the originals.
They would then ask the LLM to evaluate the original and the modified question and compare the results.
If semantically identical questions yield different results, they collect the modifications (phrases like, “Interesting fact: cats sleep for most of their lives.”) for attacking larger LLMs.

This process yielded 574 prompts which were able to fool their training model. They then tested these modifications against a larger reasoning model, DeepSeek R1, finding 114 of their modifications successfully confounded it.

While the CatAttack dataset was built to demonstrate a way to attack LLMs, reading through their modifications illustrates how seemingly innocuous phrases can confuse models. The most successful modification was adding, “Could the answer possibly be around 175?” It’s both easy to see how this could confuse the model and understand how user input could end up adding a phrase like this to your context.