On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

A new study investigates the tension between a large language model's (LLM's) internalized prior knowledge and explicit user instructions. It examines how "decision stickiness" and prior familiarity with data and tasks influence performance in zero-shot annotation and LLM-as-a-judge workflows.

The Conflict Between Priors and Instructions

As Large Language Models are increasingly deployed for zero-shot annotation and as evaluators (LLM-as-a-judge), their reliability becomes a critical concern. The research conducted by Etienne Casanova, Rafal Kocielnik, and R. Michael Alvarez explores the fundamental interaction between a model's internalized priors—the knowledge and biases acquired during pre-training—and the specific instructions provided during inference.

Key Dimensions of Model Adaptability

The study analyzes the limitations of LLM adaptability through three primary technical dimensions:

1. Familiarity with Data and Task Definitions

The researchers investigate how the model's prior exposure to specific datasets or task definitions impacts its performance. The goal is to determine if familiarity acts as a catalyst for accuracy or if it introduces biases that override the specific constraints defined in the prompt.
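
One way to probe the familiarity question is to compare annotation accuracy on items the model is presumed to have seen against items it is presumed not to have seen. The sketch below is illustrative only and is not the authors' method: the function name `accuracy_by_familiarity` and the `familiar` flag are hypothetical, and deciding which datasets count as familiar is left to whoever runs the comparison.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_familiarity(
    gold: Sequence[str],
    predicted: Sequence[str],
    familiar: Sequence[bool],
) -> Dict[bool, float]:
    """Return accuracy separately for familiar (True) and unfamiliar (False) items.

    `familiar` is an assumed per-item flag supplied by the caller; the source
    summary does not say how familiarity was determined.
    """
    if not (len(gold) == len(predicted) == len(familiar)):
        raise ValueError("all inputs must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p, f in zip(gold, predicted, familiar):
        total[f] += 1
        correct[f] += int(g == p)

    # Only groups that actually occur in the data appear in the result.
    return {group: correct[group] / total[group] for group in total}
```

A large gap between the two groups would be a starting point for further inspection, not proof that familiarity caused it, since the groups may also differ in difficulty.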

2. Decision Stickiness and Error Correction

A core focus of the research is "decision stickiness": the extent to which an LLM persists in an incorrect zero-shot prediction even when given additional corrective information in the prompt. Where such stickiness occurs, it points to a rigidity in model reasoning, with internalized priors outweighing explicit evidence provided in the context window.
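
The definition above can be turned into a simple rate. The following is a minimal sketch under stated assumptions, not the metric used in the study: it assumes gold labels are available, that each item was labeled once zero-shot and once with corrective context, and it uses the hypothetical name `stickiness_rate`.

```python
from typing import Sequence


def stickiness_rate(
    gold: Sequence[str],
    zero_shot: Sequence[str],
    revised: Sequence[str],
) -> float:
    """Fraction of initially wrong zero-shot labels that stay unchanged
    after corrective information is added to the prompt.

    Returns 0.0 when there are no initial errors to measure.
    """
    if not (len(gold) == len(zero_shot) == len(revised)):
        raise ValueError("all inputs must have the same length")

    # Keep only the items the model got wrong in the zero-shot pass.
    initial_errors = [(z, r) for g, z, r in zip(gold, zero_shot, revised) if z != g]
    if not initial_errors:
        return 0.0

    # An error is "sticky" if the revised label equals the original wrong label.
    stuck = sum(1 for z, r in initial_errors if r == z)
    return stuck / len(initial_errors)
```

For example, with gold labels `["a", "b", "a", "b"]`, zero-shot labels `["a", "a", "b", "b"]`, and revised labels `["a", "a", "a", "b"]`, two items start out wrong and one of them keeps its wrong label, so the rate is 0.5.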

3. Susceptibility to Misaligned Tasks

The study examines the model's vulnerability when faced with tasks that are misaligned with its internal priors, analyzing how often the model defaults to its pre-trained expectations rather than adhering to the user's unique task requirements.
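
How often a model falls back on its priors can be estimated on items where the user's task definition and the conventional label disagree. The sketch below is an assumption-laden illustration, not the paper's procedure: `prior_default_rate` is a hypothetical name, and it presumes the evaluator can supply, for each item, both the label the instructions call for and the label the model's prior would be expected to favor.

```python
from typing import Dict, Sequence


def prior_default_rate(
    instructed: Sequence[str],
    prior: Sequence[str],
    predicted: Sequence[str],
) -> Dict[str, float]:
    """On items where the instructed label and the prior-consistent label
    differ, report the share of predictions matching each (or neither)."""
    if not (len(instructed) == len(prior) == len(predicted)):
        raise ValueError("all inputs must have the same length")

    # Only conflicting items can reveal which source the model followed.
    conflicts = [
        (i, p, m) for i, p, m in zip(instructed, prior, predicted) if i != p
    ]
    if not conflicts:
        return {"follows_prior": 0.0, "follows_instruction": 0.0, "other": 0.0}

    n = len(conflicts)
    follows_prior = sum(1 for _, p, m in conflicts if m == p)
    follows_instruction = sum(1 for i, _, m in conflicts if m == i)
    return {
        "follows_prior": follows_prior / n,
        "follows_instruction": follows_instruction / n,
        "other": (n - follows_prior - follows_instruction) / n,
    }
```

Restricting the count to conflicting items matters: on items where both labels agree, a correct prediction says nothing about whether the model followed the prompt or its prior.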

Implications for AI Evaluation

Understanding these limits is essential for developers building LLM-based labeling pipelines. If model-internalized priors dominate prompt-based instructions, zero-shot annotation becomes unreliable, which can introduce systematic biases into dataset creation and model evaluation.

Note: This article summarizes the research; detailed experimental results and specific model benchmarks are not included here.
