On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance
A new study investigates the tension between the internalized prior knowledge of a Large Language Model (LLM) and explicit user instructions. It examines how "decision stickiness" and prior familiarity influence performance in zero-shot annotation and LLM-as-a-judge workflows.
The Conflict Between Priors and Instructions
As Large Language Models are increasingly deployed for zero-shot annotation and as evaluators (LLM-as-a-judge), their reliability becomes a critical concern. The research conducted by Etienne Casanova, Rafal Kocielnik, and R. Michael Alvarez explores the fundamental interaction between a model's internalized priors—the knowledge and biases acquired during pre-training—and the specific instructions provided during inference.
Key Dimensions of Model Adaptability
The study analyzes the limitations of LLM adaptability through three primary technical dimensions:
1. Familiarity with Data and Task Definitions
The researchers investigate how the model's prior exposure to specific datasets or task definitions affects its performance. The goal is to determine whether familiarity improves accuracy or instead introduces biases that override the constraints defined in the prompt.
2. Decision Stickiness and Error Correction
A core focus of the research is "decision stickiness": the extent to which an LLM persists in an incorrect zero-shot prediction even when the prompt supplies additional corrective information. High stickiness would point to rigidity in model reasoning, where internalized priors outweigh explicit evidence provided in the context window.
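One way to make the idea of decision stickiness concrete is to measure how often an initially wrong label survives a second, corrected prompt. The sketch below is a hypothetical operationalization, not the metric used in the study (the summary does not specify one). The function name stickiness_rate and its three inputs (gold labels, first-pass zero-shot predictions, and predictions after corrective information) are assumptions made for illustration.

```python
from typing import Sequence


def stickiness_rate(
    gold: Sequence[str],
    initial: Sequence[str],
    revised: Sequence[str],
) -> float:
    """Hypothetical metric: share of initially wrong predictions left unchanged.

    gold    -- reference labels
    initial -- zero-shot predictions made without corrective information
    revised -- predictions made after corrective information was added
    """
    if not (len(gold) == len(initial) == len(revised)):
        raise ValueError("all label sequences must have the same length")

    # Keep only the items the model got wrong on the first pass.
    wrong = [(first, second)
             for g, first, second in zip(gold, initial, revised)
             if first != g]
    if not wrong:
        return 0.0  # nothing to correct, so nothing can be sticky

    # A prediction is "stuck" if the model repeats its earlier wrong label.
    stuck = sum(1 for first, second in wrong if second == first)
    return stuck / len(wrong)


if __name__ == "__main__":
    gold = ["a", "b", "a", "b"]
    initial = ["a", "a", "b", "b"]   # items 2 and 3 are wrong
    revised = ["a", "a", "a", "b"]   # item 2 stays wrong, item 3 is fixed
    print(stickiness_rate(gold, initial, revised))
```

Under this definition, a rate near 1 means corrective information rarely changes a wrong answer, while a rate near 0 means the model usually moves off its first wrong label. Note that it counts only exact repeats of the first label, so a model that switches to a different wrong label is not counted as stuck.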
3. Susceptibility to Misaligned Tasks
The study examines the model's vulnerability when faced with tasks that are misaligned with its internal priors, analyzing how often the model defaults to its pre-trained expectations rather than adhering to the user's unique task requirements.
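How often a model "defaults to its pre-trained expectations" can be estimated on items where the conventional label and the label required by the custom instructions disagree. The following is a minimal sketch under that assumption. The function prior_default_rate, its inputs, and the example labels are hypothetical and are not drawn from the study.

```python
from typing import Sequence


def prior_default_rate(
    prior_labels: Sequence[str],
    instructed_labels: Sequence[str],
    predictions: Sequence[str],
) -> float:
    """Hypothetical metric: on conflicting items, share of predictions
    that match the conventional (prior-aligned) label.

    prior_labels      -- labels under the conventional task definition
    instructed_labels -- labels under the user's custom task definition
    predictions       -- model outputs when given the custom instructions
    """
    if not (len(prior_labels) == len(instructed_labels) == len(predictions)):
        raise ValueError("all label sequences must have the same length")

    # Only items where the two definitions disagree are informative.
    conflicts = [(prior, pred)
                 for prior, instructed, pred
                 in zip(prior_labels, instructed_labels, predictions)
                 if prior != instructed]
    if not conflicts:
        return 0.0

    defaulted = sum(1 for prior, pred in conflicts if pred == prior)
    return defaulted / len(conflicts)


if __name__ == "__main__":
    prior = ["toxic", "ok", "toxic", "ok"]
    instructed = ["ok", "ok", "ok", "toxic"]
    preds = ["toxic", "ok", "ok", "ok"]
    print(prior_default_rate(prior, instructed, preds))
```

Restricting the calculation to conflicting items matters: on items where both definitions agree, a correct prediction says nothing about whether the model followed the instructions or its priors.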
Implications for AI Evaluation
Understanding these limits is essential for developers implementing LLM-based labeling pipelines. If model-internalized priors dominate prompt-based instructions, zero-shot annotation becomes unreliable, which can introduce systematic biases into dataset creation and model evaluation.
Note: This summary does not include the study's detailed experimental results or specific model benchmarks.