Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

Researchers propose a new framework for evaluating the adversarial robustness of Large Language Models (LLMs) by accounting for the computational cost of attacks, arguing that traditional Attack Success Rate (ASR) metrics fail to capture the true effort required to compromise a model.

The Limitation of Fixed-Budget Evaluations

Current methods for evaluating the adversarial robustness of LLMs rely predominantly on the Attack Success Rate (ASR) measured under a fixed query budget. This approach implicitly assumes that all adversarial attacks are equally costly. In practice, however, the computational resources required by different attack strategies can vary by several orders of magnitude, so two attacks that use the same number of queries may consume very different amounts of compute.
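To make the point concrete, the sketch below compares two hypothetical attacks that break the same prompts with the same query counts, so their fixed-budget ASR is identical, while one of them spends 1000 times more compute per query. The `AttackRun` record, the FLOPs-per-query field, and all numbers are illustrative assumptions, not data or code from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttackRun:
    """One attack attempt against one prompt (hypothetical record format)."""
    succeeded: bool
    queries_used: int        # queries spent before success or giving up
    flops_per_query: float   # assumed compute cost of a single query


def fixed_budget_asr(runs: List[AttackRun], query_budget: int) -> float:
    """Fraction of prompts jailbroken within `query_budget` queries."""
    hits = sum(1 for r in runs if r.succeeded and r.queries_used <= query_budget)
    return hits / len(runs)


def total_compute(runs: List[AttackRun]) -> float:
    """Total compute spent, counting failed attempts as well as successful ones."""
    return sum(r.queries_used * r.flops_per_query for r in runs)


# Illustrative numbers only: both attacks break the same 2 of 4 prompts with the
# same query counts, but one query of the second attack costs 1000x more compute.
outcomes = [(True, 10), (False, 100), (True, 50), (False, 100)]
cheap = [AttackRun(ok, q, 1e9) for ok, q in outcomes]
costly = [AttackRun(ok, q, 1e12) for ok, q in outcomes]

budget = 100
print(fixed_budget_asr(cheap, budget), fixed_budget_asr(costly, budget))  # 0.5 0.5
print(total_compute(costly) / total_compute(cheap))                       # 1000.0
```

A leaderboard that reports only ASR at a 100-query budget would rank these two attacks as equally dangerous.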

Reporting ASR alone within a fixed budget can obscure the effort an attacker must actually expend to "jailbreak" a model. It leaves open whether the payoff of a successful attack justifies its computational cost for a potential adversary, which can lead to misleading conclusions about a model's true security posture.
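One simple way to express "effort per jailbreak" is compute per successful attack: total compute spent across all attempts, failures included, divided by the number of successes. The helper below is a hypothetical illustration of that idea, not necessarily the metric the authors use, and the attack figures are invented for the example. Attack A has the higher ASR, yet attack B is roughly 650 times cheaper per successful jailbreak.

```python
from typing import List


def compute_per_success(successes: List[bool], compute_spent: List[float]) -> float:
    """Total compute across all attempts divided by the number of successful jailbreaks.

    Failed attempts still count toward the cost because the attacker paid for them.
    Returns infinity when no attempt succeeded.
    """
    if len(successes) != len(compute_spent):
        raise ValueError("need exactly one compute figure per attempt")
    n_success = sum(successes)
    if n_success == 0:
        return float("inf")
    return sum(compute_spent) / n_success


# Illustrative numbers only (not results from the paper), in FLOPs per attempt.
a_success, a_compute = [True] * 9 + [False], [1e15] * 10      # attack A
b_success, b_compute = [True] * 6 + [False] * 4, [1e12] * 10  # attack B

asr_a = sum(a_success) / len(a_success)  # 0.9
asr_b = sum(b_success) / len(b_success)  # 0.6
cps_a = compute_per_success(a_success, a_compute)  # about 1.1e15 FLOPs per success
cps_b = compute_per_success(b_success, b_compute)  # about 1.7e12 FLOPs per success
print(f"A: ASR={asr_a:.1f}, compute/success={cps_a:.1e}")
print(f"B: ASR={asr_b:.1f}, compute/success={cps_b:.1e}")
```

Ranking by ASR alone would flag A as the bigger threat; ranking by cost per success points to B as the cheaper, more practical exploit.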

Introducing Compute-Aware Robustness

To address this gap, Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik, and Colin Raffel propose a compute-aware evaluation framework. Instead of a binary success/failure metric within a rigid budget, it offers a more nuanced analysis of the computational expense associated with successful adversarial perturbations.

By folding compute costs into the evaluation, the framework aims to give a more accurate picture of a model's resilience, letting developers judge whether the cost of a successful attack is a sufficient deterrent or whether the model remains vulnerable to low-cost, high-efficiency exploits.
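A common way to make such an evaluation compute-aware is to record, for each prompt, how much compute the attack spent before its first success, and then report ASR as a function of the compute budget instead of at a single cutoff. The sketch below assumes that setup; the function names, the FLOPs unit, and the per-prompt costs are hypothetical, and this is not necessarily the authors' formulation.

```python
from typing import List, Optional, Sequence, Tuple


def asr_at_budget(cost_to_success: Sequence[Optional[float]], budget: float) -> float:
    """Fraction of prompts whose first successful attack cost at most `budget` FLOPs.

    `None` marks a prompt that was never broken, so it never counts as a hit.
    """
    hits = sum(1 for c in cost_to_success if c is not None and c <= budget)
    return hits / len(cost_to_success)


def asr_curve(cost_to_success: Sequence[Optional[float]],
              budgets: Sequence[float]) -> List[Tuple[float, float]]:
    """ASR evaluated at each compute budget; non-decreasing as the budget grows."""
    return [(b, asr_at_budget(cost_to_success, b)) for b in sorted(budgets)]


# Illustrative per-prompt compute (FLOPs) spent before the first successful
# jailbreak; None means the attack never succeeded on that prompt.
costs = [3e10, 2e12, None, 5e14, 8e11, None, 4e13, 1e10]
budgets = [1e10, 1e11, 1e12, 1e13, 1e14, 1e15]

for budget, asr in asr_curve(costs, budgets):
    print(f"budget {budget:.0e} FLOPs -> ASR {asr:.3f}")
```

Each point on the curve is an ordinary fixed-budget ASR, just measured in compute rather than queries, so one set of attack logs yields the whole curve. On these toy numbers a quarter of the prompts are already broken within 1e11 FLOPs, which is the kind of cheap exploit a single large-budget ASR would hide, while the two unbroken prompts cap the curve at 0.75.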

Note: The provided source material was truncated. Detailed specifics regarding the proposed "co-evaluation" methodology and the empirical results of the study are not available in the provided snippet.
