Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Researchers introduce Multi-LCB, an expansion of the LiveCodeBench (LCB) framework designed to evaluate the cross-language generalization capabilities of Large Language Models (LLMs) by moving beyond Python-centric benchmarks.

Overcoming Data Contamination in Code Evaluation

LiveCodeBench (LCB) has established itself as a key benchmark for assessing the code-generation proficiency of LLMs. Its primary strength lies in its methodology: it curates competitive programming problems, continuously adds new ones, and filters them by release date, so a model can be evaluated on problems published after its training data was collected. This mitigates the risk of data contamination and makes it more likely that scores reflect reasoning and synthesis ability rather than recall of solutions seen on the web.
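The release-date filtering described above can be illustrated with a minimal sketch. The `Problem` record, its field names, the sample problems, and the cutoff date are all hypothetical and chosen for illustration; they are not LCB's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; the field names are illustrative, not LCB's schema.
    title: str
    release_date: date


def filter_after_cutoff(problems, training_cutoff):
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


# Made-up problems, used only to show the filtering step.
problems = [
    Problem("two-pointer-sum", date(2023, 5, 1)),
    Problem("interval-merge", date(2024, 2, 10)),
    Problem("grid-paths", date(2024, 9, 3)),
]

# Assume a model whose training data ends on 2023-12-31.
fresh = filter_after_cutoff(problems, date(2023, 12, 31))
print([p.title for p in fresh])
```

Under these assumptions, only the two problems released in 2024 remain in the evaluation set, since the model could not have seen them during training.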

The Transition to Multi-Language Benchmarking

Despite its effectiveness, the original LCB was restricted to Python. This left open the question of whether LLMs can generalize their coding capabilities across programming languages. Multi-LCB addresses this by extending the benchmark's scope, allowing researchers to analyze how models perform when solving complex problems in languages other than Python.

Significance for LLM Generalization

By diversifying the language set, Multi-LCB gives a broader view of a model's coding capability. It allows a deeper investigation into whether the logic and algorithmic reasoning learned by LLMs are language-agnostic, or whether performance degrades significantly when shifting from dominant languages like Python to other syntaxes and paradigms.
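One simple way to quantify such degradation is to compare per-language pass rates against a Python baseline. The sketch below assumes results arrive as (language, passed) pairs; the function names, the `other_lang` placeholder, and the outcomes are invented for illustration and are not Multi-LCB's metric or results.

```python
from collections import defaultdict


def pass_rate_by_language(results):
    """results: iterable of (language, passed) pairs; returns language -> pass rate."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for language, passed in results:
        totals[language] += 1
        passes[language] += int(passed)
    return {lang: passes[lang] / totals[lang] for lang in totals}


def gap_to_reference(rates, reference="python"):
    """Difference between each language's pass rate and the reference language's."""
    base = rates[reference]
    return {lang: rate - base for lang, rate in rates.items() if lang != reference}


# Made-up outcomes purely for illustration; not results from Multi-LCB.
# "other_lang" is a placeholder, not a language named in the source.
results = [
    ("python", True), ("python", True), ("python", False), ("python", True),
    ("other_lang", True), ("other_lang", False),
    ("other_lang", False), ("other_lang", True),
]

rates = pass_rate_by_language(results)
gaps = gap_to_reference(rates)
print(rates)
print(gaps)
```

A negative gap means the model solves fewer problems in that language than in Python. A gap near zero across languages would be consistent with language-agnostic reasoning, though a real analysis would also need to control for problem difficulty and sample size.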
