Comparative Analysis of Quantization Accuracy: Gemma vs. Qwen Models

A community-run empirical evaluation of how different quantization levels affect the accuracy of Gemma and Qwen models, focusing on arithmetic precision and factual recall.

Overview of the Benchmark Methodology

The performance of quantized Large Language Models (LLMs) is usually evaluated with Kullback–Leibler Divergence (KLD) metrics, which measure how far a quantized model's output distribution drifts from that of its full-precision original. KLD numbers, however, are hard to translate into practical deployment decisions. Because each one is measured against its own reference model, they also do not support cross-model comparisons, such as weighing a 9B-parameter model at 4-bit quantization (Q4) against a 4B-parameter model at 8-bit quantization (Q8).
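For reference, a KLD figure of this kind typically compares, at each token position, the full-precision model's next-token distribution P with the quantized model's distribution Q, D_KL(P‖Q) = Σᵢ P(i) log(P(i)/Q(i)), and averages the result over positions. The Python sketch below assumes both models' logits for the same text are already available as plain lists; the function names are illustrative and do not come from any particular tool.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats; positions where P(i) == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mean_token_kld(base_logits: Sequence[Sequence[float]],
                   quant_logits: Sequence[Sequence[float]]) -> float:
    """Average per-position KLD between a reference model and its quantized copy.

    Hypothetical helper: assumes both logit lists cover the same token
    positions over the same vocabulary.
    """
    klds = [kl_divergence(softmax(b), softmax(q))
            for b, q in zip(base_logits, quant_logits)]
    return sum(klds) / len(klds)


# Toy example: a 3-token vocabulary observed at two positions.
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
quant = [[1.9, 1.1, 0.1], [0.4, 0.7, 2.8]]
print(f"mean KLD: {mean_token_kld(base, quant):.6f}")
```

The score is always relative to one model's own reference distribution. A low KLD for a 9B Q4 model and a low KLD for a 4B Q8 model therefore say nothing about which of the two answers questions better.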

To address this gap, a series of deliberately contrived tests was run to measure the actual output accuracy of different quantizations within the Gemma and Qwen families.
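Unlike KLD, output accuracy is a plain fraction of correct answers, so any two configurations can be ranked directly. The sketch below shows the bookkeeping under one assumption: each graded answer is recorded as a `(model, quantization, correct)` tuple. The labels and values are placeholders chosen only to exercise the code. They are not results from these tests.

```python
from collections import defaultdict
from typing import Iterable


def accuracy_by_config(records: Iterable[tuple[str, str, bool]]) -> dict[tuple[str, str], float]:
    """Fraction of correct answers for each (model, quantization) pair."""
    correct: dict[tuple[str, str], int] = defaultdict(int)
    total: dict[tuple[str, str], int] = defaultdict(int)
    for model, quant, is_correct in records:
        key = (model, quant)
        total[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}


# Placeholder records only (NOT measured results): two graded answers per configuration.
records = [
    ("model-9B", "Q4", True), ("model-9B", "Q4", False),
    ("model-4B", "Q8", True), ("model-4B", "Q8", True),
]
for (model, quant), acc in sorted(accuracy_by_config(records).items()):
    print(f"{model:>9} {quant}: {acc:.1%}")
```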

Test Suite Details

Test 1: Arithmetic Precision

The first benchmark tested the models' ability to add large integers exactly. It consisted of 1,000 questions designed to measure how well arithmetic precision holds up under quantization. To keep the answers easy to grade, the prompt strictly constrained the output to a single number without commas or underscores.
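The source does not say how the 1,000 questions were produced. One plausible approach, sketched here purely as an assumption, draws pairs of random 18-digit operands (the width used in the sample prompt below) from a seeded generator, so that every model and quantization sees the same question set. `make_addition_questions` is a hypothetical name.

```python
import random


def make_addition_questions(n: int = 1000, digits: int = 18, seed: int = 0) -> list[tuple[int, int]]:
    """Return n reproducible (a, b) pairs of random `digits`-digit integers.

    Hypothetical generator: the operand width and seeding scheme are
    assumptions, not details taken from the original test.
    """
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    return [(rng.randint(low, high), rng.randint(low, high)) for _ in range(n)]


questions = make_addition_questions()
a, b = questions[0]
print(f"{a} + {b} = {a + b}")
```

Python integers have arbitrary precision, so the reference sums stay exact even when the result grows to 19 digits.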

Sample Prompt: "Print only one number as the answer to the following question. Print nothing else, please. Do not use commas or underscores. It is very important. 998604052310776342 + 249349834805792420 = ?"
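The prompt forbids commas, underscores, and any extra text, so grading can be an exact string match against the true sum. The minimal sketch below reuses the sample prompt's wording. The function names are hypothetical, and trimming surrounding whitespace is an assumption about how lenient the original grader was.

```python
PROMPT_TEMPLATE = (
    "Print only one number as the answer to the following question. "
    "Print nothing else, please. Do not use commas or underscores. "
    "It is very important. {a} + {b} = ?"
)


def build_prompt(a: int, b: int) -> str:
    """Fill the sample prompt's wording with a specific pair of operands."""
    return PROMPT_TEMPLATE.format(a=a, b=b)


def is_correct(response: str, a: int, b: int) -> bool:
    """Strict check: after trimming whitespace, the reply must be exactly the decimal sum."""
    return response.strip() == str(a + b)


a, b = 998604052310776342, 249349834805792420
print(build_prompt(a, b))
print(is_correct("1247953887116568762", a, b))        # True: exact answer
print(is_correct("1,247,953,887,116,568,762", a, b))  # False: commas break the required format
```

A looser grader could strip separators before comparing. The strict version counts any format violation as an error, which matches the prompt's explicit instructions.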

Test 2: Factual Recall (Presidents)

The second benchmark evaluated knowledge retrieval with a set of 46 questions about presidents, testing how well quantized models retain specific facts.
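The source does not state how the 46 answers were scored. The minimal sketch below assumes a lenient rule: a reply counts as correct if it contains the expected name as whole words after case and punctuation are folded. The example question is a hypothetical illustration, not one of the 46, and the helper names are likewise hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, then turn every run of non-alphanumeric characters into a single space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def recall_correct(response: str, expected: str) -> bool:
    """Lenient check: the expected answer appears as whole words inside the reply."""
    # Padding with spaces makes the containment test respect word boundaries.
    return f" {normalize(expected)} " in f" {normalize(response)} "


def recall_accuracy(responses: list[str], answers: list[str]) -> float:
    """Share of questions answered correctly (k / 46 for a 46-question set)."""
    if len(responses) != len(answers):
        raise ValueError("need exactly one response per question")
    hits = sum(recall_correct(r, a) for r, a in zip(responses, answers))
    return hits / len(answers)


# Hypothetical example question: "Who was the first president of the United States?"
print(recall_correct("George Washington.", "George Washington"))  # True
print(recall_correct("John Adams", "George Washington"))          # False
```

Containment is more forgiving than the exact match used for arithmetic. That suits free-form names, but it can also reward a reply that lists several candidates. A stricter variant would compare the whole normalized reply.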

Limitations of the Analysis

Note: The source material available for this report is an excerpt and does not include the final results or the accuracy percentages for each quantization level. As a result, the comparison between the Gemma and Qwen models cannot be reported in full here.

Original Source


Techyon

Some contrived tests comparing the accuracy of different Gemma and Qwen quantizations
