Abliteration Forensics on Qwen3.6-27B: A Comparative Analysis of Capability Preservation and Safety Removal
This technical report details a comparative study of five abliteration techniques applied to the Qwen3.6-27B base model. Using the open-source Abliterlitics toolkit, the researchers benchmarked the variants along several axes (MMLU, GSM8K, HarmBench, KL divergence, and detailed weight forensics) over 85 GPU-hours. The findings show significant differences in capability preservation and safety-removal effectiveness among the methods: the outcome of abliteration is highly technique-dependent.
Methodology and Experimental Setup
The study compared the Qwen3.6-27B base model against five abliterated variants: Heretic, HauhauCS, Huihui, AEON, and Abliterix. All analysis used the open-source Abliterlitics toolkit. Benchmarks were run through lm-evaluation-harness with vLLM 0.19.0 and BitsAndBytes 4-bit quantisation on a single RTX 5090.
Abliteration Techniques and Model Provenance
The models came from different creators, so their provenance had to be analysed carefully. For instance, HauhauCS used "Reaper Abliteration," a method shown to be derived from Heretic: Reaper adds features such as subspace rank-k ablation and SOM clustering on top of the Heretic core. In addition, converting the model from GGUF to safetensors introduced a second layer of modification, namely GGUF quantisation round-trip noise superimposed on the ablation edits. Because of these layered modifications and the nature of the source tool, the authors dropped HauhauCS from future comparisons, citing its lack of transparent safetensors.
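To illustrate why a GGUF round trip complicates weight forensics, the following is a minimal sketch, not the procedure used by Heretic, Reaper, or Abliterlitics. It assumes the common rank-1 directional ablation formulation, W' = W - r(rᵀW), and a toy symmetric per-row 4-bit quantiser; real GGUF quantisation schemes are block-wise and more elaborate. The function names `ablate_direction` and `quantise_round_trip` and the example matrix are hypothetical.

```python
import math


def ablate_direction(weight, direction):
    """Remove the component along `direction` from every column of `weight`.

    weight: list of rows (d_out x d_in); direction: vector of length d_out.
    Computes W' = W - r (r^T W), with r normalised to unit length.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how strongly each column of W points along r
    coeffs = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[weight[i][j] - r[i] * coeffs[j] for j in range(n_cols)]
            for i in range(n_rows)]


def quantise_round_trip(weight, bits=4):
    """Toy symmetric per-row quantise/dequantise (an assumption, not GGUF's scheme)."""
    levels = 2 ** (bits - 1) - 1  # 7 positive levels for 4-bit
    out = []
    for row in weight:
        scale = max(abs(x) for x in row) / levels or 1.0  # avoid divide-by-zero
        out.append([round(x / scale) * scale for x in row])
    return out


if __name__ == "__main__":
    base = [[0.9, -0.2, 0.4], [0.1, 0.7, -0.5]]   # hypothetical weights
    ablated = ablate_direction(base, [1.0, 1.0])   # hypothetical refusal direction
    recovered = quantise_round_trip(ablated)
    # A forensic diff against the base sees the ablation edit plus rounding noise.
    for b_row, r_row in zip(base, recovered):
        print([round(r - b, 4) for b, r in zip(b_row, r_row)])
```

In this sketch the difference between `recovered` and `base` mixes two effects, the deliberate projection and the rounding error, which is the reason a diff taken after format conversion cannot attribute every weight change to the ablation itself.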
The assessment measured capability preservation (via benchmark deltas and KL divergence) and safety efficacy (via HarmBench).
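As a rough illustration of the KL divergence metric, the sketch below averages KL(base || variant) over next-token distributions. The report does not specify the direction of the divergence, the prompt set, or which token positions are compared, so those choices are assumptions here, and the helper names (`softmax`, `kl_divergence`, `mean_kl`) and logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(base_logits, variant_logits):
    """Average KL(base || variant) over paired next-token logit vectors."""
    kls = [kl_divergence(softmax(b), softmax(v))
           for b, v in zip(base_logits, variant_logits)]
    return sum(kls) / len(kls)


if __name__ == "__main__":
    # Hypothetical logits for two prompts over a 3-token vocabulary.
    base = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    variant = [[1.9, 0.6, -1.0], [0.1, 0.3, 2.9]]
    print(f"mean KL: {mean_kl(base, variant):.6f}")
```

A value near zero means the variant's output distribution on those prompts is almost indistinguishable from the base model's; identical logits give exactly zero.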
Benchmark Performance and Capability Preservation
The study assessed capability on two fronts: standard benchmarks (MMLU, ARC Challenge, etc.) and reasoning efficiency on GSM8K.
Quantitative Benchmark Results
Analysis of the benchmark deltas against the Base model reveals distinct performance profiles.
- Huihui: Demonstrated the smallest average delta on non-GSM8K tasks (0.5pp), indicating superior capability preservation across general knowledge and common sense tasks.
- Heretic: Showed excellent preservation, with the lowest KL divergence (0.0037), a strong indicator of minimal output distribution shift on benign prompts.
- AEON: Was found to degrade on nearly every non-GSM8K task, with a TruthfulQA drop of 10.6pp and an ARC Challenge drop of 3.0pp, contradicting its claims of "enhanced capabilities."
- Abliterix: Performed the worst in capability preservation, exhibiting significant collateral damage, including a 6.2pp drop on HellaSwag.
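The deltas quoted above are differences in percentage points (pp) against the base model. A minimal sketch of that arithmetic follows; it assumes accuracy scores are stored as fractions in [0, 1] and that the "average delta" is a mean of absolute per-task deltas with GSM8K excluded, neither of which the report states explicitly. All scores below are made-up placeholders, not results from the study.

```python
def deltas_pp(base_scores, variant_scores):
    """Per-task delta in percentage points (variant minus base)."""
    return {
        task: round((variant_scores[task] - base_scores[task]) * 100, 2)
        for task in base_scores
        if task in variant_scores
    }


def mean_abs_delta(deltas, exclude=("gsm8k",)):
    """Mean absolute delta over tasks, skipping the excluded ones."""
    kept = [abs(d) for task, d in deltas.items() if task not in exclude]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    base = {"mmlu": 0.80, "arc_challenge": 0.60, "gsm8k": 0.90}
    variant = {"mmlu": 0.79, "arc_challenge": 0.57, "gsm8k": 0.50}
    d = deltas_pp(base, variant)
    print(d)
    print(f"mean |delta| excluding GSM8K: {mean_abs_delta(d):.2f}pp")
```

Keeping GSM8K out of the average matters because a single large swing on one task would otherwise dominate the summary figure.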
The GSM8K Reasoning Efficiency Discovery
A critical finding emerged regarding the GSM8K task, which tests