Knowledge Distillation of Black-Box Large Language Models (2024)
This 2024 study investigates knowledge distillation techniques for extracting capabilities from black‑box large language models (LLMs), addressing the challenges posed by model opacity and computational cost.
Introduction
Large language models have become foundational in natural language processing, yet their size and proprietary nature limit accessibility. Knowledge distillation offers a way to transfer the behavior of such black‑box models into smaller, trainable student architectures, compressing them in the process.
Methodology
The authors propose a two‑stage distillation pipeline: (1) generating soft targets by temperature‑scaling the black‑box LLM's output logits, and (2) training a student model on a combination of a distillation loss and task‑specific supervision. The approach is evaluated on benchmark datasets spanning text generation, summarization, and question answering.
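The two stages can be illustrated with a minimal sketch of a single token position. The summary does not give the exact loss, so this assumes the standard logit-based formulation: a KL divergence between temperature-softened teacher and student distributions (scaled by T², a common convention), mixed with cross-entropy against the ground-truth label through a hypothetical weight `alpha`. It also assumes the black-box API exposes teacher logits over a vocabulary shared with the student, which many hosted APIs do not (some return only text or top-k log-probabilities). All function names and defaults are illustrative.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); entries where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    teacher_logits: Sequence[float],  # assumption: exposed by the black-box API
    student_logits: Sequence[float],
    label: int,                       # index of the ground-truth token (task supervision)
    temperature: float = 2.0,         # hypothetical default
    alpha: float = 0.5,               # hypothetical weight on the distillation term
) -> float:
    # Stage 1: soft targets from temperature-scaled teacher logits.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # Distillation term, scaled by T^2 so soft-target gradients stay on a comparable scale as T changes.
    soft_loss = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Task-specific term: cross-entropy of the student (at T = 1) against the hard label.
    hard_loss = -math.log(softmax(student_logits)[label])
    # Stage 2 objective: weighted combination of both terms.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

Raising `alpha` pushes the student toward imitating the teacher's full distribution; lowering it weights the task labels more heavily.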
Results
Empirical results show that the distilled student models achieve performance within 2–4% of their black‑box teachers while using roughly an order of magnitude fewer parameters. Ablation studies reveal that temperature annealing and curriculum‑based data selection significantly affect how well knowledge transfers to the student.
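The summary names temperature annealing and curriculum‑based data selection but not their exact form. The following is a minimal sketch under two assumptions: the distillation temperature decays linearly from a high starting value (softer teacher targets early in training) to T = 1, and the curriculum orders training examples easy‑to‑hard by a caller‑supplied difficulty score. The linear schedule, the default values, and the `difficulty` function are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Example = TypeVar("Example")


def annealed_temperature(step: int, total_steps: int,
                         t_start: float = 4.0, t_end: float = 1.0) -> float:
    """Linearly anneal the distillation temperature from t_start to t_end.

    The start/end values and the linear shape are illustrative assumptions.
    """
    if total_steps <= 1:
        return t_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)  # clamp progress to [0, 1]
    return t_start + (t_end - t_start) * frac


def curriculum_stages(examples: Sequence[Example],
                      difficulty: Callable[[Example], float],
                      num_stages: int) -> list[list[Example]]:
    """Order examples easy-to-hard by a difficulty score and split them into stages.

    `difficulty` is a hypothetical scoring function supplied by the caller.
    """
    if not examples:
        return []
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    ordered = sorted(examples, key=difficulty)
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

For example, `curriculum_stages(prompts, difficulty=len, num_stages=3)` would use prompt length as a crude difficulty proxy, and each training step could query `annealed_temperature(step, total_steps)` for the temperature to use when softening the teacher's logits.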