BlockPilot introduces instance-adaptive policy learning for diffusion-based speculative decoding. Current methods use a fixed inference block size and apply the same strategy to every input. BlockPilot targets this limitation, with the goal of generating multiple tokens per forward pass more efficiently. The approach builds on block-level diffusion and keeps the acceleration lossless during LLM inference.
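To make "lossless" and "block size" concrete, here is a minimal toy sketch of greedy speculative decoding in which the block size is chosen per step by a pluggable policy. It is not BlockPilot's method: `target_next`, `draft_block`, and both policies are hypothetical stand-ins invented for illustration, and the toy drafter is sequential rather than a diffusion model. The only point is that, under greedy verification, the output matches plain target decoding whatever block size the policy picks.

```python
from typing import Callable, List, Tuple

VOCAB = 7  # toy vocabulary size (assumption for this sketch)


def target_next(prefix: List[int]) -> int:
    """Hypothetical deterministic 'target model': greedy next token."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_block(prefix: List[int], k: int) -> List[int]:
    """Hypothetical drafter: proposes k tokens, with deliberate occasional errors."""
    ctx, out = list(prefix), []
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 5 == 4:  # inject a mistake at some positions
            tok = (tok + 1) % VOCAB
        out.append(tok)
        ctx.append(tok)
    return out


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def speculative_decode(
    prompt: List[int], n_new: int, choose_block_size: Callable[[List[int]], int]
) -> Tuple[List[int], int]:
    """Draft a block, verify it against the target, keep the matching prefix."""
    seq, passes = list(prompt), 0
    while len(seq) - len(prompt) < n_new:
        k = max(1, choose_block_size(seq))
        proposal = draft_block(seq, k)
        passes += 1  # one verification pass per drafted block
        ctx = list(seq)
        for tok in proposal:
            expected = target_next(ctx)
            if tok != expected:
                ctx.append(expected)  # replace the first wrong token, stop
                break
            ctx.append(tok)
        else:
            ctx.append(target_next(ctx))  # whole block accepted: bonus token
        seq = ctx
    return seq[: len(prompt) + n_new], passes


def fixed_policy(seq: List[int]) -> int:
    """Fixed block size, as in the baseline setting the summary describes."""
    return 4


def adaptive_policy(seq: List[int]) -> int:
    """Hypothetical per-step policy; a learned policy would replace this rule."""
    return 2 if len(seq) % 5 >= 3 else 6


prompt = [1, 2, 3]
reference = greedy_decode(prompt, 20)
fixed_out, fixed_passes = speculative_decode(prompt, 20, fixed_policy)
adaptive_out, adaptive_passes = speculative_decode(prompt, 20, adaptive_policy)
```

In this sketch the policy only changes how many verification passes are needed, never which tokens come out, which is why the block size can be tuned per instance without affecting output quality.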
huggingface/daily-papers