The author shows that speculative decoding can speed up a 14B local LLM by up to three