Speculative Decoding

KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem

A training-free self-speculative decoding method that casts adaptive layer selection as a knapsack problem and accounts for context-dependent attention overhead.
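
To make the knapsack framing concrete, here is a minimal sketch of how choosing which sub-layers to keep in a draft pass could be posed as a 0/1 knapsack. It is an illustration only, not the KnapSpec algorithm: the cost model (`sublayer_costs`, where attention cost grows with context length and MLP cost stays fixed), the importance scores, the budget, and all function names are assumptions made up for this example.

```python
from typing import List, Sequence, Tuple


def sublayer_costs(num_layers: int, context_len: int,
                   mlp_cost: int = 8, block: int = 256) -> List[Tuple[str, int]]:
    """Hypothetical integer cost model (an assumption, not from the paper):
    attention cost grows with context length, MLP cost is constant."""
    attn_cost = 1 + context_len // block
    items = []
    for layer in range(num_layers):
        items.append((f"attn{layer}", attn_cost))
        items.append((f"mlp{layer}", mlp_cost))
    return items


def select_items(values: Sequence[float], costs: Sequence[int], budget: int) -> List[int]:
    """Classic 0/1 knapsack by dynamic programming.

    Returns the sorted indices of the items that maximize total value
    while keeping total cost <= budget.
    """
    n = len(values)
    # best[i][b] = max total value using only the first i items with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip item i-1
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v  # option 2: keep it
    # Walk the table backwards to recover which items were kept.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return sorted(chosen)


if __name__ == "__main__":
    # Made-up importance scores for attn0, mlp0, attn1, mlp1.
    importance = [5.0, 9.0, 4.0, 7.0]
    for ctx in (256, 4096):
        items = sublayer_costs(num_layers=2, context_len=ctx)
        keep = select_items(importance, [c for _, c in items], budget=12)
        print(ctx, [items[i][0] for i in keep])
```

Under this toy cost model the same budget yields a different selection at short and long context, because the attention items become more expensive as the context grows. That is the kind of context dependence the summary refers to, though the actual method may define costs and values quite differently.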