Ax Bo Jiang, Sian Jin 3/31/2026

KVSculpt: KV Cache Compression as Distillation

KVSculpt is a KV cache compression method for long-context LLM inference that treats compression as knowledge distillation. It is orthogonal to quantization and low-rank methods.
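The summary above does not spell out the objective or the optimization procedure, so the following is only a minimal sketch of what "compression as distillation" could mean in general: the full KV cache acts as the teacher, a smaller cache acts as the student, and the gap between their attention outputs on a set of probe queries is the quantity to minimize. All names here (`attention`, `distillation_loss`, `probe_queries`, the truncation used to build the student) are hypothetical illustrations, not KVSculpt's actual algorithm or API.

```python
import math
import random


def attention(queries, keys, values):
    """Single-head softmax attention over lists of float vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return outputs


def distillation_loss(queries, teacher_kv, student_kv):
    """Mean squared gap between teacher and student attention outputs."""
    teacher_out = attention(queries, *teacher_kv)
    student_out = attention(queries, *student_kv)
    gaps = [
        (t - s) ** 2
        for t_row, s_row in zip(teacher_out, student_out)
        for t, s in zip(t_row, s_row)
    ]
    return sum(gaps) / len(gaps)


def random_rows(rng, count, dim):
    """Synthetic stand-in for real keys, values, or queries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
dim, full_len, kept_len = 4, 16, 4

full_keys = random_rows(rng, full_len, dim)
full_values = random_rows(rng, full_len, dim)
probe_queries = random_rows(rng, 8, dim)

# Assumption: the student starts as a naive truncation of the teacher cache.
# A distillation-style method would instead adjust the student entries to
# drive this loss down.
small_keys, small_values = full_keys[:kept_len], full_values[:kept_len]

teacher = (full_keys, full_values)
loss_same = distillation_loss(probe_queries, teacher, teacher)
loss_small = distillation_loss(probe_queries, teacher, (small_keys, small_values))
print(f"identical cache: {loss_same:.4f}, truncated cache: {loss_small:.4f}")
```

Because this loss only constrains which entries the cache holds, not how each entry is stored, it is compatible in principle with quantizing or low-rank-factorizing the resulting student cache, which is one way to read the orthogonality claim.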