
How Good Is PagedAttention at Memory Sharing?
Table of Links Abstract and 1 Introduction 2 Background and 2.1 Transformer-Based Large Language Models 2.2 LLM Service & Autoregressive Generation 2.3 Batching Techniques for LLMs 3 Memory Challenges in LLM Serving 3.1 Memory Management in Existing Systems 4 Method and 4.1 PagedAttention 4.2 KV Cache Manager 4.3 Decoding with PagedAttention and vLLM 4.4 Application…