Efficient Memory Management for Large Language Model Serving with PagedAttention


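This paper proposes the PagedAttention algorithm, which is inspired by the classical virtual memory and paging techniques used in operating systems. The authors report that vLLM, the serving system built on PagedAttention, improves LLM serving throughput by 2-4x at the same level of latency compared with prior systems such as FasterTransformer and Orca.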
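The paging analogy can be illustrated with a minimal Python sketch. It assumes a toy setting in which each sequence's KV cache is split into fixed-size blocks, and a per-sequence block table maps logical blocks to physical blocks drawn from a shared pool, much like a page table. The names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical. The sketch only models the block bookkeeping, not the attention kernel or the actual key/value tensors.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV-cache block (hypothetical value for illustration)


class BlockAllocator:
    """Shared pool of fixed-size physical KV-cache blocks (hypothetical helper)."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # ids of unused physical blocks

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, block: int) -> None:
        self.free.append(block)


@dataclass
class Sequence:
    block_table: list = field(default_factory=list)  # logical block i -> physical block id
    num_tokens: int = 0

    def append_token(self, allocator: BlockAllocator) -> None:
        # Allocate a new physical block only when the last logical block is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, token_idx: int) -> tuple:
        # Translate a token position into (physical block, offset), like a page-table lookup.
        return self.block_table[token_idx // BLOCK_SIZE], token_idx % BLOCK_SIZE

    def free_blocks(self, allocator: BlockAllocator) -> None:
        # Return every physical block to the shared pool when the sequence finishes.
        for block in self.block_table:
            allocator.release(block)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8)
seq = Sequence()
for _ in range(6):  # generate 6 tokens
    seq.append_token(allocator)
print(seq.block_table)       # [7, 6]
print(seq.physical_slot(5))  # (6, 1)
```

Because physical blocks are allocated on demand and need not be contiguous, a sequence in this sketch wastes at most `BLOCK_SIZE - 1` slots in its last block, instead of reserving memory for the maximum possible output length up front.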

