Transformer MHA KV-Cache: A Technical Summary (MHA, GQA, MQA, MLA)
2.1 Transformer & MHA
Reference: https://blog.csdn.net/xiangxueerfei/article/details/144560852
2.2 KV Cache & MQA & GQA
Reference: https://zhuanlan.zhihu.com/p/25547444712
2.3 MLA
The Evolution of Attention (MHA, MQA, GQA, MLA)