"Attention Is All You Need" Reading Notes
1. Transformer architecture
2. Self-attention and multi-head attention
3. Feed-forward network
4. Positional encoding and word embedding (see the positional encoding sketch after this list)
5. Batch normalization (BN) & layer normalization (LN)
6. ResNet
7. Subword tokenization
8. Query, Key, Value (QKV); see the attention sketch below
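
A minimal NumPy sketch of scaled dot-product attention, the QKV mechanism listed above. The toy shapes and random inputs are illustrative assumptions, not values from the paper or the linked article.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (seq_q, seq_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                         # weighted sum of value vectors

# Toy example (assumed sizes): 3 query positions, 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

Multi-head attention simply runs this computation in parallel over several learned projections of Q, K, and V, then concatenates the results.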
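
A minimal NumPy sketch of the sinusoidal positional encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The max_len and d_model values below are arbitrary illustrative choices.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]                  # (max_len, 1) token positions
    i = np.arange(0, d_model, 2)[None, :]              # (1, d_model/2) even dimension indices 2i
    angles = pos / np.power(10000.0, i / d_model)      # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions get cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

These encodings are added element-wise to the word embeddings so the model can use token order despite attention itself being permutation-invariant.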
https://zhuanlan.zhihu.com/p/716632509