1706.03762v7_analysis
Structural Analysis on “Attention Is All You Need”
Authors: Wei Li & Gemini
Problem Space Explanation
The baseline paper [1] addresses the limitations of existing sequence transduction models, primarily those based on recurrent neural networks (RNNs) [14, 13] and convolutional neural networks (CNNs) [10]. These models, while achieving state-of-the-art results in tasks like machine translation [36, 6, 25, 39], suffer from several key drawbacks.
Problem 1: Sequential Computation and Lack of Parallelization: RNNs process sequences sequentially, computing each hidden state h_t as a function of the previous hidden state h_{t-1} and the input at position t. This inherently sequential dependency precludes parallelization within a training example, which becomes a critical bottleneck at longer sequence lengths, where memory constraints limit batching across examples.
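The sequential bottleneck can be sketched in a few lines. The following is a minimal vanilla-RNN forward pass (not the paper's code; function and variable names are illustrative), where the loop over time steps cannot be parallelized because each h_t consumes h_{t-1}:

```python
import numpy as np

def rnn_forward(x, W_x, W_h, b):
    """Vanilla RNN over a sequence x of shape (T, d_in).

    The loop is strictly sequential: computing h_t requires the
    finished value of h_{t-1}, so the T steps cannot run in parallel.
    """
    T = x.shape[0]
    d_h = W_h.shape[0]
    h = np.zeros(d_h)
    states = []
    for t in range(T):  # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
        h = np.tanh(W_x @ x[t] + W_h @ h + b)
        states.append(h)
    return np.stack(states)  # shape (T, d_h)

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 4, 3
states = rnn_forward(rng.normal(size=(T, d_in)),
                     rng.normal(size=(d_h, d_in)),
                     rng.normal(size=(d_h, d_h)),
                     np.zeros(d_h))
print(states.shape)  # (5, 3)
```

By contrast, self-attention replaces this recurrence with matrix products over all positions at once, which is what makes the Transformer parallelizable within a sequence.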