THE TRANSFORMER SERIES

Transformer’s Encoder-Decoder

Understanding The Model Architecture

Naoki
11 min read · Dec 12, 2021


In 2017, Vaswani et al. published a paper titled “Attention Is All You Need” at the NeurIPS conference. They introduced the original transformer architecture for machine translation, which performed better and trained faster than the RNN encoder-decoder models that were mainstream at the time.

The transformer architecture is the basis for recent well-known models like BERT and GPT-3. Researchers have already applied the transformer architecture in computer vision and reinforcement learning. So, understanding the transformer architecture is crucial if you want to know where machine learning is making headway.

However, the transformer architecture may look complicated to those without much background in deep learning.

Figure 1 of the paper: the transformer model architecture

The paper’s authors say the architecture is simple because it has no recurrence and no convolutions. In other words, it builds on common concepts such as the encoder-decoder architecture, word embeddings, attention mechanisms, and softmax, without the complications introduced by recurrent or convolutional neural networks.
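To see how familiar these building blocks really are, here is a minimal NumPy sketch of the paper’s scaled dot-product attention, softmax(QKᵀ/√d_k)V. The formula is from the paper; the array shapes and random inputs are illustrative assumptions, not values the authors used.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v).
    d_k = Q.shape[-1]
    # Similarity score between every query and every key,
    # scaled by sqrt(d_k) as in the paper.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys turns scores into weights that sum to 1.
    weights = softmax(scores, axis=-1)
    # Each output vector is a weighted average of the value vectors.
    return weights @ V

# Assumed toy shapes: 4 tokens, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

There is no recurrence here: every token attends to every other token in a single matrix multiplication, which is what makes the model easy to parallelize.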
