Recurrent Neural Networks vs Transformers:
----------------------------------------------
- Core Mechanism: Recurrent neural networks (RNNs) process the input sequentially, one word or subword at a time. At each time step, they update an internal hidden state based on the piece of input they have just processed. Transformers, on the other hand, consume the whole input sequence at once and process it in parallel. They do not rely on recurrence or a sequentially updated hidden state; instead, they use the attention mechanism.
- Parallelization and Training: The recurrent nature of RNNs makes them slow to train: each step in the sequence needs the result of the previous step, so the steps form a chain that must be computed in order. As a result, training cannot be parallelized across time steps, and this cost becomes more critical for longer sequences. In a transformer layer, the output at each position is computed from the layer's inputs and does not wait for the output at any other position, which allows all positions to be computed simultaneously. That is why transformer training is parallelizable.
- Long-Range Dependencies: Recurrent neural networks struggle with long sequences. Information from early steps has to pass through many intermediate steps, where it can be diluted or lost, and during training the gradients flowing back through those steps can vanish (the vanishing gradient problem). The path length for information flow grows linearly with the distance between two tokens. In contrast, transformers are quite successful at capturing long-range dependencies. The self-attention mechanism allows any two tokens in the sequence to interact directly, so the maximum path length for information flow is constant, O(1).
- Sequential Order: The order of tokens is inherently preserved in RNNs, because the architecture itself is organized that way: each time step relies on the output of the previous one. Transformers, by contrast, do not have a built-in sense of order, so a positional encoding is added to the input embeddings to provide information about the position of each token.
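The points above can be illustrated with a small sketch. It is a toy, not a real model: inputs are single numbers instead of embedding vectors, the RNN weights `w_x` and `w_h` are arbitrary values, the attention uses identity projections (query = key = value), and `add_positions` is a one-dimensional stand-in for sinusoidal positional encoding. All function names here are made up for illustration.

```python
import math


def rnn_states(xs, w_x=0.5, w_h=0.9):
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    states = []
    for x in xs:
        # Step t needs h from step t-1, so this loop must run in order.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def self_attention(xs):
    """Toy scalar self-attention with query = key = value = x."""
    outputs = []
    # Each iteration reads only the inputs, never another output,
    # so all positions could be computed at the same time.
    for q in xs:
        scores = [q * k for k in xs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(sum(w * v for w, v in zip(weights, xs)) / total)
    return outputs


def add_positions(xs):
    """Add sin(position) to each input as a minimal positional signal."""
    return [x + math.sin(i) for i, x in enumerate(xs)]


tokens = [1.0, 2.0, 3.0]
flipped = tokens[::-1]

if __name__ == "__main__":
    print("RNN final state:       ", rnn_states(tokens)[-1], rnn_states(flipped)[-1])
    print("Attention, no position:", self_attention(tokens), self_attention(flipped))
    print("Attention, positions:  ", self_attention(add_positions(tokens)),
          self_attention(add_positions(flipped)))
```

In this sketch, reversing the input changes the final RNN state, because order is baked into the recurrence. Attention without positions only reverses its outputs along with the input, so it cannot tell the two orderings apart; once the positional signal is added, the outputs themselves change.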