Recurrent Neural Networks (RNNs): 4 Ideas That Explain How They Actually Work
Once you work through the basics of neural networks, you start to see their limitations and naturally move on to more advanced architectures like RNNs. This is a continuation of Neural Networks Starter Kit: 9 Fundamental Building Blocks for Developers, so give it a read first to get a better grasp of the basics.
Let's start with our first idea: why do we need RNNs?
Why Do We Need RNNs?
The Problem with Traditional Neural Networks
To understand the problem with traditional neural networks, we can start with an example.
Consider predicting stock prices.
Stock prices change over time. For example, a stock might go up for four consecutive days and then suddenly drop. If we want to predict what happens next, we cannot rely on just one data point. We need to look at what happened before.
This creates a challenge for traditional neural networks.
- They expect fixed-size input
- They do not naturally handle time-based or sequential data
For instance:
- If a company has been trading for 10 days, we might use the previous 9 days to predict the next price
- If another company has only 5 days of history, we only have 5 previous values
So the number of inputs is not constant.
So when we boil this down,
Traditional neural networks do not “remember” previous inputs. They treat each input independently.
The sequential nature of real-world problems
Stock prices are just one example. Many real-world problems involve sequences.
- Prices change over time
- Patterns depend on previous values, not just the current one
If we look closely:
- A company with a long history gives us more past data
- A newer company gives us less
Yet we still want to make predictions in both cases.
So we need a model that can handle variable-length sequences and still make meaningful predictions.
How RNNs help with such problems
An RNN is a type of neural network designed specifically for sequential data.
It still has the same basic components:
- weights
- biases
- layers
- activation functions
But there is one important addition: feedback loops.
How the Feedback Loop Changes Things
The feedback loop changes how the network behaves.
- It does not just take the current input
- It also uses information from previous steps
This creates a form of memory.
You can think of it like this:
- The network remembers what it saw before and uses it to understand the current input.
- Even though it may look like the network is handling just one input at a time, it is actually going through the sequence step by step, carrying past information along with it.
- Because of this, RNNs are able to understand patterns that depend on time or order.
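The loop-with-memory idea can be sketched in a few lines of Python. This is a minimal scalar sketch; the weights `w_x`, `w_h`, and `b` are hand-picked illustrative values, not learned parameters:

```python
import math

def rnn_step(x_t, h_prev, w_x=0.8, w_h=0.5, b=0.0):
    """One recurrent update: combine the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Feed a short sequence one value at a time, carrying the state along.
h = 0.0  # initial state: no memory yet
for x in [0.0, 0.5, 1.0]:
    h = rnn_step(x, h)

# The final state depends on every value in the sequence, not just the last one.
print(round(h, 3))
```

Even though `rnn_step` is called on one input at a time, the state `h` it receives carries information from all earlier steps, which is exactly the feedback loop described above.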
Now that we understand why RNNs are needed, let’s look at how data flows through an RNN over time.
How RNNs Process Sequential Data
We will continue with the stock price example, but simplify it.
Assume the model tries to learn patterns like:
- If yesterday’s and today’s stock prices are low, then tomorrow’s price should also be low.
- If yesterday’s price was low and today’s price is medium, then tomorrow’s price should be even higher.
- If the price decreases from high to medium, then tomorrow’s price will be even lower.
- Lastly, if the price stays high for two days in a row, then the price will be high tomorrow.
These are not fixed rules. They are just patterns the model tries to learn from data.
To make things easier to work with, we scale the values:
- Low = 0
- Medium = 0.5
- High = 1
The network then processes the sequence one value at a time:
- First, we input yesterday’s price
- Then, we input today’s price
After each step, the model updates its internal state.
This step-by-step processing is what allows the network to handle sequences.
What Happens at Each Step
At each time step, the input goes through the usual neural network operations:
- weights
- bias
- activation function (such as ReLU)
This produces an output. Let’s call the output after processing yesterday’s value y₁.
But here is the important part: this output is not discarded.
The Feedback Loop
The output from the previous step is fed back into the network.
So when we process today’s value:
- the network uses today’s input
- and combines it with y₁, the output from yesterday
Conceptually, you can think of it like this:
Current Output = f(current input, previous output)
In simple terms:
- yesterday’s information acts as memory
- today’s input provides new information
- both are combined to produce the next state
This is how an RNN “remembers” what it has seen before.
The prediction for tomorrow depends on:
- yesterday
- today
The network builds context over time, instead of treating each input independently.
At each step, the network produces an output. Depending on the problem:
- we might use the output from every step
- or we might use only the final output
In this example, we are interested in predicting tomorrow’s price, so we focus on the final output after processing the sequence.
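The whole flow above can be sketched as a small function. The weights `W_X`, `W_H`, and `B` are hand-picked toy values, not trained parameters; the point is only that we read the prediction signal off the final state, after the whole sequence has been processed:

```python
import math

W_X, W_H, B = 1.2, 1.0, -0.6  # assumed toy parameters, not learned ones

def step(x_t, h_prev):
    """Combine today's input with yesterday's state."""
    return math.tanh(W_X * x_t + W_H * h_prev + B)

def predict(sequence):
    h = 0.0
    for x in sequence:  # yesterday first, then today
        h = step(x, h)  # the state carries past information forward
    return h            # only the final output is used as the prediction signal

low_low = predict([0.0, 0.0])    # two low days (scaled: Low = 0)
high_high = predict([1.0, 1.0])  # two high days (scaled: High = 1)
print(low_low < high_high)       # prints True
```

Even with made-up weights, the final state differs depending on the history it has seen, which is what lets the model distinguish the patterns listed earlier.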
Even though this step-by-step process makes sense, the feedback loop can be hard to visualize.
To understand it more clearly, we can represent the same process differently.
Instead of a loop, we can “unroll” the network across time and look at each step explicitly.
Unrolling an RNN (Understanding It Across Time)
Why Unrolling is Needed
RNNs are usually represented with a loop, where the output feeds back into the network. While this is accurate, it makes it hard to understand what is happening at each step.
To make things clearer, we “unroll” the network.
You can think of it as turning a loop into a sequence of steps.
What Unrolling Means
When we unroll an RNN:
- Each time step is shown explicitly
- It looks like a chain of repeated structures
For example, with two time steps:
- Step 1 processes yesterday’s input
- Step 2 processes today’s input
Each step:
- takes the current input
- takes the previous state
- produces an output
Instead of a loop, it now looks like a sequence.
Step-by-Step Flow
Let’s walk through a simple example with two days of data.
Step 1 (Yesterday)
- Input: yesterday’s price
- Output: a hidden state representing memory
Step 2 (Today)
- Input: today’s price
- Also uses: previous hidden state
- Produces: updated state
Final Output
- This final state is used to predict tomorrow’s price
- Each step builds on the previous one
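The two steps above can be written out explicitly, with the loop unrolled by hand. The weights are again illustrative values chosen for the sketch:

```python
import math

w_x, w_h = 0.9, 0.4  # assumed toy weights, shared by both steps

x_yesterday, x_today = 0.5, 1.0  # scaled prices (Medium, High)
h0 = 0.0                         # no memory before the sequence starts

# Step 1 (yesterday): produces the first hidden state.
h1 = math.tanh(w_x * x_yesterday + w_h * h0)

# Step 2 (today): the same formula, but it also receives h1.
h2 = math.tanh(w_x * x_today + w_h * h1)

# h2 is the final state, used to predict tomorrow's price.
print(round(h2, 3))
```

Notice that the two steps are textually identical apart from their inputs; unrolling just makes that repetition visible.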
Extending to More Time Steps
This idea extends naturally.
- 2 days → 2 steps
- 3 days → 3 steps
- n days → n steps
No matter how long the sequence is, we just keep extending the chain.
This is how RNNs handle variable-length input.
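The chain-extension idea can be sketched as a single function that works for any sequence length; only the number of loop iterations changes. The weights are illustrative toy values:

```python
import math

def run_rnn(seq, w_x=0.7, w_h=0.3):
    """Apply the same recurrent step to every element of seq."""
    h = 0.0  # start with an empty state
    for x in seq:
        h = math.tanh(w_x * x + w_h * h)
    return h

# 2 days of history and 5 days of history go through the exact same code.
short = run_rnn([0.5, 1.0])
longer = run_rnn([0.0, 0.5, 0.5, 1.0, 1.0])
```

Nothing about the model changes between the two calls; the chain simply runs for more steps.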
The Concept of Shared Weights
Even though the network looks like multiple steps, all of them use the same weights and biases.
- There are not different parameters for each step
- The same transformation is applied repeatedly
This is important because:
- the model size stays constant
- it learns a general pattern over time
- it does not depend on position in the sequence
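One way to see why this matters is to compare parameter counts. The sketch below assumes the simplest scalar setup, where a hypothetical feedforward model would dedicate one weight to each day of history:

```python
# Illustrative parameter counts in the scalar case. A plain feedforward
# layer needs one weight per input position (plus a bias), so its size
# grows with the sequence. A recurrent cell reuses the same three
# parameters (w_x, w_h, b) at every step.
def feedforward_param_count(sequence_length):
    return sequence_length + 1  # one weight per day, plus a bias

def rnn_param_count(sequence_length):
    return 3  # w_x, w_h, b -- independent of sequence length

for n in (2, 10, 50):
    print(n, feedforward_param_count(n), rnn_param_count(n))
```

The RNN's count stays flat no matter how long the history grows, which is exactly what "shared weights" buys us.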
But as sequences grow longer, training becomes difficult and the network struggles to retain information from earlier steps. This is a key limitation of RNNs.
Limitations of RNNs (Vanishing & Exploding Gradients)
The Core Problem: Training Gets Hard Over Time
So far, RNNs seem quite powerful.
But there is a practical issue that shows up when we try to train them.
As we unroll an RNN over time:
- 2 steps → easy
- 10 steps → harder
- 50+ steps → very hard
This is because training involves backpropagation through time.
Think of backpropagation like sending feedback backward in a network so it can learn what it did wrong.
In a normal neural network, this feedback only travels through a few layers. But in an RNN, the same network is repeated across many time steps, so the feedback has to pass through each step one by one.
Why These Issues Happen
At each time step:
- the same weights are reused
- the same transformation is applied repeatedly
During backpropagation:
- gradients are passed through each step
- and get multiplied again and again
This repeated multiplication is the root cause.
A simple way to think about it is that small effects compound over time, either shrinking or growing rapidly.
Exploding Gradients (When Things Blow Up)
Exploding gradients occur when the weights involved are greater than 1.
In that case, gradients grow exponentially as they pass through time.
If:
- w = 2
- t = 50
gradient ≈ 2⁵⁰, which is an extremely large number
What happens then:
- updates to weights become very large
- training becomes unstable
Vanishing Gradients (When Learning Dies)
To avoid exploding gradients, we might reduce the weight values.
But this introduces another problem.
If weights are less than 1:
- gradients shrink exponentially
For example:
- w = 0.5
- t = 50
gradient ≈ 0.5⁵⁰, which is very close to zero
What happens then:
- gradients become too small to make meaningful updates
- earlier time steps stop contributing to learning
The model effectively “forgets” what happened in the past.
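Both failure modes come from the same arithmetic, and it is easy to reproduce. This sketch reduces backpropagation through time to a single repeated scalar factor, which is a simplification of the real gradient product:

```python
# During backpropagation through time, the gradient picks up roughly
# a factor of w at every step, so after t steps it scales like w ** t.
def gradient_factor(w, steps):
    return w ** steps

exploding = gradient_factor(2.0, 50)  # w > 1: grows exponentially
vanishing = gradient_factor(0.5, 50)  # w < 1: shrinks toward zero

print(exploding)  # about 1.1e15
print(vanishing)  # about 8.9e-16
```

Fifty steps are enough to turn a modest factor of 2 into an astronomically large update, or a factor of 0.5 into a signal too faint to learn from.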
How this affects real-world tasks
- In stock prediction: older data might still matter
- In language: words earlier in a sentence can influence meaning later
But RNNs struggle with:
- long sequences
- long-term dependencies
There is no stable way to retain important information over long sequences.
A simple way to think about RNNs: They can remember, but they cannot manage what they remember.
There is no mechanism to:
- prioritize important information
- discard irrelevant details
- maintain stable memory over time
As sequences get longer, this limitation becomes more noticeable.
So these limitations led to the development of more advanced architectures like LSTMs and GRUs.