Recurrent Neural Networks (RNNs): 4 Ideas That Explain How They Actually Work
Once you work through the basics of neural networks, you start to see their limitations and naturally move on to more advanced architectures like RNNs. This is a continuation of Neural Networks Starter Kit: 9 Fundamental Building Blocks for Developers, so give it a read first to get a better grasp of the basics.
Let's start with our first idea: why do we need RNNs?
Why Do We Need RNNs?
The Problem with Traditional Neural Networks
To understand the problem with traditional neural networks, we can start with an example.
Consider predicting stock prices.
Stock prices change over time. For example, a stock might go up for four consecutive days and then suddenly drop. If we want to predict what happens next, we cannot rely on just one data point. We need to look at what happened before.
This creates a challenge for traditional neural networks.
- They expect fixed-size input
- They do not naturally handle time-based or sequential data
For instance:
- If a company has been trading for 10 days, we might use the previous 9 days to predict the next price
- If another company has only 5 days of history, we only have 5 previous values
So the number of inputs is not constant.
So when we boil this down,
Traditional neural networks do not “remember” previous inputs. They treat each input independently.
The sequential nature of real-world problems
Stock prices are just one example. Many real-world problems involve sequences.
- Prices change over time
- Patterns depend on previous values, not just the current one
If we look closely:
- A company with a long history gives us more past data
- A newer company gives us less
Yet we still want to make predictions in both cases.
So we need a model that can handle variable-length sequences and still make meaningful predictions.
How RNNs help with such problems
An RNN is a type of neural network designed specifically for sequential data.
It still has the same basic components:
- weights
- biases
- layers
- activation functions
But there is one important addition: feedback loops.
How the Feedback Loop Changes Things
The feedback loop changes how the network behaves.
- It does not just take the current input
- It also uses information from previous steps
This creates a form of memory.
You can think of it like this:
- The network remembers what it saw before and uses it to understand the current input.
- Even though it may look like the network is handling just one input at a time, it is actually going through the sequence step by step, carrying past information along with it.
- Because of this, RNNs are able to understand patterns that depend on time or order.
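The loop-with-memory idea can be sketched in a few lines of Python. This is a minimal scalar sketch; the weights `w_x`, `w_h`, and `b` are hand-picked illustrative values, not learned parameters:

```python
import math

def rnn_step(x_t, h_prev, w_x=0.8, w_h=0.5, b=0.0):
    """One recurrent update: combine the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Feed a short sequence one value at a time, carrying the state along.
h = 0.0  # initial state: no memory yet
for x in [0.0, 0.5, 1.0]:
    h = rnn_step(x, h)

# The final state depends on every value in the sequence, not just the last one.
print(round(h, 3))
```

Even though `rnn_step` is called on one input at a time, the state `h` it receives carries information from all earlier steps, which is exactly the feedback loop described above.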
Now that we understand why RNNs are needed, let’s look at how data flows through an RNN over time.
How RNNs Process Sequential Data
We will continue with the stock price example, but simplify it.
Assume the model tries to learn patterns like:
- If yesterday’s and today’s stock prices are low, then tomorrow’s price should also be low.
- If yesterday’s price was low and today’s price is medium, then tomorrow’s price should be even higher.
- If the price decreases from high to medium, then tomorrow’s price will be even lower.
- Lastly, if the price stays high for two days in a row, then the price will be high tomorrow.
These are not fixed rules. They are just patterns the model tries to learn from data.
To make things easier to work with, we scale the values:
- Low = 0
- Medium = 0.5
- High = 1
The network then processes the sequence one value at a time:
- First, we input yesterday’s price
- Then, we input today’s price
After each step, the model updates its internal state.
This step-by-step processing is what allows the network to handle sequences.
What Happens at Each Step
At each time step, the input goes through the usual neural network operations:
- weights
- bias
- activation function (such as ReLU)
This produces an output. Let’s call the output after processing yesterday’s value y₁.
But here is the important part: this output is not discarded.
The Feedback Loop
The output from the previous step is fed back into the network.
So when we process today’s value:
- the network uses today’s input
- and combines it with y₁, the output from yesterday
Conceptually, you can think of it like this:
Current Output = f(current input, previous output)
In simple terms:
- yesterday’s information acts as memory
- today’s input provides new information
- both are combined to produce the next state
This is how an RNN “remembers” what it has seen before.
The prediction for tomorrow depends on:
- yesterday
- today
The network builds context over time, instead of treating each input independently.
At each step, the network produces an output. Depending on the problem:
- we might use the output from every step
- or we might use only the final output
In this example, we are interested in predicting tomorrow’s price, so we focus on the final output after processing the sequence.
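The whole flow above can be sketched as a small function. The weights `W_X`, `W_H`, and `B` are hand-picked toy values, not trained parameters; the point is only that we read the prediction signal off the final state, after the whole sequence has been processed:

```python
import math

W_X, W_H, B = 1.2, 1.0, -0.6  # assumed toy parameters, not learned ones

def step(x_t, h_prev):
    """Combine today's input with yesterday's state."""
    return math.tanh(W_X * x_t + W_H * h_prev + B)

def predict(sequence):
    h = 0.0
    for x in sequence:  # yesterday first, then today
        h = step(x, h)  # the state carries past information forward
    return h            # only the final output is used as the prediction signal

low_low = predict([0.0, 0.0])    # two low days (scaled: Low = 0)
high_high = predict([1.0, 1.0])  # two high days (scaled: High = 1)
print(low_low < high_high)       # prints True
```

Even with made-up weights, the final state differs depending on the history it has seen, which is what lets the model distinguish the patterns listed earlier.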
Even though this step-by-step process makes sense, the feedback loop can be hard to visualize.
To understand it more clearly, we can represent the same process differently.
Instead of a loop, we can “unroll” the network across time and look at each step explicitly.
Unrolling an RNN (Understanding It Across Time)
Why Unrolling is Needed
RNNs are usually represented with a loop, where the output feeds back into the network. While this is accurate, it makes it hard to understand what is happening at each step.
To make things clearer, we “unroll” the network.
You can think of it as turning a loop into a sequence of steps.
What Unrolling Means
When we unroll an RNN:
- Each time step is shown explicitly
- It looks like a chain of repeated structures
For example, with two time steps:
- Step 1 processes yesterday’s input
- Step 2 processes today’s input
Each step:
- takes the current input
- takes the previous state
- produces an output
Instead of a loop, it now looks like a sequence.
Step-by-Step Flow
Let’s walk through a simple example with two days of data.
Step 1 (Yesterday)
- Input: yesterday’s price
- Output: a hidden state representing memory
Step 2 (Today)
- Input: today’s price
- Also uses: previous hidden state
- Produces: updated state
Final Output
- This final state is used to predict tomorrow’s price
- Each step builds on the previous one
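The two steps above can be written out explicitly, with the loop unrolled by hand. The weights are again illustrative values chosen for the sketch:

```python
import math

w_x, w_h = 0.9, 0.4  # assumed toy weights, shared by both steps

x_yesterday, x_today = 0.5, 1.0  # scaled prices (Medium, High)
h0 = 0.0                         # no memory before the sequence starts

# Step 1 (yesterday): produces the first hidden state.
h1 = math.tanh(w_x * x_yesterday + w_h * h0)

# Step 2 (today): the same formula, but it also receives h1.
h2 = math.tanh(w_x * x_today + w_h * h1)

# h2 is the final state, used to predict tomorrow's price.
print(round(h2, 3))
```

Notice that the two steps are textually identical apart from their inputs; unrolling just makes that repetition visible.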
Extending to More Time Steps
This idea extends naturally.
- 2 days → 2 steps
- 3 days → 3 steps
- n days → n steps
No matter how long the sequence is, we just keep extending the chain.
This is how RNNs handle variable-length input.
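The chain-extension idea can be sketched as a single function that works for any sequence length; only the number of loop iterations changes. The weights are illustrative toy values:

```python
import math

def run_rnn(seq, w_x=0.7, w_h=0.3):
    """Apply the same recurrent step to every element of seq."""
    h = 0.0  # start with an empty state
    for x in seq:
        h = math.tanh(w_x * x + w_h * h)
    return h

# 2 days of history and 5 days of history go through the exact same code.
short = run_rnn([0.5, 1.0])
longer = run_rnn([0.0, 0.5, 0.5, 1.0, 1.0])
```

Nothing about the model changes between the two calls; the chain simply runs for more steps.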
The Concept of Shared Weights
Even though the network looks like multiple steps, all of them use the same weights and biases.
- There are not different parameters for each step
- The same transformation is applied repeatedly
This is important because:
- the model size stays constant
- it learns a general pattern over time
- it does not depend on position in the sequence
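One way to see why this matters is to compare parameter counts. The sketch below assumes the simplest scalar setup, where a hypothetical feedforward model would dedicate one weight to each day of history:

```python
# Illustrative parameter counts in the scalar case. A plain feedforward
# layer needs one weight per input position (plus a bias), so its size
# grows with the sequence. A recurrent cell reuses the same three
# parameters (w_x, w_h, b) at every step.
def feedforward_param_count(sequence_length):
    return sequence_length + 1  # one weight per day, plus a bias

def rnn_param_count(sequence_length):
    return 3  # w_x, w_h, b -- independent of sequence length

for n in (2, 10, 50):
    print(n, feedforward_param_count(n), rnn_param_count(n))
```

The RNN's count stays flat no matter how long the history grows, which is exactly what "shared weights" buys us.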
But as sequences grow longer, training becomes difficult and the network struggles to retain information from earlier steps. This is a key limitation of RNNs.
Limitations of RNNs (Vanishing & Exploding Gradients)
The Core Problem: Training Gets Hard Over Time
So far, RNNs seem quite powerful.
But there is a practical issue that shows up when we try to train them.
As we unroll an RNN over time:
- 2 steps → easy
- 10 steps → harder
- 50+ steps → very hard
This is because training involves backpropagation through time.
Think of backpropagation like sending feedback backward in a network so it can learn what it did wrong.
In a normal neural network, this feedback only travels through a few layers. But in an RNN, the same network is repeated across many time steps, so the feedback has to pass through each step one by one.
Why These Issues Happen
At each time step:
- the same weights are reused
- the same transformation is applied repeatedly
During backpropagation:
- gradients are passed through each step
- and get multiplied again and again
This repeated multiplication is the root cause.
A simple way to think about it is that small effects compound over time, either shrinking or growing rapidly.
Exploding Gradients (When Things Blow Up)
Exploding gradients occur when the weights involved are greater than 1.
In that case, gradients grow exponentially as they pass through time.
If:
- w = 2
- t = 50
gradient ≈ 2⁵⁰, which is an extremely large number
What happens then:
- updates to weights become very large
- training becomes unstable
Vanishing Gradients (When Learning Dies)
To avoid exploding gradients, we might reduce the weight values.
But this introduces another problem.
If weights are less than 1:
- gradients shrink exponentially
For example:
- w = 0.5
- t = 50
gradient ≈ 0.5⁵⁰, which is very close to zero
What happens then:
- gradients become too small to make meaningful updates
- earlier time steps stop contributing to learning
The model effectively “forgets” what happened in the past.
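Both failure modes come from the same arithmetic, and it is easy to reproduce. This sketch reduces backpropagation through time to a single repeated scalar factor, which is a simplification of the real gradient product:

```python
# During backpropagation through time, the gradient picks up roughly
# a factor of w at every step, so after t steps it scales like w ** t.
def gradient_factor(w, steps):
    return w ** steps

exploding = gradient_factor(2.0, 50)  # w > 1: grows exponentially
vanishing = gradient_factor(0.5, 50)  # w < 1: shrinks toward zero

print(exploding)  # about 1.1e15
print(vanishing)  # about 8.9e-16
```

Fifty steps are enough to turn a modest factor of 2 into an astronomically large update, or a factor of 0.5 into a signal too faint to learn from.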
How this affects real-world tasks
- In stock prediction: older data might still matter
- In language: words earlier in a sentence can influence meaning later
But RNNs struggle with:
- long sequences
- long-term dependencies
There is no stable way to retain important information over long sequences.
A simple way to think about RNNs: They can remember, but they cannot manage what they remember.
There is no mechanism to:
- prioritize important information
- discard irrelevant details
- maintain stable memory over time
As sequences get longer, this limitation becomes more noticeable.
So these limitations led to the development of more advanced architectures like LSTMs and GRUs.