What Really Happens Before Your Query Runs in SQLite

What Really Happens Before Your Query Runs in SQLite

You think query execution is the main part? Nope. The real work happens before that.

In the previous post, you saw how SQLite executes queries as a bytecode program inside a virtual machine.
We walked through how SQL gets compiled, how registers hold values, how joins actually run, and why this architecture works so well.

But here’s the thing.

By the time that bytecode starts executing, most of the important decisions have already been made.

The VM is not “figuring things out” as it runs. It is simply following instructions.
It does not decide which index to use.
It does not decide join order.
It does not try to make your query faster.

All of that happens before execution even begins.

This is where SQLite’s frontend comes in.

This is the part of the system that:

  • Takes your SQL and actually understands it
  • Breaks it down into structure
  • Validates it
  • Rewrites it
  • Optimizes it
  • And finally turns it into the bytecode you saw earlier

In other words, if the last post showed you how SQLite runs queries, this one is about how SQLite decides what to run.

And that’s where things get really interesting.

What Is the SQLite Frontend?

SQLite is split into two parts:

  • Frontend → understands and compiles your query
  • Backend (VDBE) → executes the compiled program

The frontend acts like a compiler for SQL. When you run a query, it doesn’t execute immediately. Instead, SQLite first converts it into a bytecode program, which the Virtual Machine then runs.

You can actually see this if you run:

EXPLAIN SELECT * FROM users;

image

What you’re looking at is not SQL anymore, it’s the compiled output of the frontend.

Internally, this entire process is triggered by:

sqlite3_prepare()

Which takes your SQL and turns it into a bytecode program stored in a Vdbe object.

image

So SQLite doesn’t “run SQL” directly. It compiles SQL first, then executes the result.

What Is the Frontend Actually Doing?

At a high level, SQLite behaves less like a traditional database and more like a compiler.

When you write SQL, SQLite doesn’t interpret it directly. It transforms it step by step into a low-level program.

Here’s what that transformation looks like:

image

Everything up to bytecode generation is where the real work happens. Execution is just the final step.

How Does SQLite Even Understand Your Query?

Before SQLite can optimize anything, it first needs to understand what you wrote.

That starts with breaking your SQL into pieces.

The Tokenizer: Turning Text Into Meaningful Units

When SQLite sees a query like:

SELECT name FROM users;

it doesn’t immediately understand it as a query. It first breaks it down into tokens like SELECT, name, FROM, and users.

A Token represents a single unit from the SQL input. It carries the actual text value, such as a literal, table name, or column name. The tokenizer produces these, and the parser consumes them.

The tokenizer classifies tokens into categories like:

  • KeywordsSELECT, FROM, WHERE
  • Identifiers → table names, column names (users, name)
  • Literals'hello', 123
  • Symbols(, ), =, ;

The tokenizer output looks like this:

Figure 8.1: A typical tokenizer output.

At this stage, SQLite has structure, but no meaning yet.

The Parser: Building a Structure SQLite Can Reason About

The parser takes those tokens and organizes them into a structured representation called a parse tree.

So a query like:

SELECT name FROM users WHERE age > 25;

is transformed into something that clearly separates:

  • what to select
  • where to get it from
  • what condition to apply

The parser creates a tree like:

image

This is also where validation happens. If your SQL is malformed or references a table that doesn’t exist, this is where SQLite will reject it.

By the end of this step, SQLite fully understands your query not as text, but as a structured plan.

How Does That Become Something Executable?

Understanding the query is only half the job. SQLite now needs to convert that structure into something it can execute efficiently.

The Code Generator: Turning Structure Into Instructions

Instead of evaluating the parse tree directly, SQLite generates a sequence of low-level instructions which is its own internal bytecode.

This is a key design decision.

Rather than repeatedly interpreting complex structures, SQLite compiles your query into a small program and then runs that program.

Internally, these instructions are stored in a structure called the Vdbe (Virtual Database Engine).

Where the WHERE Clause Becomes Real Work

Consider a simple query:

SELECT * FROM users WHERE age > 25;

At a conceptual level, this becomes:

For each row in the users table, check if the condition is true, and if it is, return the row.

That logic is implemented as a loop in bytecode.

SQLite generates instructions to iterate through rows, evaluate conditions, and produce results.

This is also where performance starts to matter.

The way this loop is constructed determines how many rows are scanned and how quickly results are produced.

image

How Does SQLite Know What You Meant?

Before generating final instructions, SQLite must resolve every name in your query.

When you write:

SELECT name FROM users;

SQLite needs to figure out exactly what name refers to.

Is it a column in users, or something else?

This process is called name resolution, and it becomes especially important in nested queries.

What Happens in Subqueries?

In a query like:

SELECT name FROM users 
WHERE id IN (
    SELECT user_id FROM orders
);

image

SQLite creates separate contexts for the inner and outer queries.

It first tries to resolve names within the inner query.

If it can’t, it looks outward. If the inner query depends on values from the outer query, it becomes a correlated subquery, which must be executed repeatedly.

If not, it can be executed once and reused.

This distinction has a huge impact on performance.

Where Does Performance Actually Come From?

Now we reach the most important part of the entire pipeline.

The Optimizer: SQLite’s Decision-Making Engine

Given a query, there are often multiple valid ways to execute it.
image

For example:

SELECT * FROM users WHERE age = 25;

SQLite could scan every row in the table and check the condition, or it could use an index to jump directly to matching rows.

Both approaches produce the same result, but one is dramatically faster.

The optimizer is responsible for choosing the better one.

Common Optimization Techniques

  • Index selection
  • Query rewriting
  • Constant folding (precomputing values)
  • Join reordering
  • Removing redundant conditions

What Is SQLite Really Trying to Minimize?

The biggest cost in database systems is not computation, it’s reading data from disk.

So the optimizer’s main goal is simple:

Reduce the number of rows that need to be read.

Everything else follows from that.

The Real Mental Model

At this point, the picture should be clear.

SQLite is not just running SQL.

It is compiling SQL into a program.

The frontend:

  • Understands your query
  • Rewrites it
  • Optimizes it
  • Generates bytecode

The Virtual Machine:

  • Executes instructions
  • Makes no decisions

All the intelligence lives in the frontend.

What You Should Take Away

If there’s one thing to remember, it’s this:

The performance of your query is decided before it runs.

The structure of your WHERE clause, the order of your joins, and the indexes available all influence the program SQLite generates.

Understanding this pipeline doesn’t just help you understand SQLite. It helps you write better queries, design better schemas, and avoid performance surprises.

And once you see it this way, SQL stops being “just a query language.”

It becomes something much closer to a programming language that gets compiled under the hood.

git-lrc

Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc