Learn How to Navigate Code Structures and Extract Details Using Tree-sitter

Unlock the power to explore and extract any part of your code effortlessly with Tree-sitter

If you’ve ever wished you could query code like data, Tree-sitter might be your new best friend.

Whether you're building a code analysis tool, editor extension, or just exploring syntax trees—this guide will help you understand Tree-sitter Queries from scratch using real Python examples. Let’s dive in!


What Is Tree-sitter?

Tree-sitter is a parser generator and runtime for building fast, accurate parsers for programming languages. It is used in editors such as Neovim and Zed, and in VS Code extensions, for:

  • Syntax highlighting
  • Structural editing
  • Code navigation
  • Language-aware tools

Tree-sitter Queries

When Tree-sitter parses source code, it produces a syntax tree. A Tree-sitter query searches that tree for specific code patterns. Think of it as a search tool that understands the structure of the code rather than just its text.

Let’s say we’re analyzing the following Python code using Tree-sitter:

from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework import generics, serializers
from django.contrib.auth.models import User

class UserView(APIView):
    def get(self, request):
        user_id = request.GET.get('id')
        if user_id:
            return Response({"user_id": user_id})
        return Response({"error": "User ID missing"}, status=400)

    def post(self, request):
        data = request.data
        username = data.get('username')
        return Response({"username": username})

class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ['id', 'username', 'email']

class UserDetailUpdateView(generics.RetrieveUpdateAPIView):
    queryset = User.objects.all()
    serializer_class = UserSerializer
    lookup_field = 'pk'

We will go through various Tree-sitter queries that match parts of this code, so let's begin.

You can practice in the Tree-sitter playground, which is what we use for the demos here.

Every Tree-sitter query is composed of nodes. Let's go through some of the node types first.

Node Types

Every piece of code is represented as a node in the syntax tree.

Some Examples (Python):

  • identifier
  • call
  • string
  • assignment
  • parameters
  • argument_list
  • return_statement
  • attribute
  • if_statement

identifier

An identifier is a name that the programmer gives to things like variables, functions, classes, or parameters.

(identifier) @var-name


string

In Tree-sitter, a (string) node represents a string literal in the source code — i.e., any value enclosed in quotation marks, like "hello" or 'world'.

(string) @string-val


call

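In Tree-sitter, a (call) node represents a function call: a function being invoked in the code. The query below captures the name only when the callee is a plain identifier, such as Response(...). Method calls like data.get(...) have an (attribute) node in the function field instead.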

(call
 function: (identifier) @called-func)


assignment

In Tree-sitter, an (assignment) node represents an assignment statement, where a value is stored in a variable.

(assignment
 left: (identifier) @left-var
 right: (_) @right-value)


parameters

In Tree-sitter, a (parameters) node represents the list of parameters that a function accepts.

This query captures each (identifier) inside the parameter list and tags it as @param-name.

(parameters
 (identifier) @param-name)


argument_list

In Tree-sitter, an (argument_list) node represents the list of arguments passed to a function when it's being called.

(argument_list
 (string) @arg)


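return_statement

In Tree-sitter, a (return_statement) node represents a return statement in a function, used to send a value back to the caller.

(return_statement) @return-line


attribute

In Tree-sitter, an (attribute) node represents attribute access with a dot, such as request.data. The object field is the expression before the dot, and the attribute field is the name after it.

(attribute
 object: (identifier) @object
 attribute: (identifier) @prop)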


if_statement

In Tree-sitter, an (if_statement) node represents an if block in a language like Python. The test expression sits in the condition field and the body in the consequence field.

(if_statement
 condition: (_) @cond
 consequence: (_) @if-body)
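
To make the idea of capturing nodes by type more concrete, here is a minimal sketch in plain Python. It does not use Tree-sitter itself: the Node class, the walk and capture helpers, and the hand-built tree are all assumptions made for illustration, and the tree is a simplified version of what Tree-sitter's Python grammar would produce for user_id = request.GET.get('id').

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Toy stand-in for a syntax tree node (not the Tree-sitter API)."""
    type: str                      # node type, e.g. "identifier"
    text: str                      # source text covered by the node
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def capture(root: Node, node_type: str) -> List[str]:
    """Rough equivalent of the query `(node_type) @some-capture`."""
    return [n.text for n in walk(root) if n.type == node_type]


# Hand-built, simplified tree for: user_id = request.GET.get('id')
tree = Node("assignment", "user_id = request.GET.get('id')", [
    Node("identifier", "user_id"),
    Node("call", "request.GET.get('id')", [
        Node("attribute", "request.GET.get", [
            Node("attribute", "request.GET", [
                Node("identifier", "request"),
                Node("identifier", "GET"),
            ]),
            Node("identifier", "get"),
        ]),
        Node("argument_list", "('id')", [
            Node("string", "'id'"),
        ]),
    ]),
])

print(capture(tree, "identifier"))  # ['user_id', 'request', 'GET', 'get']
print(capture(tree, "string"))      # ["'id'"]
```

A real query engine does much more (fields, nesting, predicates), but the core idea is the same: walk the tree and keep the nodes whose type matches the pattern.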

Named vs Anonymous Nodes

Named nodes are meaningful parts of the code defined by the grammar, like function calls, variable names, or statements.

Anonymous nodes are just syntax symbols or punctuation like =, (, ), or commas — they don’t have special names in the grammar.

Node Type    Description               Examples
Named        Grammar-defined           call, identifier, return_statement
Anonymous    Just syntax characters    '=', '(', ')', ','

Logical Operators in Tree-sitter Queries

Logical operators, which the Tree-sitter documentation calls predicates, let you narrow down a match when searching code with Tree-sitter.

Think of them as filters: they check whether the text of a captured node matches or doesn't match certain words or patterns.

You write them with a # before the name, and they apply to captures (the @names) defined in the same pattern.

1. #match?: Regex match

(function_definition
 name: (identifier) @func-name
 (#match? @func-name "^get"))

This matches any function whose name starts with get, such as get_user, getData, etc.

In the playground output, the matched text is highlighted in blue.


2. #eq?: Exact match

(function_definition
 name: (identifier) @func-name
 (#eq? @func-name "post"))

Matches only if the function is exactly post.


3. #not-eq?: Not equal

(function_definition
 name: (identifier) @func-name
 (#not-eq? @func-name "post"))

Matches every function whose name is not exactly post.


4. #any-of?: Match multiple strings

(function_definition
 name: (identifier) @func-name
 (#any-of? @func-name "get" "post" "put"))

Matches if the name is any one of the listed strings.


5. #not-match?: Regex inverse

(function_definition
 name: (identifier) @func-name
 (#not-match? @func-name "^post"))

Matches functions that do not start with post.
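
The five predicates above boil down to simple string tests on the captured text. The sketch below mimics them in plain Python, without using Tree-sitter. The helper names match, eq, and any_of are made up for illustration, and it assumes that #match? searches for the regex anywhere in the text, which is why the ^ anchor matters.

```python
import re

# Names that @func-name might capture. "get" and "post" come from the sample
# code; "get_user" and "getData" are hypothetical extras for illustration.
captured = ["get", "post", "get_user", "getData"]


def match(text: str, pattern: str) -> bool:
    """Mimics #match?: regex search anywhere in the text (assumed unanchored)."""
    return re.search(pattern, text) is not None


def eq(text: str, value: str) -> bool:
    """Mimics #eq?: exact string comparison."""
    return text == value


def any_of(text: str, *values: str) -> bool:
    """Mimics #any-of?: true if the text equals any of the listed strings."""
    return text in values


# The negated predicates (#not-eq?, #not-match?) are just the inverse tests.
results = {
    '#match? "^get"': [n for n in captured if match(n, "^get")],
    '#eq? "post"': [n for n in captured if eq(n, "post")],
    '#not-eq? "post"': [n for n in captured if not eq(n, "post")],
    '#any-of? "get" "post" "put"': [n for n in captured if any_of(n, "get", "post", "put")],
    '#not-match? "^post"': [n for n in captured if not match(n, "^post")],
}

for predicate, names in results.items():
    print(f"{predicate:30} -> {names}")
```

Note how #eq? and #any-of? compare the whole name, while #match? only needs the pattern to be found somewhere in it.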


Wildcards

A wildcard is written as an underscore. (_) matches any named node, while a bare _ matches any node, named or anonymous.

(return_statement (_) @return-value)

Use this when you don’t care about the exact node type.


Deeply Nested Structures

Suppose we want to pick apart the call request.GET.get('id') from:

class UserView(APIView):
    def get(self, request):
        request.GET.get('id')

We can use this query:

(call
  function:
    (attribute
      object:
        (attribute
          object: (identifier) @base
          attribute: (identifier) @mid)
      attribute: (identifier) @method-name))

Let's break it down:

  • (call ...)

    • Matches a function call: the whole expression where a function is invoked.
    • request.GET.get('id')
  • function: (attribute ...)

    • The thing being called is an attribute access (like something.somethingElse). This matches the part before the parentheses:
    • request.GET.get
  • object: (attribute ...)

    • The object of that attribute is itself an attribute access, so this is object.attribute nested inside another object.attribute. It matches the part before the last dot:
    • request.GET
  • object: (identifier) @base

    • The base object is an identifier, the very first name in the chain. Tagged as @base.
    • request
  • attribute: (identifier) @mid

    • The middle attribute. Tagged as @mid.
    • GET
  • attribute: (identifier) @method-name

    • The last attribute, which is the method being called. Tagged as @method-name.
    • get

The resulting captures are:

  • @base: request
  • @mid: GET
  • @method-name: get

This shows how Tree-sitter lets you deeply analyze structure, step by step.
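
For comparison, the same nested shape can be checked by hand in Python. The sketch below uses Python's built-in ast module rather than Tree-sitter, so the node names differ (ast.Call, ast.Attribute, and ast.Name instead of call, attribute, and identifier), and find_chained_calls is a hypothetical helper written for this example. Each isinstance check corresponds to one level of nesting in the query.

```python
import ast

source = '''
class UserView(APIView):
    def get(self, request):
        user_id = request.GET.get('id')
'''


def find_chained_calls(code: str):
    """Return (base, mid, method) for every call shaped like base.mid.method(...)."""
    results = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue                          # like (call ...)
        func = node.func
        if not isinstance(func, ast.Attribute):
            continue                          # like function: (attribute ...)
        inner = func.value
        if not isinstance(inner, ast.Attribute):
            continue                          # like the nested object: (attribute ...)
        if not isinstance(inner.value, ast.Name):
            continue                          # like object: (identifier) @base
        # (base, mid, method) correspond to @base, @mid, @method-name
        results.append((inner.value.id, inner.attr, func.attr))
    return results


print(find_chained_calls(source))  # [('request', 'GET', 'get')]
```

The query expresses in eight declarative lines what takes a loop and four type checks here, and it works the same way for any language that has a Tree-sitter grammar.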

Conclusion

Tree-sitter queries let you interact with code as structured data. You can match, extract, and transform code patterns easily, powering things like code navigation tools and custom linters.

Once you understand the query system, it’s like having X-ray vision over your codebase.

Happy parsing!
