We Chose Meilisearch Over 10+ Other Search Engines Despite a Major Drawback

After reviewing over 10 search engines, we selected Meilisearch despite a major drawback: synchronizing it with a PostgreSQL database. In this post, we share our experience and explain how we resolved this issue using Golang and pgx.


Is It Worth Investing Resources in a Third-Party Search Engine? Here Are Our Reasons

We are continuously improving our product, Feedback by Hexmos, for the upcoming release.

New features and pages are coming up, the UI is changing, and bugs are being found and fixed. As the product grows, we realized we needed to improve navigation across it.

We already have a sidebar and a client-side search package, cmdk, for navigating between screens, but difficulties arise when we want to search for user profiles, teams, team performance, and so on. This pushed us to integrate a better third-party search engine into Feedback.

Another reason for a dedicated search engine is that we have other products in the chain, such as FeedZap, which will require complex text-search operations in the future.

Considering this, we decided to invest in a dedicated, powerful search engine that fits our use cases and resource availability.

How to Choose the Right Search Engine That Fits Your Needs

There are a lot of search engines available: open-source, serverless, server-based, and so on.
Before diving in to figure out the right one, it's best to analyze your requirements and infrastructure, including present and future needs.

For some products, the searchable data is minimal: they need a decent search feature with little operational overhead and can't afford a dedicated server.
For other products, the dataset is larger, complex search operations are required, and there are enough resources to run a dedicated search engine.

Based on this, I reviewed a few popular search engines.

Need Decent Performance, Dataset Is Small, and Can't Afford a Server

PostgreSQL Full-Text Search

If you are using PostgreSQL and don't want to maintain a separate index-based database, then PostgreSQL Full-Text Search (FTS) is a good option. However, it is not recommended for large use cases where you deal with millions of transactions and extensive data management.
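For a sense of what this looks like in practice, here is a minimal sketch in plain SQL; the articles table and its columns are hypothetical:

-- Hypothetical articles(id, title, body) table; a GIN index keeps FTS fast.
CREATE INDEX IF NOT EXISTS articles_fts_idx
    ON articles USING GIN (to_tsvector('english', title || ' ' || body));

-- Rank and return the ten best matches for "search engines".
SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ plainto_tsquery('english', 'search engines')
ORDER BY ts_rank(to_tsvector('english', title || ' ' || body),
                 plainto_tsquery('english', 'search engines')) DESC
LIMIT 10;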
(User review from Hacker News)

Bleve

Bleve is another option to consider if your project is within the Go ecosystem, and a reasonable pick when you can't rely on a powerful server-based search engine service. Here is the benchmark report on Bleve.
(benchmark chart)
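For a flavor of the API, here is a minimal sketch; the index path and document fields are made up:

package main

import (
	"fmt"

	"github.com/blevesearch/bleve/v2"
)

func main() {
	// Create an on-disk index with the default mapping.
	mapping := bleve.NewIndexMapping()
	index, err := bleve.New("users.bleve", mapping)
	if err != nil {
		panic(err)
	}
	defer index.Close()

	// Index a document; Bleve accepts arbitrary structs or maps.
	if err := index.Index("user-1", map[string]interface{}{
		"first_name": "Alice",
		"email":      "alice@example.com",
	}); err != nil {
		panic(err)
	}

	// Run a simple match query and print hit count and scores.
	result, err := index.Search(bleve.NewSearchRequest(bleve.NewMatchQuery("alice")))
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}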

Tantivy

Tantivy is written in Rust and is particularly useful for Rust-based projects. It has received plenty of positive feedback and is a good option to consider.
(search benchmark chart)


Need Powerful Performance, Large Dataset, and Can Afford a Server

If you own a server or cloud instance and require a powerful, scalable search engine with full control, then a server-based option is the way to go.

Our considerations and requirements led us to choose a server-based search engine. We have enough resources to host one, and it is better than serverless options for:

  • Long-term use
  • Scalability
  • Additional support for complex search operations such as:
    • Facet search: when shopping online, you might search for "laptops" and then narrow the results with filters like "price under $1000," "brand: Apple," and "RAM: 16GB" (see the sketch after this list).
    • Multisearch: a travel website might let users search for flights, hotels, and car rentals all at once and display the combined results.
    • Search-as-you-type: real-time search results on every keystroke.
  • A common search system for multiple products.
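As a flavor of what a facet-search request looks like against Meilisearch's REST API, here is a minimal sketch in Go. The products index, the attributes, and the API key are made-up examples, and the filtered attributes must first be declared filterable in the index settings:

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical query: "laptop" narrowed by price and brand, with
	// facet counts requested for brand and ram.
	body := []byte(`{
		"q": "laptop",
		"filter": "price < 1000 AND brand = 'Apple'",
		"facets": ["brand", "ram"]
	}`)

	req, err := http.NewRequest(http.MethodPost,
		"http://localhost:7700/indexes/products/search", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer YOUR_MEILI_API_KEY") // made-up key

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response contains hits plus a facetDistribution with counts.
	result, _ := io.ReadAll(resp.Body)
	fmt.Println(string(result))
}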

After extensive filtering, we narrowed it down to four options in this category:

  1. Meilisearch
  2. Typesense
  3. PISA Search
  4. Manticore

Here is a comparison between them:

| Criteria | Meilisearch | Typesense | PISA Search | Manticore |
| --- | --- | --- | --- | --- |
| Search-as-you-type | Yes | Yes | No | Yes |
| Facet search | Yes | Yes | No | Yes |
| Multiple schema/product support | Yes | Yes | - | Yes |
| RAM usage | Primary index lives on disk; a 224 MB disk index uses ~305 MB RAM | Primary index lives in RAM; a 100 MB disk index requires ~300 MB RAM | - | - |
| CPU usage | Uses at most 6 cores on a 12-core machine | GitHub issues report high CPU usage; 4 vCPUs handle ~104 concurrent searches/second | - | - |
| Typo and synonym handling | Yes | Yes | - | - |

We filtered out PISA Search because it lacks search-as-you-type and facet search, both of which our application requires.

Why We Chose Meilisearch Over Typesense or Manticore

Compared to Typesense, Meilisearch stores the search index data on disk and moves it to RAM when required. In contrast, Typesense stores the entire search index in memory and uses RocksDB to store documents that may not be required for indexing on disk.

Users suggest that Meilisearch is suitable for both document and customer databases. In our case, the data, once stored for indexing, does not change frequently, unlike an e-commerce database where updates to prices, reviews, etc., are frequent.

AI Support: The Meilisearch community is growing rapidly, and in the future, we can leverage its AI features for various applications in our AI products.

Is Setting Up Meilisearch a Headache?

The installation and configuration of Meilisearch are pretty straightforward.
The RAM and storage requirements of your server depend on the size of the data that needs to be indexed and the search speed you require.
We hosted it on an AWS EC2 instance with 4 GB of RAM and 30 GB of storage, and it works perfectly for our use case, using only about 41.6 MB of RAM.


Here is a detailed explanation of how to set up the Meilisearch binary or Docker container on your server.

The Actual Syncing Pain Begins

Once the configuration is complete, the next crucial step is loading and synchronizing the index database with the project database. We were using a PostgreSQL database and initially wanted to sync the user table of our Feedback App.

For syncing with PostgreSQL, what Meilisearch suggests is a Python tool called meilisync.

Here is what the meilisync YAML configuration file looks like:

debug: true
progress:
  type: file # meilisync stores its sync progress in a local file
source:
  type: postgres
  host: 127.0.0.1
  port: 5432
  user: test user
  password: "password"
  database: test database
meilisearch:
  api_url: http://localhost:7700
  api_key: *********
  insert_size: 1 # number of documents to buffer before inserting
sync:
  - table: user # PostgreSQL table to watch
    index: users # target Meilisearch index
    full: true # perform a full sync on startup
    pk: id # primary key
    fields: # columns to sync; an empty value keeps the column name as-is
      id:
      first_name:
      email:
      last_name:
      organization:
      is_deleted:

This YAML file connects our PostgreSQL database and our Meilisearch instance. We also need to do some work on the PostgreSQL side to support syncing, such as enabling the wal2json extension and making a few configuration changes in the postgresql.conf file.
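For reference, wal2json relies on logical decoding, which typically needs settings along these lines in postgresql.conf, followed by a server restart (the values here are illustrative):

# postgresql.conf — typical settings for logical decoding (used by wal2json)
wal_level = logical          # required for logical decoding
max_replication_slots = 4    # must be greater than zero
max_wal_senders = 4          # must be greater than zero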

This Is Why I Threw Out Meilisync

I tried to configure meilisync and connect it to our database. Unfortunately, I ran into a lot of configuration and setup issues. The most problematic part was real-time synchronization: syncing the index with live changes from PostgreSQL simply did not work. I tried several fixes on the PostgreSQL side, but after many attempts, real-time syncing still failed.

Overcoming the Synchronization Drawback Using Golang

To overcome the PostgreSQL synchronization issue in Meilisearch, we created an alternative tool called 'SyncSearch' using Golang and pgx, a PostgreSQL driver and toolkit package for Go.

Our main goals for this tool are:

  • Real-time syncing for insertion, update, and deletion actions in the PostgreSQL database.
  • Keeping the Meilisearch index consistent with the database, even if the tool goes down and is restarted later.

Here is the flow of real-time synchronization: a PostgreSQL trigger fires a notification on every change, a Go listener picks it up, and the affected row is upserted into the Meilisearch index.

Never Miss a Single Data Event: Keep the Sync Logs

I created a database table to keep the last sync time between PostgreSQL and Meilisearch; this helps the tool catch up if it stops or errors out in production.
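The util.GetLastSyncedTime and util.UpdateLastSyncedTime helpers used below might look roughly like this; the sync_log table and its column names are assumptions for illustration, not our exact schema:

package util

import (
	"database/sql"
	"time"
)

// Assumed backing table:
//
//   CREATE TABLE IF NOT EXISTS sync_log (
//       sync_type   text PRIMARY KEY,
//       last_synced timestamptz NOT NULL
//   );

// GetLastSyncedTime returns when a sync of the given type last completed.
func GetLastSyncedTime(db *sql.DB, syncType string) (time.Time, error) {
	var t time.Time
	err := db.QueryRow(
		`SELECT last_synced FROM sync_log WHERE sync_type = $1`, syncType,
	).Scan(&t)
	return t, err
}

// UpdateLastSyncedTime records a successful sync, inserting the row if absent.
func UpdateLastSyncedTime(db *sql.DB, syncType string, t time.Time) error {
	_, err := db.Exec(
		`INSERT INTO sync_log (sync_type, last_synced) VALUES ($1, $2)
		 ON CONFLICT (sync_type) DO UPDATE SET last_synced = EXCLUDED.last_synced`,
		syncType, t,
	)
	return err
}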

func SyncUpdatedRows(db *sql.DB, pool *pgxpool.Pool, indexName string, syncType string) error {
	lastSyncedTime, err := util.GetLastSyncedTime(db, syncType)
	if err != nil {
		return err
	}

	updatedRows, err := GetUpdatedRows(db, lastSyncedTime)
	if err != nil {
		return err
	}
	fmt.Printf("Found %d updated rows\n", len(updatedRows))

	for _, user := range updatedRows {
		doc := map[string]interface{}{
			"id":           user.ID,
			"first_name":   user.FirstName,
			"email":        user.Email,
			"last_name":    user.LastName,
			"organization": user.Organization,
			"is_deleted":   user.IsDeleted,
		}
		err := meilisearch.UpdateDocument(indexName, doc)
		if err != nil {
			log.Printf("Failed to update document in MeiliSearch: %v", err)
		}
	}

	// Update the last synced time
	newTime := time.Now().UTC()
	err = util.UpdateLastSyncedTime(db, syncType, newTime)
	if err != nil {
		return err
	}

	return nil
}

The SyncUpdatedRows function retrieves rows updated since the last sync from a database, updates the corresponding documents in MeiliSearch, and then updates the last synced time in the database. It handles errors at each step and logs any issues encountered during the document update process.
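GetUpdatedRows is not shown above; here is a sketch of it, assuming the users table carries an updated_at timestamp column:

// User mirrors the columns we push into the Meilisearch index.
type User struct {
	ID           int
	FirstName    string
	Email        string
	LastName     string
	Organization string
	IsDeleted    bool
}

// GetUpdatedRows returns users modified since the last sync; the updated_at
// column is an assumption about the schema.
func GetUpdatedRows(db *sql.DB, since time.Time) ([]User, error) {
	rows, err := db.Query(
		`SELECT id, first_name, email, last_name, organization, is_deleted
		 FROM users WHERE updated_at > $1`, since)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var users []User
	for rows.Next() {
		var u User
		if err := rows.Scan(&u.ID, &u.FirstName, &u.Email, &u.LastName,
			&u.Organization, &u.IsDeleted); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, rows.Err()
}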

PostgreSQL Triggers and Functions for Events

One obstacle I faced with meilisync was that it was not connecting to or listening for changes from PostgreSQL. I tried setting up the extension and followed some Stack Overflow fixes, but it didn't work. In SyncSearch, I set up a PostgreSQL trigger and function that emits a notification whenever an operation happens on the table.

CREATE OR REPLACE FUNCTION notify_user_changes() RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify('user_changes', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS user_changes_trigger ON users;
CREATE TRIGGER user_changes_trigger
AFTER INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION notify_user_changes();
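The trigger above covers inserts and updates, which is enough for us because deletions arrive as updates to the is_deleted flag. If you also need hard deletes, a variant along these lines (a sketch, not what we run) would notify with the OLD row instead:

CREATE OR REPLACE FUNCTION notify_user_changes() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- For deletes, NEW is null; send the old row so the listener
        -- can remove the document from the index.
        PERFORM pg_notify('user_changes', row_to_json(OLD)::text);
        RETURN OLD;
    END IF;
    PERFORM pg_notify('user_changes', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS user_changes_trigger ON users;
CREATE TRIGGER user_changes_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION notify_user_changes();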

pgx Listeners for the Notification Channel

pgx is a popular PostgreSQL driver and toolkit package for Go; its pgxpool package provides the connection pool used below.

for {
    // Acquire a dedicated connection from the pool for LISTEN.
    conn, err := pool.Acquire(context.Background())
    if err != nil {
        return fmt.Errorf("failed to acquire connection: %v", err)
    }

    _, err = conn.Exec(context.Background(), "LISTEN user_changes;")
    if err != nil {
        conn.Release()
        return fmt.Errorf("failed to subscribe to notifications: %v", err)
    }

    // Block until the trigger fires, or give up after an hour and re-listen.
    ctx, cancel := context.WithTimeout(context.Background(), 1*time.Hour)
    notification, err := conn.Conn().WaitForNotification(ctx)
    cancel()
    conn.Release() // release inside the loop; a defer here would leak connections
    if err != nil {
        continue // timeout or broken connection: acquire a fresh one and retry
    }

    if err := processNotification(notification.Payload, indexName, db, syncType); err != nil {
        log.Printf("failed to process notification: %v", err)
    }
}

The ListenForUserChanges function continuously listens for user changes by calling listenForChanges; if an error occurs, it logs the error and restarts the listener, keeping it active and resilient to interruptions.
conn.Conn().WaitForNotification(ctx) blocks until the trigger sends a notification; when one arrives, processNotification runs and the row is upserted into the Meilisearch index.
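processNotification itself can stay small. Here is a sketch that reuses the meilisearch.UpdateDocument wrapper from earlier; how it uses db and syncType is our guess, and the payload is the row_to_json output from the trigger:

// processNotification parses the trigger payload and upserts it into the
// Meilisearch index. Adding a document whose primary key already exists
// replaces it, so inserts and updates take the same path.
func processNotification(payload, indexName string, db *sql.DB, syncType string) error {
	var doc map[string]interface{}
	if err := json.Unmarshal([]byte(payload), &doc); err != nil {
		return fmt.Errorf("failed to parse notification payload: %v", err)
	}

	if err := meilisearch.UpdateDocument(indexName, doc); err != nil {
		return err
	}

	// Record the sync time so a restart can catch up from this point.
	return util.UpdateLastSyncedTime(db, syncType, time.Now().UTC())
}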


Is Meilisearch Really Fast?

All the setup is almost done; now the real testing begins. We have around 1,000 users in our user database. Let's check how long Meilisearch takes to fetch the users whose names start with 'A'.


It only took 1ms to filter 195 results from 1,000 users.

Here is a demo of it in action:

Before Meilisearch:
(demo recording)

After adding user-profile search using Meilisearch:
(demo recording)

What More Can We Do with Meilisearch?

Beyond plain text search, Meilisearch supports facet search and multi-search, which are very useful in the Feedback App for searching by team, user performance attributes, and more.

It also supports:

  • Multi-language support: handles search queries in different languages.
  • Typo tolerance: improves the user experience by tolerating misspelled queries.
  • Geo-search: lets users filter and sort search results by geographic location or distance.
  • AI search support: Meilisearch also supports vector, semantic, and contextual search if you move into AI later.


FeedZap: Read 2X Books This Year

FeedZap helps you consume your books through a healthy, snackable feed, so that you can read more with less time, effort and energy.