SQLite FTS5: Full-Text Search Extension
FTS5 (Full-Text Search version 5) is a virtual table module that creates inverted indexes for efficient text searching. Unlike regular SQLite tables that store data in B-trees, FTS5 maintains...
Key Insights
- SQLite’s FTS5 extension provides production-ready full-text search with BM25 ranking, prefix queries, and phrase matching without external dependencies
- FTS5 virtual tables operate alongside regular SQLite tables, enabling hybrid queries that combine full-text search with relational operations in a single transaction
- Proper tokenizer selection and auxiliary functions unlock advanced features like snippet generation, result highlighting, and custom ranking algorithms
Understanding FTS5 Architecture
FTS5 (Full-Text Search version 5) is a virtual table module that creates inverted indexes for efficient text searching. Unlike regular SQLite tables that store data in B-trees, FTS5 maintains separate data structures optimized for text retrieval.
-- Enable FTS5 extension (built into most SQLite distributions)
CREATE VIRTUAL TABLE documents USING fts5(
title,
content,
author,
tokenize = 'porter ascii'
);
-- Insert sample data
INSERT INTO documents (title, content, author) VALUES
('Database Indexing', 'B-tree indexes provide logarithmic lookup times', 'Smith'),
('Search Algorithms', 'Full-text search requires inverted indexes', 'Jones'),
('SQLite Performance', 'Proper indexing dramatically improves query speed', 'Smith');
The tokenize parameter specifies how text is broken into searchable tokens. The porter tokenizer applies Porter stemming, treating “indexing”, “indexes”, and “indexed” as equivalent terms.
Basic Search Operations
FTS5 uses the MATCH operator for full-text queries. The syntax supports boolean operators, phrase searches, and column filters.
-- Simple term search
SELECT title, content FROM documents WHERE documents MATCH 'index';
-- Boolean operators
SELECT title FROM documents WHERE documents MATCH 'index AND performance';
SELECT title FROM documents WHERE documents MATCH 'index OR search';
SELECT title FROM documents WHERE documents MATCH 'index NOT btree';
-- Phrase search (exact order)
SELECT title FROM documents WHERE documents MATCH '"inverted indexes"';
-- Column-specific search
SELECT title FROM documents WHERE documents MATCH 'author:Smith';
-- Prefix matching
SELECT title FROM documents WHERE documents MATCH 'perform*';
The MATCH operator returns rows ranked by relevance using BM25 by default. Results appear in descending order of relevance score.
Ranking and Relevance
FTS5 exposes the rank column for accessing BM25 scores. You can also implement custom ranking functions using auxiliary functions.
-- Default BM25 ranking
SELECT title, rank FROM documents
WHERE documents MATCH 'index'
ORDER BY rank;
-- Custom ranking with bm25() function
SELECT title, bm25(documents, 10.0, 5.0) as score
FROM documents
WHERE documents MATCH 'database'
ORDER BY score;
-- Combining full-text and traditional filters
SELECT d.title, d.rank
FROM documents d
WHERE d.author = 'Smith'
AND documents MATCH 'index'
ORDER BY d.rank;
The bm25() function accepts weight parameters for each column, allowing you to prioritize title matches over content matches:
-- Weight title 3x more than content
SELECT title, bm25(documents, 3.0, 1.0) as score
FROM documents
WHERE documents MATCH 'search'
ORDER BY score;
Snippet Generation and Highlighting
FTS5 provides built-in functions for generating contextual snippets and highlighting matching terms.
-- Generate snippets with highlighted matches
SELECT
title,
snippet(documents, 1, '<b>', '</b>', '...', 15) as excerpt
FROM documents
WHERE documents MATCH 'index'
ORDER BY rank;
-- The snippet() function parameters:
-- 1: Column index (0=title, 1=content, 2=author)
-- '<b>', '</b>': Start and end highlight markers
-- '...': Ellipsis for truncated text
-- 15: Approximate token count for snippet
For highlighting all matching terms in complete text:
SELECT
highlight(documents, 0, '<mark>', '</mark>') as highlighted_title,
highlight(documents, 1, '<mark>', '</mark>') as highlighted_content
FROM documents
WHERE documents MATCH 'performance OR index';
Advanced Tokenizer Configuration
Choosing the right tokenizer significantly impacts search behavior. FTS5 supports multiple tokenizers with different characteristics.
-- ASCII tokenizer (case-insensitive, removes diacritics)
CREATE VIRTUAL TABLE docs_ascii USING fts5(
content,
tokenize = 'ascii'
);
-- Unicode61 tokenizer (proper Unicode support)
CREATE VIRTUAL TABLE docs_unicode USING fts5(
content,
tokenize = 'unicode61 remove_diacritics 2'
);
-- Porter stemming for English
CREATE VIRTUAL TABLE docs_stemmed USING fts5(
content,
tokenize = 'porter ascii'
);
-- Trigram tokenizer for substring matching
CREATE VIRTUAL TABLE docs_trigram USING fts5(
content,
tokenize = 'trigram'
);
Trigram tokenization enables substring searches without prefix operators:
-- With trigram tokenizer, find "form" within "performance"
INSERT INTO docs_trigram VALUES ('SQLite performance tuning');
SELECT * FROM docs_trigram WHERE docs_trigram MATCH 'form';
-- Returns the row because "performance" contains "form"
External Content Tables
For large datasets, you can separate FTS5 indexes from actual content storage using external content tables.
-- Regular table with content
CREATE TABLE articles (
id INTEGER PRIMARY KEY,
title TEXT,
body TEXT,
created_at INTEGER
);
-- FTS5 index referencing external content
CREATE VIRTUAL TABLE articles_fts USING fts5(
title,
body,
content='articles',
content_rowid='id'
);
-- Triggers to keep FTS index synchronized
CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN
INSERT INTO articles_fts(rowid, title, body)
VALUES (new.id, new.title, new.body);
END;
CREATE TRIGGER articles_ad AFTER DELETE ON articles BEGIN
INSERT INTO articles_fts(articles_fts, rowid, title, body)
VALUES ('delete', old.id, old.title, old.body);
END;
CREATE TRIGGER articles_au AFTER UPDATE ON articles BEGIN
INSERT INTO articles_fts(articles_fts, rowid, title, body)
VALUES ('delete', old.id, old.title, old.body);
INSERT INTO articles_fts(rowid, title, body)
VALUES (new.id, new.title, new.body);
END;
This approach reduces storage overhead since content isn’t duplicated in the FTS index.
Query Optimization Techniques
FTS5 provides the fts5vocab table for analyzing indexed terms and optimizing queries.
-- Create vocabulary table
CREATE VIRTUAL TABLE documents_vocab USING fts5vocab('documents', 'row');
-- Find most common terms
SELECT term, cnt FROM documents_vocab
ORDER BY cnt DESC LIMIT 10;
-- Identify terms by document frequency
SELECT term, doc FROM documents_vocab
WHERE doc > 5;
For complex applications, combine FTS5 with regular indexes:
CREATE TABLE posts (
id INTEGER PRIMARY KEY,
category_id INTEGER,
title TEXT,
content TEXT,
published_at INTEGER
);
CREATE INDEX idx_posts_category ON posts(category_id);
CREATE INDEX idx_posts_published ON posts(published_at);
CREATE VIRTUAL TABLE posts_fts USING fts5(
title, content,
content='posts',
content_rowid='id'
);
-- Efficient hybrid query
SELECT p.id, p.title, p.published_at
FROM posts p
JOIN posts_fts fts ON p.id = fts.rowid
WHERE p.category_id = 5
AND p.published_at > 1640000000
AND posts_fts MATCH 'database optimization'
ORDER BY fts.rank
LIMIT 20;
Practical Implementation Patterns
Here’s a complete example implementing a document search system with pagination and faceting:
-- Schema setup
CREATE TABLE documents (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
category TEXT NOT NULL,
tags TEXT,
created_at INTEGER NOT NULL
);
CREATE VIRTUAL TABLE documents_fts USING fts5(
title, content, tags,
content='documents',
content_rowid='id',
tokenize='porter unicode61'
);
-- Search with pagination
WITH search_results AS (
SELECT
d.id,
d.title,
d.category,
fts.rank,
snippet(documents_fts, 1, '**', '**', '...', 20) as excerpt
FROM documents d
JOIN documents_fts fts ON d.id = fts.rowid
WHERE documents_fts MATCH ? -- Bind search query
ORDER BY fts.rank
LIMIT 20 OFFSET ? -- Bind offset for pagination
)
SELECT * FROM search_results;
-- Category faceting
SELECT
d.category,
COUNT(*) as count
FROM documents d
JOIN documents_fts fts ON d.id = fts.rowid
WHERE documents_fts MATCH ?
GROUP BY d.category
ORDER BY count DESC;
FTS5 delivers enterprise-grade full-text search capabilities within SQLite’s zero-configuration architecture. By understanding tokenizers, ranking functions, and hybrid query patterns, you can build sophisticated search features without external dependencies or infrastructure complexity.