PostgreSQL Full-Text Search: tsvector and tsquery

Key Insights

PostgreSQL’s full-text search is fast enough for most applications up to millions of documents, eliminating the need for external search infrastructure in many cases.
The tsvector type stores preprocessed, searchable document representations while tsquery defines search patterns—understanding both is essential for effective implementation.
GIN indexes on tsvector columns combined with generated columns provide automatic, performant search with minimal maintenance overhead.

Most developers reach for Elasticsearch or Algolia when they need search functionality, but PostgreSQL’s built-in full-text search capabilities are surprisingly powerful. For applications with up to several million documents, PostgreSQL can deliver sub-50ms search queries while keeping your infrastructure simple. You avoid the operational complexity of maintaining a separate search cluster, the data synchronization headaches, and the additional costs.

The sweet spot for PostgreSQL full-text search is content-heavy applications like blogs, documentation sites, product catalogs, and internal knowledge bases. If you need sub-10ms response times, complex relevance tuning, or are searching billions of documents, then yes, reach for a dedicated search engine. For everything else, PostgreSQL is probably sufficient.

The Problem with LIKE Queries

Before diving into full-text search, let’s see why simple pattern matching fails at scale:

-- Slow and inflexible
SELECT * FROM articles 
WHERE title LIKE '%postgres%' OR content LIKE '%postgres%';

This query can’t use indexes effectively, requires exact substring matches, and gets slower as your table grows. It also misses variations like “PostgreSQL” or “Postgres” unless you add more conditions.

Full-text search solves these problems through linguistic normalization and specialized indexing:

-- Fast and intelligent
SELECT * FROM articles 
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('english', 'postgres');

This finds “postgres”, “PostgreSQL”, “Postgres’s” and ranks results by relevance—all while using an index.

Understanding tsvector: Tokenized Documents

The tsvector type is PostgreSQL’s internal representation of a searchable document. It stores normalized lexemes (base forms of words) with positional information, stripping out stop words and applying language-specific stemming.

-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');

-- Result:
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

Notice what happened: “foxes” became “fox”, “jumped” became “jump”, “lazy” became “lazi”, and common words like “the” and “over” were removed. The numbers represent word positions in the original text.

Language configuration matters significantly:

-- English configuration applies stemming
SELECT to_tsvector('english', 'running runs ran');
-- Result: 'ran':3 'run':1,2

-- Simple configuration preserves words as-is
SELECT to_tsvector('simple', 'running runs ran');
-- Result: 'ran':3 'running':1 'runs':2

Choose ’english’ for English text to get intelligent stemming. Use ‘simple’ for product codes, usernames, or mixed-language content where stemming might cause problems.

Understanding tsquery: Search Patterns

While tsvector represents documents, tsquery represents search queries with boolean operators and phrase matching:

-- Basic query
SELECT to_tsquery('english', 'postgres & database');
-- Matches documents containing both "postgres" AND "database"

-- OR operator
SELECT to_tsquery('english', 'postgres | mysql');
-- Matches documents with either term

-- NOT operator
SELECT to_tsquery('english', 'postgres & !oracle');
-- Matches "postgres" but excludes documents mentioning "oracle"

-- Phrase search with position
SELECT to_tsquery('english', 'full <-> text <-> search');
-- Matches "full text search" as a phrase

For user-facing search, to_tsquery() is too strict—it requires proper boolean syntax. Use plainto_tsquery() or websearch_to_tsquery() instead:

-- User types: "postgres full text"
SELECT plainto_tsquery('english', 'postgres full text');
-- Result: 'postgres' & 'full' & 'text'

-- User types: "postgres OR mysql -oracle"
SELECT websearch_to_tsquery('english', 'postgres OR mysql -oracle');
-- Result: 'postgres' | 'mysql' & !'oracl'

The websearch_to_tsquery() function supports Google-like syntax: OR for alternatives, quotes for phrases, and minus for exclusions. This is what you want for search boxes.

Performing Searches with the @@ Operator

The @@ operator matches a tsvector against a tsquery:

CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT
);

-- Basic search
SELECT title FROM articles
WHERE to_tsvector('english', title || ' ' || content) 
      @@ plainto_tsquery('english', 'postgresql performance');

This works but recalculates the tsvector for every row. We’ll optimize this shortly. First, let’s add ranking:

SELECT 
    title,
    ts_rank(to_tsvector('english', title || ' ' || content), 
            plainto_tsquery('english', 'postgresql performance')) AS rank
FROM articles
WHERE to_tsvector('english', title || ' ' || content) 
      @@ plainto_tsquery('english', 'postgresql performance')
ORDER BY rank DESC
LIMIT 10;

The ts_rank() function scores results based on term frequency and position. Use ts_rank_cd() for cover density ranking, which considers how close search terms appear together.

Highlight matching terms in results with ts_headline():

SELECT 
    title,
    ts_headline('english', content, 
                plainto_tsquery('english', 'postgresql performance'),
                'StartSel=<mark>, StopSel=</mark>, MaxWords=50') AS snippet
FROM articles
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'postgresql performance')
LIMIT 10;

This generates excerpts with matched terms wrapped in <mark> tags, perfect for search result previews.

Indexing for Performance

Computing tsvector on every query is expensive. Store it in a dedicated column and index it:

-- Add tsvector column
ALTER TABLE articles ADD COLUMN content_search tsvector
    GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;

-- Create GIN index
CREATE INDEX articles_search_idx ON articles USING GIN(content_search);

Generated columns automatically update when source columns change. Now queries are fast:

SELECT title, ts_rank(content_search, query) AS rank
FROM articles, plainto_tsquery('english', 'postgresql performance') query
WHERE content_search @@ query
ORDER BY rank DESC
LIMIT 10;

Check the performance difference:

EXPLAIN ANALYZE
SELECT title FROM articles
WHERE content_search @@ plainto_tsquery('english', 'postgresql');

You should see “Bitmap Index Scan on articles_search_idx” with execution times in milliseconds, even on tables with hundreds of thousands of rows.

If you’re on PostgreSQL versions before 12 (which added generated columns), use a trigger instead:

CREATE FUNCTION articles_search_trigger() RETURNS trigger AS $$
BEGIN
    NEW.content_search := to_tsvector('english', NEW.title || ' ' || NEW.content);
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_update 
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();

Real-World Implementation

Here’s a complete product search implementation:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    tags TEXT[],
    price DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Add searchable column combining multiple fields
ALTER TABLE products ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(array_to_string(tags, ' '), '')), 'D')
    ) STORED;

CREATE INDEX products_search_idx ON products USING GIN(search_vector);

The setweight() function assigns importance: ‘A’ for titles, ‘B’ for descriptions, etc. This affects ranking.

Create a search function:

CREATE OR REPLACE FUNCTION search_products(search_query TEXT, max_results INT DEFAULT 20)
RETURNS TABLE (
    id INT,
    name TEXT,
    description TEXT,
    price DECIMAL,
    rank REAL,
    headline TEXT
) AS $$
BEGIN
    RETURN QUERY
    SELECT 
        p.id,
        p.name,
        p.description,
        p.price,
        ts_rank(p.search_vector, query) AS rank,
        ts_headline('english', p.description, query, 
                    'MaxWords=30, MinWords=15, StartSel=<b>, StopSel=</b>') AS headline
    FROM products p,
         websearch_to_tsquery('english', search_query) query
    WHERE p.search_vector @@ query
    ORDER BY rank DESC
    LIMIT max_results;
END;
$$ LANGUAGE plpgsql;

-- Use it
SELECT * FROM search_products('laptop gaming');

Best Practices and Gotchas

Choose the right text configuration. Use ’english’ for English text, but be aware it stems aggressively. For product SKUs, IDs, or technical terms, use ‘simple’ or create a custom configuration.

Handle NULL values. Always use coalesce() when combining columns:

-- Bad: NULLs break concatenation
to_tsvector('english', title || ' ' || description)

-- Good: Handles NULLs gracefully
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))

Don’t over-index. GIN indexes are large—typically 40-60% of your table size. Only index columns you actually search.

Sanitize user input for websearch_to_tsquery. While it’s safer than to_tsquery(), extremely long or complex queries can still cause performance issues:

-- Limit query length and complexity
CREATE OR REPLACE FUNCTION sanitize_search(query TEXT) RETURNS TEXT AS $$
BEGIN
    RETURN LEFT(TRIM(query), 200);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

Monitor query performance. Some queries are inherently expensive (very common terms, complex boolean logic). Set statement_timeout to prevent runaway queries:

SET statement_timeout = '5s';

PostgreSQL full-text search isn’t a silver bullet, but for most applications, it’s the pragmatic choice. You get powerful search capabilities without the operational overhead of external systems. Start here, and only reach for dedicated search engines when you’ve actually outgrown PostgreSQL’s capabilities—which might be never.