PostgreSQL Extensions: PostGIS, pg_trgm, hstore

PostgreSQL's extension system is one of its most powerful features, allowing you to add specialized functionality without modifying the core database engine. Extensions package new data types,...

Key Insights

  • PostgreSQL extensions transform the database into a specialized tool: PostGIS adds geospatial capabilities rivaling dedicated GIS systems, pg_trgm enables fuzzy text search without external services, and hstore provides flexible key-value storage for semi-structured data.
  • Proper indexing is critical—GiST indexes for PostGIS geometries, GIN indexes for pg_trgm text searches, and GiST indexes for hstore queries can improve performance by orders of magnitude.
  • These extensions solve real production problems: PostGIS for location-based services, pg_trgm for typo-tolerant search and autocomplete, and hstore for schema flexibility without JSON’s parsing overhead.

Introduction to PostgreSQL Extensions

PostgreSQL’s extension system is one of its most powerful features, allowing you to add specialized functionality without modifying the core database engine. Extensions package new data types, functions, operators, and index types into installable modules that integrate seamlessly with PostgreSQL’s query planner and execution engine.

The three extensions we’ll cover represent different categories of problems you’ll encounter in production systems. PostGIS turns PostgreSQL into a full-featured geographic database. pg_trgm solves fuzzy text matching and similarity search problems. hstore provides a lightweight key-value store for semi-structured data that predates PostgreSQL’s native JSON support and still has specific advantages.

Installing extensions is straightforward:

CREATE EXTENSION postgis;
CREATE EXTENSION pg_trgm;
CREATE EXTENSION hstore;

You need superuser privileges or the appropriate permissions. Once installed, the extension’s features are available to all databases where you’ve created them.

PostGIS: Spatial and Geographic Data

PostGIS is the industry-standard solution for storing and querying geographic data in relational databases. It adds support for geometric and geographic objects, spatial indexing, and hundreds of functions for spatial analysis.

The fundamental distinction in PostGIS is between geometry (planar coordinates) and geography (spherical coordinates on Earth). Use geography for real-world locations and geometry for projected data or when you need the full range of PostGIS functions.

Here’s a practical example—storing restaurant locations:

CREATE TABLE restaurants (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    location GEOGRAPHY(POINT, 4326),  -- WGS 84 coordinate system
    address TEXT
);

-- Insert some locations (longitude, latitude)
INSERT INTO restaurants (name, location, address) VALUES
    ('Pizza Palace', ST_GeogFromText('POINT(-73.9857 40.7484)'), '123 Broadway, NYC'),
    ('Burger Barn', ST_GeogFromText('POINT(-73.9712 40.7831)'), '456 Amsterdam Ave, NYC'),
    ('Taco Town', ST_GeogFromText('POINT(-74.0060 40.7128)'), '789 Wall St, NYC');

The most common query is finding all points within a radius. Here’s how to find restaurants within 2 kilometers of a location:

SELECT 
    name,
    ST_Distance(location, ST_GeogFromText('POINT(-73.9857 40.7589)')) / 1000 AS distance_km
FROM restaurants
WHERE ST_DWithin(
    location,
    ST_GeogFromText('POINT(-73.9857 40.7589)'),
    2000  -- 2000 meters
)
ORDER BY distance_km;

ST_DWithin is optimized for this use case and works with spatial indexes. Always use it instead of calculating distances and filtering manually.

Spatial indexes are essential for performance:

CREATE INDEX idx_restaurants_location ON restaurants USING GIST(location);

This GiST index enables efficient spatial queries. Without it, PostGIS must check every row. With it, queries on millions of points return in milliseconds.

For calculating exact distances between two points:

SELECT ST_Distance(
    ST_GeogFromText('POINT(-73.9857 40.7484)'),
    ST_GeogFromText('POINT(-74.0060 40.7128)')
) / 1000 AS distance_km;
-- Returns approximately 2.68 km

pg_trgm: Fuzzy Text Search and Similarity Matching

The pg_trgm extension implements trigram matching for text similarity. A trigram is a group of three consecutive characters. The extension compares trigram sets between strings to calculate similarity scores and enable fuzzy searching.

This is invaluable for user-facing search features where exact matching fails due to typos, variations in spelling, or partial inputs.

Here’s a customer database with fuzzy name search:

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

INSERT INTO customers (name, email) VALUES
    ('John Smith', 'john@example.com'),
    ('Jon Smythe', 'jon@example.com'),
    ('Jane Smith', 'jane@example.com'),
    ('Jonathan Smithson', 'jonathan@example.com');

The similarity() function returns a score between 0 and 1:

SELECT 
    name,
    similarity(name, 'John Smith') AS score
FROM customers
WHERE similarity(name, 'John Smith') > 0.3
ORDER BY score DESC;

Results:

name                | score
--------------------+-------
John Smith          | 1.0
Jon Smythe          | 0.5
Jonathan Smithson   | 0.46

The % operator is shorthand for similarity threshold matching (default threshold 0.3):

SELECT name FROM customers WHERE name % 'Jon Smyth';
-- Returns: John Smith, Jon Smythe, Jonathan Smithson

For production use, create a GIN or GiST index:

CREATE INDEX idx_customers_name_trgm ON customers USING GIN(name gin_trgm_ops);

GIN indexes are larger but faster for static data. GiST indexes are smaller and better for frequently updated tables.

This enables blazingly fast LIKE queries with wildcards:

-- Without pg_trgm: slow sequential scan
-- With pg_trgm + GIN index: uses index scan
SELECT name FROM customers WHERE name ILIKE '%smith%';

For autocomplete functionality:

SELECT name 
FROM customers 
WHERE name ILIKE 'joh%'
ORDER BY similarity(name, 'joh') DESC
LIMIT 5;

You can adjust the similarity threshold globally:

SET pg_trgm.similarity_threshold = 0.5;

hstore: Key-Value Storage Within PostgreSQL

hstore provides a key-value data type that stores sets of key-value pairs within a single column. It’s more efficient than JSON for simple key-value storage and supports indexing that JSON didn’t originally have.

Use hstore when you need schema flexibility but don’t need nested structures. It’s perfect for user preferences, feature flags, or metadata that varies between rows.

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    attributes hstore
);

INSERT INTO products (name, attributes) VALUES
    ('Laptop', 'brand=>Dell, ram=>16GB, cpu=>Intel i7, color=>black'),
    ('Phone', 'brand=>Apple, storage=>256GB, color=>silver'),
    ('Tablet', 'brand=>Samsung, storage=>128GB, screen=>10.1inch');

Query specific keys with the -> operator:

SELECT name, attributes->'brand' AS brand 
FROM products 
WHERE attributes->'color' = 'black';

Check for key existence with the ? operator:

SELECT name FROM products WHERE attributes ? 'ram';
-- Returns: Laptop

Update individual keys without replacing the entire hstore:

UPDATE products 
SET attributes = attributes || 'warranty=>2years'::hstore
WHERE name = 'Laptop';

For queries on hstore values, create a GiST index:

CREATE INDEX idx_products_attributes ON products USING GIST(attributes);

This accelerates containment queries:

SELECT name FROM products 
WHERE attributes @> 'brand=>Apple'::hstore;

Convert between hstore and JSON:

-- hstore to JSON
SELECT hstore_to_json(attributes) FROM products;

-- JSON to hstore (requires jsonb)
SELECT hstore(jsonb_each(
    '{"brand": "Sony", "price": "299"}'::jsonb
));

hstore is more efficient than JSON for simple key-value pairs because it’s stored in a binary format optimized for key lookup. However, it only supports string values and doesn’t handle nested structures.

Performance Considerations and Best Practices

Each extension requires specific indexing strategies. Here’s a performance comparison showing the impact of proper indexing:

-- Create test table with 100,000 locations
CREATE TABLE locations_test (
    id SERIAL PRIMARY KEY,
    point GEOGRAPHY(POINT, 4326)
);

-- Without index
EXPLAIN ANALYZE
SELECT COUNT(*) FROM locations_test
WHERE ST_DWithin(point, ST_GeogFromText('POINT(-73.9857 40.7589)'), 5000);
-- Execution time: ~850ms (sequential scan)

-- With GiST index
CREATE INDEX idx_locations_test_point ON locations_test USING GIST(point);

EXPLAIN ANALYZE
SELECT COUNT(*) FROM locations_test
WHERE ST_DWithin(point, ST_GeogFromText('POINT(-73.9857 40.7589)'), 5000);
-- Execution time: ~12ms (index scan)

PostGIS best practices:

  • Always use ST_DWithin instead of ST_Distance with a WHERE clause
  • Choose geography for real-world coordinates, geometry for performance-critical planar operations
  • Create GiST indexes on all spatial columns used in WHERE clauses
  • Use appropriate SRID (Spatial Reference System Identifier) consistently

pg_trgm best practices:

  • GIN indexes for mostly-read workloads, GiST for frequently updated tables
  • Adjust pg_trgm.similarity_threshold based on your data characteristics
  • Combine with full-text search for comprehensive text querying
  • Use ILIKE with leading wildcards only when you have a trigram index

hstore best practices:

  • Prefer JSONB for nested structures or when you need array values
  • Use hstore for simple key-value metadata where performance matters
  • Create GiST indexes when querying by keys or containment
  • Consider partitioning tables with large hstore columns

Real-World Use Cases

These extensions often work together. Here’s a restaurant discovery app that combines PostGIS and pg_trgm:

CREATE TABLE venues (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    location GEOGRAPHY(POINT, 4326),
    metadata hstore
);

CREATE INDEX idx_venues_location ON venues USING GIST(location);
CREATE INDEX idx_venues_name_trgm ON venues USING GIN(name gin_trgm_ops);
CREATE INDEX idx_venues_metadata ON venues USING GIST(metadata);

-- Find venues with fuzzy name matching within a geographic area
SELECT 
    name,
    ST_Distance(location, ST_GeogFromText('POINT(-73.9857 40.7589)')) / 1000 AS distance_km,
    similarity(name, 'pizza') AS name_match
FROM venues
WHERE 
    ST_DWithin(location, ST_GeogFromText('POINT(-73.9857 40.7589)'), 3000)
    AND name % 'pizza'
    AND metadata @> 'open=>true'::hstore
ORDER BY distance_km, name_match DESC;

This single query demonstrates geographic filtering, fuzzy text search, and key-value metadata querying—all executing efficiently with proper indexes.

Conclusion

PostgreSQL extensions eliminate the need for specialized databases in many scenarios. PostGIS handles geospatial workloads that would otherwise require dedicated GIS systems. pg_trgm provides fuzzy search capabilities without Elasticsearch or Solr. hstore offers schema flexibility for semi-structured data.

Start by identifying which problems these extensions solve in your application. Install them in a development environment and experiment with the examples above. Pay special attention to indexing—the performance difference between indexed and non-indexed queries is dramatic.

The combination of these extensions with PostgreSQL’s reliability, ACID compliance, and rich ecosystem makes it a compelling choice for applications that would otherwise require multiple specialized databases. You get spatial queries, fuzzy search, and flexible schemas in a single, proven database system.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.