Geohashing: Location-Based Indexing
Geohashing is a spatial indexing system that encodes geographic coordinates into short alphanumeric strings. Invented by Gustavo Niemeyer in 2008, it transforms a two-dimensional location problem...
Key Insights
- Geohashing converts latitude/longitude pairs into hierarchical strings where shared prefixes indicate spatial proximity, enabling efficient database indexing with standard B-tree indexes
- The “edge problem” means adjacent locations can have completely different geohash prefixes, requiring neighbor cell queries for accurate proximity searches
- Geohashing excels for simple proximity queries with existing infrastructure, but consider S2 or H3 for applications requiring uniform cell sizes or complex geometric operations
Introduction to Geohashing
Geohashing is a spatial indexing system that encodes geographic coordinates into short alphanumeric strings. Invented by Gustavo Niemeyer in 2008, it transforms a two-dimensional location problem into a one-dimensional string comparison problem—something databases handle exceptionally well.
When you encode (37.7749, -122.4194) (San Francisco), you get 9q8yyk. Nearby locations produce similar strings: 9q8yym, 9q8yyj. This property makes geohashing invaluable for location-based applications where you need to quickly find nearby points without expensive distance calculations on every row.
The real power lies in its simplicity. You don’t need PostGIS or specialized spatial databases. A standard B-tree index on a VARCHAR column gives you efficient proximity queries. That’s why geohashing remains popular despite newer alternatives.
How Geohashing Works
The algorithm recursively bisects the world into smaller rectangles. Start with the entire coordinate space: latitude [-90, 90] and longitude [-180, 180]. For each bit, divide the current range in half. If the coordinate falls in the upper half, output 1; otherwise, output 0.
Longitude bits and latitude bits are interleaved, creating a Z-order curve that preserves some spatial locality. Finally, every 5 bits are encoded as a base32 character.
BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz'
def encode_geohash(lat: float, lon: float, precision: int = 12) -> str:
"""Encode latitude/longitude to geohash string."""
lat_range = (-90.0, 90.0)
lon_range = (-180.0, 180.0)
geohash = []
bits = 0
bit_count = 0
is_longitude = True
while len(geohash) < precision:
if is_longitude:
mid = (lon_range[0] + lon_range[1]) / 2
if lon >= mid:
bits = (bits << 1) | 1
lon_range = (mid, lon_range[1])
else:
bits = bits << 1
lon_range = (lon_range[0], mid)
else:
mid = (lat_range[0] + lat_range[1]) / 2
if lat >= mid:
bits = (bits << 1) | 1
lat_range = (mid, lat_range[1])
else:
bits = bits << 1
lat_range = (lat_range[0], mid)
is_longitude = not is_longitude
bit_count += 1
if bit_count == 5:
geohash.append(BASE32[bits])
bits = 0
bit_count = 0
return ''.join(geohash)
def decode_geohash(geohash: str) -> tuple[float, float]:
"""Decode geohash string to latitude/longitude center point."""
lat_range = [-90.0, 90.0]
lon_range = [-180.0, 180.0]
is_longitude = True
for char in geohash:
idx = BASE32.index(char)
for i in range(4, -1, -1):
bit = (idx >> i) & 1
if is_longitude:
mid = (lon_range[0] + lon_range[1]) / 2
lon_range[bit == 0] = mid
else:
mid = (lat_range[0] + lat_range[1]) / 2
lat_range[bit == 0] = mid
is_longitude = not is_longitude
lat = (lat_range[0] + lat_range[1]) / 2
lon = (lon_range[0] + lon_range[1]) / 2
return (lat, lon)
Precision determines grid cell size. Here’s what each level gives you:
| Precision | Cell Width | Cell Height | Use Case |
|---|---|---|---|
| 4 | ~39 km | ~20 km | Regional grouping |
| 5 | ~5 km | ~5 km | City-level |
| 6 | ~1.2 km | ~600 m | Neighborhood |
| 7 | ~150 m | ~150 m | Block-level |
| 8 | ~38 m | ~19 m | Building |
Key Properties and Trade-offs
Geohashing’s killer feature is prefix-based proximity. Points sharing longer prefixes are closer together. This enables range queries: WHERE geohash LIKE '9q8yy%' finds everything in that cell.
But there’s a critical gotcha: the edge problem. Two points millimeters apart can have completely different geohash prefixes if they straddle a cell boundary.
def demonstrate_edge_problem():
"""Show how adjacent points can have different prefixes."""
# Two points very close together, but on different sides of a boundary
point_a = (37.7750, -122.4180)
point_b = (37.7750, -122.4181)
hash_a = encode_geohash(*point_a, precision=6)
hash_b = encode_geohash(*point_b, precision=6)
print(f"Point A: {point_a} -> {hash_a}")
print(f"Point B: {point_b} -> {hash_b}")
print(f"Distance: ~11 meters")
print(f"Shared prefix length: {shared_prefix_length(hash_a, hash_b)}")
# Visualize the grid around these points
visualize_boundary(point_a, point_b)
def shared_prefix_length(a: str, b: str) -> int:
"""Count shared prefix characters."""
for i, (ca, cb) in enumerate(zip(a, b)):
if ca != cb:
return i
return min(len(a), len(b))
def visualize_boundary(point_a, point_b):
"""ASCII visualization of boundary edge case."""
print("\nGrid visualization (precision=6):")
print("┌─────────┬─────────┐")
print("│ 9q8yyk │ 9q8yym │")
print("│ │ A │")
print("├─────────┼─────────┤")
print("│ 9q8yyh │ 9q8yyj │")
print("│ B │ │")
print("└─────────┴─────────┘")
print("A and B are 11m apart but share NO prefix at precision 6!")
demonstrate_edge_problem()
This isn’t a bug—it’s fundamental to how geohashing works. Any grid-based system has boundaries. The solution is to always query neighboring cells.
Querying with Geohashes
For accurate proximity searches, you must query the target cell plus its 8 neighbors. Here’s how to compute them:
NEIGHBORS = {
'n': ('p0r21436x8zb9dcf5h7kjnmqesgutwvy', 'bc01fg45telemet89telemet'),
's': ('14365h7k9dcfesgujnmqp0r2twvyx8zb', '238967debc01fg45telemet'),
'e': ('bc01fg45238967deuvhjyznpkmstqrwx', 'p0r21436x8zb9dcf5h7kjnmqesgutwvy'),
'w': ('238967debc01fg45kmstqrwxuvhjyznp', '14365h7k9dcfesgujnmqp0r2twvyx8zb'),
}
BORDERS = {
'n': ('prxz', 'bcfguvyz'),
's': ('028b', '0145telemet'),
'e': ('bcfguvyz', 'prxz'),
'w': ('0145telemet', '028b'),
}
def get_neighbor(geohash: str, direction: str) -> str:
"""Get adjacent geohash cell in given direction."""
if not geohash:
return ''
last_char = geohash[-1]
parent = geohash[:-1]
char_type = len(geohash) % 2 # 0 = even, 1 = odd
if last_char in BORDERS[direction][char_type]:
parent = get_neighbor(parent, direction)
return parent + BASE32[NEIGHBORS[direction][char_type].index(last_char)]
def get_all_neighbors(geohash: str) -> dict[str, str]:
"""Get all 8 neighboring geohash cells."""
return {
'n': get_neighbor(geohash, 'n'),
's': get_neighbor(geohash, 's'),
'e': get_neighbor(geohash, 'e'),
'w': get_neighbor(geohash, 'w'),
'ne': get_neighbor(get_neighbor(geohash, 'n'), 'e'),
'nw': get_neighbor(get_neighbor(geohash, 'n'), 'w'),
'se': get_neighbor(get_neighbor(geohash, 's'), 'e'),
'sw': get_neighbor(get_neighbor(geohash, 's'), 'w'),
}
def radius_search(center_lat: float, center_lon: float,
radius_km: float, precision: int = 6) -> list[str]:
"""Get all geohash cells that could contain points within radius."""
center_hash = encode_geohash(center_lat, center_lon, precision)
cells = {center_hash}
# For small radii relative to cell size, neighbors suffice
# For larger radii, expand iteratively
neighbors = get_all_neighbors(center_hash)
cells.update(neighbors.values())
# For very large radii, you'd expand further
# This is simplified—production code should calculate
# how many cell-widths the radius spans
return list(cells)
Database Integration Patterns
Geohashes shine when integrated with standard database indexes. Here’s a PostgreSQL setup:
CREATE TABLE locations (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
lat DECIMAL(10, 7) NOT NULL,
lon DECIMAL(10, 7) NOT NULL,
geohash VARCHAR(12) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- B-tree index enables fast prefix searches
CREATE INDEX idx_locations_geohash ON locations (geohash);
-- Partial indexes for common precision levels
CREATE INDEX idx_locations_geohash_6 ON locations (LEFT(geohash, 6));
-- Find locations in a cell and its neighbors
SELECT * FROM locations
WHERE geohash LIKE '9q8yyk%'
OR geohash LIKE '9q8yym%'
OR geohash LIKE '9q8yyj%'
OR geohash LIKE '9q8yyh%'
-- ... include all 9 cells
ORDER BY geohash;
-- More efficient with ANY
SELECT * FROM locations
WHERE LEFT(geohash, 6) = ANY(ARRAY['9q8yyk', '9q8yym', '9q8yyj', '9q8yyh']);
For Redis, sorted sets with lexicographic ordering work beautifully:
import redis
r = redis.Redis()
def add_location(name: str, lat: float, lon: float):
geohash = encode_geohash(lat, lon, precision=12)
# Score of 0 means ZRANGEBYLEX works on member ordering
r.zadd('locations', {f'{geohash}:{name}': 0})
def find_in_cell(geohash_prefix: str) -> list[str]:
"""Find all locations in a geohash cell."""
# ZRANGEBYLEX finds members lexicographically between bounds
return r.zrangebylex(
'locations',
f'[{geohash_prefix}',
f'[{geohash_prefix}\xff'
)
def find_nearby(lat: float, lon: float, precision: int = 6) -> list[str]:
"""Find locations in cell and neighbors."""
center = encode_geohash(lat, lon, precision)
cells = [center] + list(get_all_neighbors(center).values())
results = []
for cell in cells:
results.extend(find_in_cell(cell))
return results
Real-World Use Cases
Ride-sharing driver matching: Uber famously used geohashing early on. When a rider requests a pickup, query drivers in the rider’s cell plus neighbors. Filter by actual distance only on the small result set.
Food delivery radius: DoorDash-style apps use geohashing to quickly find restaurants within delivery range. Store restaurant geohashes at precision 6-7, query cells covering the delivery radius.
Social proximity features: “Friends nearby” features query user locations by geohash prefix. Precision 5-6 gives city-block granularity without exposing exact locations.
Location-based caching: Cache API responses by geohash prefix. Weather data for 9q8yy* serves everyone in that ~1km cell.
Limitations and Alternatives
Geohashing has real limitations. The edge problem requires neighbor queries. Cells aren’t uniform in size (they’re taller near poles). Complex geometric queries (polygon intersection, arbitrary shapes) require post-filtering.
When to use alternatives:
- PostGIS/R-trees: When you need complex spatial queries (polygons, intersections) and can afford the infrastructure
- S2 Geometry: When you need uniform cell sizes globally (Google’s choice for Maps)
- H3: When you need hexagonal cells (better for radius queries, used by Uber now)
Geohashing wins when you need simplicity, have existing infrastructure, and your queries are primarily “find things near this point.” It’s the right tool when you want location indexing without adopting a spatial database.
For most applications starting out with location features, geohashing provides 80% of the value with 20% of the complexity. Start here, migrate to S2 or H3 when you hit its limits.