Time-Series Databases: InfluxDB and TimescaleDB
Key Insights
- InfluxDB offers superior write performance and simpler operations for pure time-series workloads, while TimescaleDB excels when you need full SQL capabilities and complex joins with relational data.
- TimescaleDB’s hypertable architecture automatically partitions data into chunks, providing time-series optimizations while maintaining PostgreSQL compatibility—eliminating the need to learn a new query language.
- For production deployments, InfluxDB requires its proprietary clustering solution (commercial), whereas TimescaleDB can leverage the entire PostgreSQL ecosystem for replication, backup, and high availability.
Introduction to Time-Series Data
Time-series data is any dataset where each record includes a timestamp indicating when an event occurred or a measurement was taken. Unlike traditional database workloads with random access patterns, time-series data arrives sequentially and is typically queried in time-range windows.
Common use cases include IoT sensor networks collecting temperature and humidity readings every second, application performance monitoring capturing metrics like request latency and CPU usage, financial systems tracking stock prices, and infrastructure monitoring solutions gathering system statistics.
Traditional relational databases struggle with time-series workloads for several reasons. First, the write volume overwhelms standard B-tree indexes—inserting millions of timestamped records per second creates index maintenance overhead. Second, range queries across time windows perform poorly without specialized indexing strategies. Third, data retention becomes problematic as historical data accumulates; manually partitioning and dropping old data requires complex maintenance scripts.
Time-series databases solve these problems through specialized storage engines, automatic partitioning by time, built-in downsampling and aggregation functions, and efficient compression algorithms that exploit the sequential nature of time-series data.
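As a toy illustration of why sequential data compresses so well, consider delta-of-delta encoding, the idea behind Gorilla-style timestamp compression and conceptually similar to what TSM-style engines do: store only the change in the interval between samples, so regularly spaced readings collapse into long runs of zeros. A minimal sketch, not any engine's actual implementation:

```python
def delta_of_delta_encode(timestamps):
    """Encode timestamps as (first value, first delta, then delta-of-deltas).

    Regularly spaced samples produce long runs of zeros, which a bit-level
    encoder can then pack into a few bits each.
    """
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # zero when spacing is constant
        prev_delta = delta
    return encoded

# Samples every 10 s, with one late arrival at the end:
print(delta_of_delta_encode([1000, 1010, 1020, 1030, 1041]))
# [1000, 10, 0, 0, 1]
```

Everything after the two-value header is a small integer near zero, which is why time-series engines routinely report 10x or better compression on machine-generated data.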
InfluxDB: Purpose-Built Time-Series Database
InfluxDB was designed from the ground up for time-series workloads. Its architecture uses a custom storage engine that organizes data into measurements (similar to tables), with each point containing tags (indexed metadata), fields (actual values), and a timestamp.
Tags are indexed and used for filtering—think of them as dimensions you’ll query by (server name, region, sensor ID). Fields contain the actual measurements and aren’t indexed. This distinction is crucial for performance.
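The tag/field split is visible in InfluxDB's line protocol, the wire format the client libraries ultimately emit: tags travel with the measurement name (and are indexed), while fields come after the separating space (and are not). A simplified formatter sketch that ignores the real protocol's escaping and type-suffix rules:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build a simplified InfluxDB line-protocol string.

    Format: measurement,tag=v,... field=v,... timestamp
    Tags are part of the series key; fields are the stored values.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "temperature",
    {"sensor_id": "sensor_001", "location": "warehouse_a"},
    {"value": 23.5, "humidity": 65.2},
    1700000000000000000,
)
print(line)
# temperature,location=warehouse_a,sensor_id=sensor_001 humidity=65.2,value=23.5 1700000000000000000
```

Because tags form the series key, high-cardinality values (e.g. request IDs) belong in fields, not tags, or the index grows unboundedly.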
InfluxDB originally used InfluxQL, a SQL-like query language; version 2.x emphasizes Flux, a functional scripting language designed for data transformation and analysis.
Here’s how to write sensor data using the Python client:
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS
from datetime import datetime, timezone

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Write a temperature reading from one sensor
point = (
    Point("temperature")
    .tag("sensor_id", "sensor_001")
    .tag("location", "warehouse_a")
    .field("value", 23.5)
    .field("humidity", 65.2)
    .time(datetime.now(timezone.utc))  # utcnow() is deprecated; use aware datetimes
)
write_api.write(bucket="sensors", record=point)
Querying with Flux provides powerful aggregation capabilities:
from(bucket: "sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "temperature")
  |> filter(fn: (r) => r.location == "warehouse_a")
  |> aggregateWindow(every: 5m, fn: mean)
  |> yield(name: "mean_temperature")
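The aggregateWindow step can be made concrete in plain Python: group points into fixed windows by truncating each timestamp down to a window boundary, then average within each group. This sketch mimics only the windowed-mean behavior, not the Flux engine:

```python
from collections import defaultdict

def aggregate_window(points, every_s):
    """points: iterable of (epoch_seconds, value); returns {window_start: mean}.

    Mirrors aggregateWindow(every: ..., fn: mean): each point lands in the
    window whose start is its timestamp rounded down to a multiple of every_s.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % every_s].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Four readings over ~8 minutes, aggregated into 5-minute (300 s) windows
points = [(0, 20.0), (120, 22.0), (300, 30.0), (480, 10.0)]
print(aggregate_window(points, 300))
# {0: 21.0, 300: 20.0}
```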
Retention policies automatically expire old data, crucial for managing storage:
# Create a bucket whose data expires after 30 days; older points are
# deleted automatically. Retention can also be set in the UI or via the API.
influx bucket create --name sensors --retention 30d
InfluxDB handles retention policies at the bucket level in version 2.x, simplifying configuration compared to the older database/retention policy model.
TimescaleDB: PostgreSQL Extension Approach
TimescaleDB takes a different approach: it extends PostgreSQL with time-series optimizations while maintaining full SQL compatibility. This means you get time-series performance without abandoning the PostgreSQL ecosystem.
The core concept is the hypertable—a virtual table that automatically partitions data into chunks based on time intervals. Under the hood, TimescaleDB creates regular PostgreSQL tables for each chunk, but you interact with the hypertable as a single table.
Converting a regular table to a hypertable is straightforward:
-- Create a regular table
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ NOT NULL,
    sensor_id   TEXT NOT NULL,
    location    TEXT NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION
);

-- Convert to hypertable (the table must be empty by default;
-- pass migrate_data => true to convert a populated table)
SELECT create_hypertable('sensor_data', 'time');

-- Create indexes on commonly queried columns
CREATE INDEX ON sensor_data (sensor_id, time DESC);
CREATE INDEX ON sensor_data (location, time DESC);
Inserting data uses standard SQL:
INSERT INTO sensor_data (time, sensor_id, location, temperature, humidity)
VALUES
    (NOW(), 'sensor_001', 'warehouse_a', 23.5, 65.2),
    (NOW(), 'sensor_002', 'warehouse_b', 22.1, 58.7);
The time_bucket() function is TimescaleDB’s killer feature for time-series queries:
-- Calculate 5-minute average temperature by location
SELECT
time_bucket('5 minutes', time) AS bucket,
location,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket, location
ORDER BY bucket DESC;
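At its core, time_bucket() is integer truncation: round the timestamp down to the nearest multiple of the bucket width. The same rounding in Python, sketching only the default behavior and ignoring time_bucket's optional origin and offset arguments:

```python
from datetime import datetime, timedelta, timezone

def time_bucket(width, ts):
    """Round ts down to a multiple of `width` (a timedelta) since the epoch."""
    width_s = int(width.total_seconds())
    epoch_s = int(ts.timestamp())
    return datetime.fromtimestamp(epoch_s - epoch_s % width_s, tz=timezone.utc)

ts = datetime(2024, 1, 1, 12, 7, 42, tzinfo=timezone.utc)
print(time_bucket(timedelta(minutes=5), ts))
# 2024-01-01 12:05:00+00:00
```

Because every row's bucket is computable from the timestamp alone, GROUP BY bucket parallelizes cleanly across chunks.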
Continuous aggregates are materialized views that automatically update as new data arrives:
-- Create a continuous aggregate for hourly averages
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    sensor_id,
    location,
    AVG(temperature) AS avg_temperature,
    AVG(humidity) AS avg_humidity
FROM sensor_data
GROUP BY bucket, sensor_id, location;

-- Add a refresh policy to update every 30 minutes
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
    start_offset      => INTERVAL '2 hours',
    end_offset        => INTERVAL '30 minutes',
    schedule_interval => INTERVAL '30 minutes');
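The two offsets define the window each refresh materializes: data between now - start_offset and now - end_offset. Keeping end_offset away from "now" avoids repeatedly re-aggregating buckets that are still receiving writes. The arithmetic for the policy above, as a quick sketch:

```python
from datetime import datetime, timedelta, timezone

def refresh_window(now, start_offset, end_offset):
    """Return the (start, end) time range a continuous-aggregate refresh covers."""
    return now - start_offset, now - end_offset

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = refresh_window(now, timedelta(hours=2), timedelta(minutes=30))
print(start, "->", end)
# 2024-01-01 10:00:00+00:00 -> 2024-01-01 11:30:00+00:00
```

Rows newer than the end of the window are not yet materialized; TimescaleDB's real-time aggregation can union them in at query time.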
Performance Comparison
Write performance heavily favors InfluxDB for pure time-series ingestion. In benchmarks with sequential writes, InfluxDB can handle 500k-1M points per second on modern hardware, while TimescaleDB typically achieves 100k-300k rows per second. However, TimescaleDB’s performance improves significantly with batched inserts.
Here’s a simple benchmark script:
import time
from datetime import datetime, timedelta, timezone

import psycopg2
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# InfluxDB write test
def benchmark_influx(points_count):
    client = InfluxDBClient(url="http://localhost:8086", token="token", org="org")
    # Use synchronous writes so the elapsed time covers the actual flush,
    # not just queueing into the default batching writer
    write_api = client.write_api(write_options=SYNCHRONOUS)
    points = [Point("benchmark").field("value", i) for i in range(points_count)]
    start = time.time()
    write_api.write(bucket="test", record=points)
    return time.time() - start

# TimescaleDB write test
def benchmark_timescale(points_count):
    conn = psycopg2.connect("postgresql://localhost/test")
    cur = conn.cursor()
    # Pass real timestamps as parameters; a SQL expression like
    # "NOW() + INTERVAL ..." in a parameter would be inserted as a literal string
    base = datetime.now(timezone.utc)
    values = [(base + timedelta(seconds=i), i) for i in range(points_count)]
    start = time.time()
    cur.executemany("INSERT INTO benchmark (time, value) VALUES (%s, %s)", values)
    conn.commit()
    return time.time() - start
For query performance, TimescaleDB often wins on complex analytical queries involving JOINs or window functions, leveraging PostgreSQL’s sophisticated query planner. InfluxDB excels at simple aggregations and downsampling operations.
Both databases offer excellent compression. InfluxDB’s TSM (Time-Structured Merge tree) engine achieves 10-20x compression ratios. TimescaleDB’s native compression (using columnar storage) achieves similar ratios while maintaining query performance.
When to Choose Each Database
Choose InfluxDB when:
- Your workload is purely time-series data with minimal relational requirements
- You need maximum write throughput for metrics or IoT data
- You want operational simplicity with a single-purpose database
- You’re building a monitoring or observability platform
- Your team prefers a managed cloud solution (InfluxDB Cloud)
Choose TimescaleDB when:
- You need to join time-series data with relational data (users, products, configurations)
- Your team already knows PostgreSQL and SQL
- You want to leverage the PostgreSQL ecosystem (extensions, tools, backup solutions)
- You need complex analytical queries with CTEs and window functions
- You’re running mixed workloads in the same database
- You want to avoid vendor lock-in with open-source PostgreSQL
Production Deployment Considerations
For InfluxDB, the open-source version runs as a single node. High availability requires InfluxDB Enterprise (commercial) or InfluxDB Cloud. Backups are performed using the influx backup command or by backing up the underlying data directory.
Here’s a Docker Compose setup for InfluxDB:
version: '3.8'
services:
  influxdb:
    image: influxdb:2.7
    ports:
      - "8086:8086"
    volumes:
      - influxdb-data:/var/lib/influxdb2
      - influxdb-config:/etc/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=password
      - DOCKER_INFLUXDB_INIT_ORG=myorg
      - DOCKER_INFLUXDB_INIT_BUCKET=mybucket
volumes:
  influxdb-data:
  influxdb-config:
TimescaleDB leverages PostgreSQL’s replication capabilities for high availability:
version: '3.8'
services:
  timescaledb:
    image: timescale/timescaledb:latest-pg15
    ports:
      - "5432:5432"
    volumes:
      - timescaledb-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=tsdb
volumes:
  timescaledb-data:
Both databases integrate seamlessly with Grafana for visualization:
# Grafana datasource configuration for TimescaleDB
apiVersion: 1
datasources:
  - name: TimescaleDB
    type: postgres
    url: timescaledb:5432
    database: tsdb
    user: postgres
    jsonData:
      sslmode: disable
      timescaledb: true
For production monitoring, InfluxDB exposes metrics on /metrics for Prometheus scraping. TimescaleDB uses standard PostgreSQL monitoring tools like pg_stat_statements and can be monitored with postgres_exporter.
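A minimal Prometheus scrape job for the InfluxDB metrics endpoint might look like the fragment below; the job name and target hostname are placeholders for your environment:

```yaml
scrape_configs:
  - job_name: influxdb
    metrics_path: /metrics
    static_configs:
      - targets: ["influxdb:8086"]
```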
The choice between InfluxDB and TimescaleDB ultimately depends on your specific requirements, existing infrastructure, and team expertise. Both are production-ready solutions that excel in different scenarios.