Database Connection Pooling: PgBouncer and ProxySQL
Key Insights
- Connection pooling reduces database load by reusing connections instead of creating new ones for each query—PgBouncer can reduce PostgreSQL connection overhead by 80-95% in high-concurrency scenarios
- PgBouncer excels at lightweight PostgreSQL connection pooling with minimal overhead, while ProxySQL offers advanced MySQL routing, query caching, and read/write splitting capabilities
- Transaction-level pooling provides the best balance of connection efficiency and application compatibility, but requires understanding prepared statement limitations and connection state management
The Connection Problem
Every database connection carries overhead. When your application creates a new connection, the database must authenticate the user, allocate memory buffers, initialize session variables, and establish network state. For PostgreSQL, this means forking a new backend process. For MySQL, it means spawning a new thread and allocating per-connection buffers.
Here’s what this looks like in practice:
import time
import psycopg2

# Measure connection creation overhead
start = time.time()
conn = psycopg2.connect(
    host="localhost",
    database="myapp",
    user="appuser",
    password="secret"
)
connection_time = time.time() - start

# Measure simple query execution
cursor = conn.cursor()
start = time.time()
cursor.execute("SELECT 1")
query_time = time.time() - start

print(f"Connection time: {connection_time*1000:.2f}ms")
print(f"Query time: {query_time*1000:.2f}ms")
conn.close()

# Typical output: Connection time: 45-80ms, Query time: 0.5-2ms
The connection overhead often exceeds query execution time by 20-100x for simple queries. When you’re handling hundreds of requests per second, this becomes a bottleneck. Connection pooling solves this by maintaining a pool of persistent connections that applications can borrow and return.
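That arithmetic is worth making concrete. A back-of-the-envelope calculation, assuming a 60 ms connect and 1 ms query from the typical timings above:

```python
# Throughput ceiling for one worker, with and without pooling,
# using assumed timings of 60 ms per connect and 1 ms per query.
connect_ms = 60.0
query_ms = 1.0

# Without pooling, every request pays both costs.
unpooled_rps = 1000.0 / (connect_ms + query_ms)

# With pooling, the connect cost is paid once and amortized away.
pooled_rps = 1000.0 / query_ms

print(f"unpooled: {unpooled_rps:.0f} req/s per connection")  # ~16
print(f"pooled:   {pooled_rps:.0f} req/s per connection")    # 1000
```

A single worker goes from roughly 16 requests per second to 1,000 once connection setup is off the hot path, which is where the 20-100x figure comes from.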
Understanding Connection Pooling Fundamentals
Connection pooling operates on a simple principle: reuse connections instead of creating new ones. However, the devil is in the details—specifically, how much connection state you preserve between uses.
There are three main pooling modes:
Session pooling: A client gets a dedicated connection for their entire session. This is the safest mode but provides minimal benefits—it’s essentially just limiting connection count.
Transaction pooling: A connection is returned to the pool after each transaction completes. This is the sweet spot for most applications, offering significant efficiency gains while maintaining transaction boundaries.
Statement pooling: Connections are returned after each statement. This provides maximum efficiency but breaks most applications that rely on multi-statement transactions, temporary tables, or prepared statements.
Here’s the lifecycle in pseudocode:
class ConnectionPool:
    def __init__(self, size):
        self.available = [create_connection() for _ in range(size)]
        self.in_use = set()

    def acquire(self):
        if not self.available:
            wait_or_create_new()  # block until a connection is released
        conn = self.available.pop()
        self.in_use.add(conn)
        return conn

    def release(self, conn):
        if pool_mode == "transaction":
            execute("ROLLBACK")  # clean up any uncommitted work
        reset_connection_state(conn)
        self.in_use.remove(conn)
        self.available.append(conn)
PgBouncer: PostgreSQL Connection Pooling
PgBouncer is a lightweight connection pooler specifically designed for PostgreSQL. It sits between your application and PostgreSQL, multiplexing client connections onto a smaller pool of database connections.
Installation is straightforward:
# Ubuntu/Debian
apt-get install pgbouncer
# macOS
brew install pgbouncer
Here’s a production-ready configuration (/etc/pgbouncer/pgbouncer.ini):
[databases]
myapp = host=localhost port=5432 dbname=myapp
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
# Pool configuration
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
# Performance tuning
max_db_connections = 50
max_user_connections = 50
# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
The critical settings:
- pool_mode = transaction: Best balance of efficiency and compatibility
- default_pool_size = 25: Number of server connections per user/database pair
- max_client_conn = 1000: Maximum client connections PgBouncer will accept
- reserve_pool_size: Emergency connections used when the pool is exhausted
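A quick sanity check on what these numbers imply, using the values from the config above:

```python
# Values taken from the pgbouncer.ini example above.
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5

# In transaction mode, many clients share each server connection.
multiplex_ratio = max_client_conn / (default_pool_size + reserve_pool_size)
print(f"up to {multiplex_ratio:.0f} clients per server connection")  # ~33
```

This only works if transactions are short; long-running transactions pin server connections and push other clients into the waiting queue.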
Your application connects to PgBouncer instead of PostgreSQL directly:
# Python with psycopg2
import psycopg2

# Connect to PgBouncer (port 6432) instead of PostgreSQL (5432)
conn = psycopg2.connect(
    host="localhost",
    port=6432,  # PgBouncer port
    database="myapp",
    user="appuser",
    password="secret"
)
// Node.js with pg
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  port: 6432, // PgBouncer port
  database: 'myapp',
  user: 'appuser',
  password: 'secret',
  max: 20 // Application-level pool size
});
Monitor PgBouncer through its admin console:
# Connect to the PgBouncer admin console (from a shell)
psql -p 6432 -U pgbouncer pgbouncer
-- Show pool statistics
SHOW POOLS;
-- Returns: database | user | cl_active | cl_waiting | sv_active | sv_idle
-- Show per-database stats
SHOW STATS;
-- Returns: total_xact_count, total_query_count, avg_query_time
-- Show current connections
SHOW CLIENTS;
SHOW SERVERS;
ProxySQL: MySQL Connection Pooling and Beyond
ProxySQL is more than a connection pooler—it’s a proxy layer that provides query routing, caching, and traffic shaping for MySQL and its variants (MariaDB, Percona Server).
Installation:
# Ubuntu/Debian
wget https://github.com/sysown/proxysql/releases/download/v2.5.5/proxysql_2.5.5-ubuntu22_amd64.deb
dpkg -i proxysql_2.5.5-ubuntu22_amd64.deb
systemctl start proxysql
ProxySQL uses a multi-layer configuration system. Connect to the admin interface:
mysql -u admin -padmin -h 127.0.0.1 -P6032
Configure backend MySQL servers and connection pooling:
-- Add MySQL backend servers
INSERT INTO mysql_servers(hostgroup_id, hostname, port)
VALUES (0, '10.0.1.10', 3306); -- Writer
INSERT INTO mysql_servers(hostgroup_id, hostname, port)
VALUES (1, '10.0.1.11', 3306); -- Reader 1
INSERT INTO mysql_servers(hostgroup_id, hostname, port)
VALUES (1, '10.0.1.12', 3306); -- Reader 2
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
-- Configure connection pooling
SET mysql-max_connections=2000;
SET mysql-default_max_connections=1000;
SET mysql-connection_max_age_ms=3600000;
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
-- Configure read/write split
INSERT INTO mysql_query_rules(rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 0, 1); -- Locking reads to the writer (hostgroup 0)
INSERT INTO mysql_query_rules(rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 1, 1); -- Plain reads to the replicas (hostgroup 1)
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
-- Add application users
INSERT INTO mysql_users(username, password, default_hostgroup)
VALUES ('appuser', 'secret', 0);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
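The query rules above are evaluated in `rule_id` order, first match wins (that is what `apply=1` enforces). The routing logic can be mimicked in a few lines of plain Python to see why the rule ordering matters; the patterns and hostgroup numbers are taken from the config above, and the case-insensitive matching is an assumption for illustration:

```python
import re

# (rule_id, pattern, destination_hostgroup) -- mirrors the INSERTs above.
rules = [
    (1, r'^SELECT.*FOR UPDATE', 0),  # locking reads go to the writer
    (2, r'^SELECT', 1),              # plain reads go to the replicas
]

def route(query, default_hostgroup=0):
    # First matching rule wins; unmatched queries (writes, DDL)
    # fall through to the user's default hostgroup.
    for rule_id, pattern, hostgroup in rules:
        if re.match(pattern, query, re.IGNORECASE):
            return hostgroup
    return default_hostgroup

print(route("SELECT * FROM users"))             # 1 (replica)
print(route("SELECT * FROM users FOR UPDATE"))  # 0 (writer)
print(route("UPDATE users SET name = 'x'"))     # 0 (writer)
```

Note that if the rules were reversed, `^SELECT` would match locking reads first and misroute them to a replica, which is exactly the bug the rule ordering prevents.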
Application code connects to ProxySQL:
# Python with mysql-connector
import mysql.connector

conn = mysql.connector.connect(
    host='localhost',
    port=6033,  # ProxySQL port
    user='appuser',
    password='secret',
    database='myapp'
)
cursor = conn.cursor()

# ProxySQL automatically routes this to a read replica
cursor.execute("SELECT * FROM users WHERE id = %s", (123,))
cursor.fetchall()  # read the result before issuing the next statement

# This goes to the primary
cursor.execute("UPDATE users SET last_login = NOW() WHERE id = %s", (123,))
Performance Comparison and Use Cases
PgBouncer and ProxySQL serve different ecosystems, but their performance characteristics inform when to use each.
PgBouncer strengths:
- Extremely lightweight (single-threaded, minimal memory footprint)
- Sub-millisecond overhead per query
- Handles 10,000+ client connections with ease
- Perfect for microservices with many short-lived connections
ProxySQL strengths:
- Advanced query routing and load balancing
- Query caching reduces database load
- Built-in failover and health checking
- Better for complex topologies with replicas
Here’s a load test comparing direct connections vs. pooled:
import time
import psycopg2
from concurrent.futures import ThreadPoolExecutor

def run_query(connection_params, query):
    # Each call opens a fresh connection -- the cost we're measuring.
    conn = psycopg2.connect(**connection_params)
    cursor = conn.cursor()
    cursor.execute(query)
    cursor.fetchall()
    conn.close()

# Test direct PostgreSQL connections
direct_params = {'host': 'localhost', 'port': 5432, 'database': 'myapp',
                 'user': 'appuser', 'password': 'secret'}
start = time.time()
with ThreadPoolExecutor(max_workers=100) as executor:
    futures = [executor.submit(run_query, direct_params, "SELECT 1")
               for _ in range(1000)]
    [f.result() for f in futures]
direct_time = time.time() - start

# Test PgBouncer connections
pooled_params = {'host': 'localhost', 'port': 6432, 'database': 'myapp',
                 'user': 'appuser', 'password': 'secret'}
start = time.time()
with ThreadPoolExecutor(max_workers=100) as executor:
    futures = [executor.submit(run_query, pooled_params, "SELECT 1")
               for _ in range(1000)]
    [f.result() for f in futures]
pooled_time = time.time() - start

print(f"Direct: {direct_time:.2f}s, Pooled: {pooled_time:.2f}s")
print(f"Improvement: {((direct_time - pooled_time) / direct_time * 100):.1f}%")
# Typical output: Direct: 45s, Pooled: 8s, Improvement: 82%
Production Best Practices
Sizing your connection pool correctly is critical. Too small, and clients wait for connections. Too large, and you overwhelm the database.
Use this formula as a starting point:
connections = ((core_count * 2) + effective_spindle_count)
For a database server with 8 cores and SSD storage:
connections = (8 * 2) + 1 = 17
Round up to 20-25 for headroom. This applies to default_pool_size in PgBouncer or the total connections across all hostgroups in ProxySQL.
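The sizing rule is easy to encode as a hypothetical helper; the formula comes from the text above, while the headroom factor is an illustrative assumption:

```python
import math

def pool_size(core_count, spindle_count=1, headroom=1.25):
    # connections = (cores * 2) + effective spindle count, then rounded
    # up with some headroom (the 1.25 factor is an assumed choice).
    base = (core_count * 2) + spindle_count
    return math.ceil(base * headroom)

# 8-core server with SSD storage (effective_spindle_count = 1):
print(pool_size(8))  # 22 -- inside the 20-25 range suggested above
```

Treat the result as a starting point, not a target: watch client wait times and database CPU, and adjust from there.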
Monitor pool saturation with Prometheus:
# PgBouncer exporter metrics
pgbouncer_pools_server_active_connections
pgbouncer_pools_server_idle_connections
pgbouncer_pools_client_waiting_connections

# Alert when clients are waiting
- alert: PgBouncerClientWaiting
  expr: pgbouncer_pools_client_waiting_connections > 0
  for: 5m
Health check script for PgBouncer:
#!/bin/bash
# Check if PgBouncer is accepting connections
psql -h localhost -p 6432 -U pgbouncer -d pgbouncer -c "SHOW POOLS;" > /dev/null 2>&1
if [ $? -eq 0 ]; then
    echo "PgBouncer healthy"
    exit 0
else
    echo "PgBouncer unhealthy"
    exit 1
fi
Common pitfalls:
- Prepared statement issues in transaction mode: Server-side prepared statements are tied to a specific server connection, and in transaction mode the next transaction may land on a different one. Use simple queries, disable server-side prepared statements in your driver, or fall back to session mode.
- Connection state leakage: Always close transactions explicitly. A forgotten BEGIN can hold a connection indefinitely.
- Authentication file sync: Keep PgBouncer's userlist.txt synchronized with PostgreSQL users, or use auth_query for dynamic lookups.
Conclusion
Choose PgBouncer when you need lightweight, efficient connection pooling for PostgreSQL with minimal operational overhead. It’s the right choice for most PostgreSQL deployments, especially microservices architectures with high connection churn.
Choose ProxySQL when you need advanced MySQL routing, read/write splitting, or query caching. The additional complexity pays off in environments with read replicas or complex query patterns.
Both tools can reduce database connection overhead by 80-95% in high-concurrency scenarios. Start with transaction-level pooling, size your pools conservatively (20-50 connections per database), and monitor client wait times. The performance gains are substantial, and the operational benefits—reduced database memory usage, improved connection limits, graceful degradation under load—make connection pooling essential for production databases.