Prometheus: Metrics Collection and Alerting
Key Insights
- Prometheus uses a pull-based model where the server scrapes metrics from instrumented targets, making it simpler to operate than push-based systems and providing built-in service discovery
- The four metric types (Counter, Gauge, Histogram, Summary) serve distinct purposes—use Counters for cumulative values, Gauges for snapshots, Histograms for latency distributions, and Summaries for precise client-side quantiles
- Alert fatigue is real—design alerting rules around symptoms users experience rather than component failures, and use recording rules to pre-aggregate expensive queries for dashboard performance
Introduction to Prometheus Architecture
Prometheus is an open-source monitoring system built specifically for dynamic cloud environments. Unlike traditional monitoring tools that rely on agents pushing metrics to a central server, Prometheus pulls metrics from HTTP endpoints exposed by your applications and infrastructure components.
The architecture consists of four main components. The Prometheus server scrapes and stores time-series data, executing queries and evaluating alerting rules. Exporters expose metrics from third-party systems like databases, message queues, and hardware. The Pushgateway handles metrics from short-lived jobs that don’t exist long enough to be scraped. Alertmanager receives alerts from Prometheus and handles deduplication, grouping, and routing to notification channels.
The time-series database stores metrics as streams of timestamped values identified by metric names and key-value labels. This dimensional data model enables powerful querying and aggregation across multiple label dimensions without requiring schema changes.
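Concretely, each scrape target exposes series in the Prometheus text exposition format, with labels inline. A sample of what a scraped /metrics endpoint might return (values are illustrative):

```text
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/users",status="200"} 1027
http_requests_total{method="POST",endpoint="/api/order",status="500"} 3
```

Each unique combination of metric name and label values is its own time series, which is why adding a label multiplies the number of series stored.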
Setting Up Prometheus
For production deployments, run Prometheus in Docker or Kubernetes. The official Docker image requires minimal configuration:
docker run -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
For Kubernetes, use the Prometheus Operator which provides custom resources for managing Prometheus deployments, service monitors, and alert rules.
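With the Operator, scrape targets are declared as ServiceMonitor resources instead of static config. A minimal sketch (names and labels here are illustrative, not prescribed):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-servers
  labels:
    team: platform
spec:
  selector:
    matchLabels:
      app: api        # scrape Services carrying this label
  endpoints:
    - port: metrics   # named port on the Service
      interval: 15s
```

The Operator watches these resources and regenerates the Prometheus scrape configuration automatically.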
The prometheus.yml configuration file defines global settings, scrape targets, and alerting rules:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production-us-east-1'
    environment: 'prod'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Scrape configurations
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'api-servers'
    static_configs:
      - targets:
          - 'api-1.example.com:8080'
          - 'api-2.example.com:8080'
          - 'api-3.example.com:8080'
    scrape_interval: 10s
    metrics_path: '/metrics'

  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['postgres-exporter:9187']

Note that storage retention is not set in prometheus.yml—it is configured with command-line flags when starting the server:

prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=50GB
Set retention policies based on your storage capacity and query patterns. Most teams keep 15-30 days of raw metrics and use recording rules or remote storage for long-term data.
Instrumenting Applications for Metrics
Prometheus provides client libraries for Go, Java, Python, Ruby, and other languages. Expose metrics on an HTTP endpoint (typically /metrics) that Prometheus scrapes.
Here’s a Go application instrumented with the Prometheus client:
package main

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint", "status"},
    )
    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency distributions",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )
    activeConnections = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "active_connections",
            Help: "Number of active connections",
        },
    )
)

func instrumentHandler(endpoint string, handler http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        activeConnections.Inc()
        defer activeConnections.Dec()

        handler(w, r)

        duration := time.Since(start).Seconds()
        httpRequestDuration.WithLabelValues(r.Method, endpoint).Observe(duration)
        // Simplified: records every request as a 200. In production, wrap the
        // ResponseWriter to capture the real status code.
        httpRequestsTotal.WithLabelValues(r.Method, endpoint, "200").Inc()
    }
}

func main() {
    http.Handle("/metrics", promhttp.Handler())
    http.HandleFunc("/api/users", instrumentHandler("/api/users", handleUsers))
    http.ListenAndServe(":8080", nil)
}

func handleUsers(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Users endpoint"))
}
For Python applications using Flask:
from flask import Flask
from prometheus_client import (
    Counter, Histogram, Gauge, generate_latest, CONTENT_TYPE_LATEST
)

app = Flask(__name__)

# Define metrics
orders_total = Counter(
    'orders_total',
    'Total number of orders',
    ['product_type', 'status']
)
order_value = Histogram(
    'order_value_dollars',
    'Order value in dollars',
    buckets=[10, 50, 100, 500, 1000, 5000]
)
inventory_items = Gauge(
    'inventory_items',
    'Current inventory count',
    ['product_id']
)

@app.route('/api/order', methods=['POST'])
def create_order():
    # Business logic
    product_type = 'electronics'
    value = 299.99

    # Update metrics
    orders_total.labels(product_type=product_type, status='completed').inc()
    order_value.observe(value)
    inventory_items.labels(product_id='12345').dec()
    return {'status': 'success'}

@app.route('/metrics')
def metrics():
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}
Use Counters for cumulative values that only increase (requests, errors, sales). Use Gauges for values that go up and down (memory usage, queue depth, temperature). Use Histograms for distributions you want to aggregate across instances (request latency, response size). Use Summaries when you need precise client-side quantiles and don't need cross-instance aggregation.
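The semantics can be made concrete with a tiny pure-Python sketch of how a client library accumulates each type—illustrative only, not the actual prometheus_client internals:

```python
class Counter:
    """Monotonic: only inc(); rate() over scrapes yields per-second throughput."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        assert amount >= 0, "counters never decrease"
        self.value += amount

class Gauge:
    """Snapshot: may move in either direction."""
    def __init__(self):
        self.value = 0.0
    def set(self, v): self.value = v
    def inc(self, amount=1.0): self.value += amount
    def dec(self, amount=1.0): self.value -= amount

class Histogram:
    """Cumulative buckets: each observation increments every bucket whose
    upper bound (le) is >= the observed value."""
    def __init__(self, buckets):
        self.buckets = sorted(buckets) + [float('inf')]
        self.counts = {le: 0 for le in self.buckets}
        self.total = 0.0
    def observe(self, v):
        self.total += v
        for le in self.buckets:
            if v <= le:
                self.counts[le] += 1

h = Histogram([0.1, 0.5, 1.0])
for latency in (0.05, 0.3, 0.7, 2.0):
    h.observe(latency)
print(h.counts[0.5])           # 2 observations were <= 0.5s
print(h.counts[float('inf')])  # 4 observations total
```

The cumulative-bucket layout is what lets PromQL sum buckets across instances before computing quantiles.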
Writing PromQL Queries
PromQL (Prometheus Query Language) retrieves and transforms time-series data. Master these common patterns:
# CPU usage percentage across all instances
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Request rate per second
rate(http_requests_total[5m])
# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100
# 95th percentile latency
histogram_quantile(0.95,
rate(http_request_duration_seconds_bucket[5m])
)
# Memory usage by pod
sum by(pod) (container_memory_working_set_bytes{namespace="production"})
# Top 5 endpoints by request count
topk(5, sum by(endpoint) (rate(http_requests_total[1h])))
# Aggregate across multiple labels
sum without(instance, pod) (up{job="api-servers"})
The rate() function calculates per-second rate over a time window—essential for counters. Always use range vectors (e.g., [5m]) with rate(). The histogram_quantile() function computes percentiles from histogram buckets.
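histogram_quantile() finds the bucket containing the target rank and linearly interpolates inside it. A simplified pure-Python version of that calculation (a sketch, not the Prometheus implementation):

```python
def histogram_quantile(q, buckets):
    """buckets: list of (upper_bound, cumulative_count) pairs, sorted by
    bound and ending with float('inf'). Returns the interpolated q-quantile."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float('inf'):
                return prev_bound  # cannot interpolate into the +Inf bucket
            # linear interpolation within the bucket holding the target rank
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# 100 requests: 60 under 0.1s, 90 under 0.5s, all under 1s
print(histogram_quantile(0.95, [(0.1, 60), (0.5, 90), (1.0, 100), (float('inf'), 100)]))  # 0.75
```

This is also why bucket boundaries matter: the reported percentile can never be more precise than the width of the bucket it falls into.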
Configuring Alerting Rules
Define alerting rules in separate YAML files referenced from prometheus.yml:
# alerts.yml
groups:
  - name: api_alerts
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
          component: api
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            rate(http_request_duration_seconds_bucket[5m])
          ) > 1.0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High latency on {{ $labels.endpoint }}"

      - alert: PodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"

      - alert: DiskSpaceRunningOut
        expr: |
          (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Disk space below 10% on {{ $labels.instance }}"
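Before loading rule files, validate them with promtool, the CLI shipped alongside Prometheus—it catches both YAML and PromQL syntax errors:

```shell
promtool check rules alerts.yml
promtool check config prometheus.yml
```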
Configure Alertmanager to route alerts to appropriate channels:
# alertmanager.yml
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'

route:
  receiver: 'default'
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      continue: true
    - match:
        severity: warning
      receiver: 'slack-warnings'

receivers:
  - name: 'default'
    slack_configs:
      - channel: '#alerts'
        title: 'Alert: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_KEY'
  - name: 'slack-warnings'
    slack_configs:
      - channel: '#monitoring'
The for clause prevents flapping alerts by requiring conditions to persist before firing. Group related alerts to reduce notification noise.
Service Discovery and Scalability
Kubernetes service discovery automatically discovers pods and services:
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use a custom metrics path if specified
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use a custom port if specified
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Add namespace and pod name as labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
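The port-rewrite rule above is ordinary regex substitution: Prometheus joins the source labels with ';' and applies the regex. The same transformation can be sketched in Python to see what happens to __address__ (the addresses here are hypothetical):

```python
import re

# Prometheus concatenates source_labels with ';' before matching
address, port_annotation = '10.0.3.17:9100', '8080'
joined = f'{address};{port_annotation}'

# Equivalent of regex ([^:]+)(?::\d+)?;(\d+) with replacement $1:$2
rewritten = re.sub(r'^([^:]+)(?::\d+)?;(\d+)$', r'\1:\2', joined)
print(rewritten)  # 10.0.3.17:8080
```

The optional `(?::\d+)?` group drops any port already present on the pod address before appending the annotated one.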
Annotate your pods to enable scraping:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
Best Practices and Production Considerations
Avoid high-cardinality labels like user IDs or request IDs—they explode the number of time series. As a rule of thumb, keep each label to a small, bounded set of values (tens, not thousands) wherever possible.
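The danger is multiplicative: the series count for a metric is roughly the product of each label's distinct values. A quick back-of-envelope check with illustrative numbers:

```python
from math import prod

# distinct values per label on a hypothetical http_requests_total
safe = {'method': 4, 'endpoint': 30, 'status': 8}
print(prod(safe.values()))   # 960 series -- fine

# adding a user_id label with 50,000 distinct values
risky = {**safe, 'user_id': 50_000}
print(prod(risky.values()))  # 48,000,000 series -- will overwhelm the TSDB
```

One unbounded label turns a harmless metric into millions of series, which is exactly what prometheus_tsdb_head_series (mentioned below) helps you catch early.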
Use recording rules to pre-compute expensive queries:
# recording_rules.yml
groups:
  - name: performance_rules
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum by(job) (rate(http_requests_total[5m]))

      - record: job:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum by(job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

      - record: instance:node_cpu:utilization
        expr: |
          100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Recording rules reduce dashboard load times and enable faster alerting on complex queries.
Monitor Prometheus itself by scraping its own /metrics endpoint. Watch prometheus_tsdb_head_series for cardinality issues and prometheus_rule_evaluation_failures_total for broken rules.
Set up remote storage for long-term retention using Thanos, Cortex, or cloud-managed solutions. The local TSDB works well for short-term data but doesn’t scale across multiple Prometheus instances.
Prometheus excels at monitoring dynamic infrastructure with its pull-based model and powerful query language. Instrument your applications early, keep cardinality low, and design alerts around user impact rather than component metrics.