Grafana: Dashboard and Visualization
Key Insights
- Grafana transforms raw metrics into actionable intelligence through flexible visualizations and supports over 80 data source integrations including Prometheus, InfluxDB, and PostgreSQL
- Dashboard-as-code approaches using JSON definitions and provisioning enable version control, automated deployments, and consistent environments across teams
- Query optimization and proper use of dashboard variables can substantially reduce data source load while making dashboards more maintainable and reusable
Introduction to Grafana
Grafana has become the de facto standard for metrics visualization in modern observability stacks. As an open-source analytics platform, it excels at transforming time-series data into meaningful dashboards that help teams understand system behavior, diagnose issues, and make data-driven decisions.
Unlike monolithic monitoring solutions, Grafana operates as a visualization layer that sits atop your existing data sources. This architectural choice makes it incredibly flexible—you can visualize Prometheus metrics alongside application logs from Elasticsearch and business metrics from PostgreSQL, all within a single pane of glass.
The platform’s strength lies in three core capabilities: real-time metric visualization, flexible alerting, and extensive data source support. Whether you’re monitoring Kubernetes clusters, tracking application performance, or analyzing business KPIs, Grafana provides the tools to build dashboards that matter.
Setting Up Grafana
The fastest path to a working Grafana instance is Docker. Here's a quick-start setup with Prometheus as a data source; change the default admin password before exposing it beyond localhost:
version: '3.8'

services:
  grafana:
    image: grafana/grafana:10.2.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:v2.47.0
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-storage:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    networks:
      - monitoring

volumes:
  grafana-storage:
  prometheus-storage:

networks:
  monitoring:
    driver: bridge
For production deployments, customize grafana.ini to enforce security policies and configure authentication:
[server]
protocol = https
cert_file = /etc/grafana/ssl/cert.pem
cert_key = /etc/grafana/ssl/key.pem
root_url = https://grafana.yourdomain.com
[security]
admin_user = admin
admin_password = ${GRAFANA_ADMIN_PASSWORD}
disable_gravatar = true
cookie_secure = true
strict_transport_security = true
[auth.anonymous]
enabled = false
[users]
allow_sign_up = false
auto_assign_org = true
auto_assign_org_role = Viewer
[log]
mode = console file
level = info
Add data sources programmatically using the API to ensure consistency across environments:
curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy",
    "isDefault": true,
    "jsonData": {
      "httpMethod": "POST",
      "timeInterval": "30s"
    }
  }'
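The same data source can also be declared in a provisioning file so it survives container rebuilds without an API call. A sketch for a file such as ./provisioning/datasources/prometheus.yml (matching the provisioning volume mounted in the compose file above; the filename itself is arbitrary):

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
    jsonData:
      httpMethod: POST
      timeInterval: 30s
```

Grafana reads every YAML file in provisioning/datasources/ at startup, which keeps environments consistent without scripting against the API.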
Data Sources and Queries
Grafana’s query editor adapts to each data source’s native query language. Understanding how to write efficient queries is critical for dashboard performance.
For Prometheus, use PromQL to aggregate metrics efficiently:
# CPU usage by container
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)
# Memory usage percentage
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
# Request rate with 95th percentile latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
InfluxDB queries using Flux provide powerful data transformation capabilities:
from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_percent")
  |> aggregateWindow(every: 1m, fn: mean)
  |> group(columns: ["host"])
  |> yield(name: "mean_cpu")
For relational data sources like PostgreSQL, leverage SQL for business metrics:
SELECT
  time_bucket('5 minutes', created_at) AS time,
  count(*) AS order_count,
  sum(total_amount) AS revenue,
  avg(total_amount) AS avg_order_value
FROM orders
WHERE created_at > NOW() - INTERVAL '24 hours'
  AND status = 'completed'
GROUP BY time_bucket('5 minutes', created_at)
ORDER BY time DESC;
Building Dashboards
Dashboards are JSON documents that define panels, layouts, and variables. While the UI is convenient for initial creation, managing dashboards as code enables version control and automated deployment.
Here’s a complete panel definition showing key configuration options:
{
  "id": 2,
  "title": "CPU Usage",
  "type": "timeseries",
  "datasource": {
    "type": "prometheus",
    "uid": "prometheus-uid"
  },
  "targets": [
    {
      "expr": "rate(process_cpu_seconds_total[5m]) * 100",
      "legendFormat": "{{instance}}",
      "refId": "A"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100,
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 70, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  },
  "options": {
    "tooltip": { "mode": "multi" },
    "legend": { "displayMode": "table", "placement": "right" }
  }
}
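Hand-editing panel JSON gets tedious at scale, so many teams generate it programmatically. A minimal Python sketch of that idea; the timeseries_panel helper is hypothetical, not part of any Grafana SDK:

```python
import json

def timeseries_panel(title, expr, unit="percent", panel_id=1):
    """Build a minimal timeseries panel definition (hypothetical helper)."""
    return {
        "id": panel_id,
        "title": title,
        "type": "timeseries",
        # Each target is one query; refId letters must be unique per panel.
        "targets": [{"expr": expr, "refId": "A"}],
        "fieldConfig": {"defaults": {"unit": unit}},
    }

panel = timeseries_panel(
    "CPU Usage", "rate(process_cpu_seconds_total[5m]) * 100"
)
print(json.dumps(panel, indent=2))
```

Generating panels from a function keeps naming, units, and thresholds consistent across dozens of dashboards, and the output drops straight into a provisioned dashboard file.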
Dashboard variables make dashboards reusable across environments. Configure query-based variables to dynamically populate dropdowns:
{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": true,
        "includeAll": true
      },
      {
        "name": "pod",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)",
        "refresh": 2,
        "multi": false
      }
    ]
  }
}
Panel transformations enable data manipulation before visualization:
{
  "transformations": [
    {
      "id": "merge",
      "options": {}
    },
    {
      "id": "organize",
      "options": {
        "excludeByName": {
          "Time": false
        },
        "renameByName": {
          "Value": "Request Count"
        }
      }
    },
    {
      "id": "calculateField",
      "options": {
        "mode": "reduceRow",
        "reduce": {
          "reducer": "sum"
        },
        "alias": "Total Requests"
      }
    }
  ]
}
Advanced Features
Alerting in Grafana has evolved significantly. The unified alerting system supports multi-dimensional rules with flexible routing:
apiVersion: 1
groups:
  - name: system_alerts
    interval: 1m
    rules:
      - uid: high_cpu_alert
        title: High CPU Usage
        condition: A
        data:
          - refId: A
            datasourceUid: prometheus-uid
            model:
              expr: avg(rate(cpu_usage_total[5m])) > 0.8
              intervalMs: 60000
        noDataState: NoData
        execErrState: Error
        for: 5m
        annotations:
          description: "CPU usage is above 80% for 5 minutes"
        labels:
          severity: warning
          team: platform
Provisioning enables infrastructure-as-code for Grafana configurations. Place this in /etc/grafana/provisioning/dashboards/:
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: 'Infrastructure'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true
Best Practices and Performance
Optimize queries to reduce data source load. Use recording rules in Prometheus for frequently accessed metrics:
# prometheus-rules.yml
groups:
  - name: aggregated_metrics
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      - record: instance:cpu_usage:avg
        expr: avg(rate(cpu_usage_seconds_total[5m])) by (instance)
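Dashboard panels can then query the pre-computed series instead of re-aggregating raw samples on every refresh:

```promql
# Instead of: sum(rate(http_requests_total[5m])) by (job)
job:http_requests:rate5m
```

The recording rule pays the aggregation cost once per evaluation interval rather than once per panel load.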
Structure dashboards logically using folders and tags. Create a backup script to preserve configurations:
#!/bin/bash
GRAFANA_URL="http://localhost:3000"
API_KEY="your-api-key"
BACKUP_DIR="./grafana-backup-$(date +%Y%m%d)"

mkdir -p "$BACKUP_DIR/dashboards" "$BACKUP_DIR/datasources"

# Backup all dashboards
curl -s -H "Authorization: Bearer $API_KEY" \
  "$GRAFANA_URL/api/search?type=dash-db" | \
  jq -r '.[] | .uid' | while read -r uid; do
    curl -s -H "Authorization: Bearer $API_KEY" \
      "$GRAFANA_URL/api/dashboards/uid/$uid" > \
      "$BACKUP_DIR/dashboards/$uid.json"
done

# Backup data sources
curl -s -H "Authorization: Bearer $API_KEY" \
  "$GRAFANA_URL/api/datasources" > \
  "$BACKUP_DIR/datasources/datasources.json"

echo "Backup completed: $BACKUP_DIR"
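Restoring from such a backup needs one reshaping step: GET /api/dashboards/uid/{uid} returns the dashboard wrapped in a meta envelope, while POST /api/dashboards/db expects the bare dashboard plus an overwrite flag, with the old internal id cleared. A Python sketch of that conversion; to_import_payload is a hypothetical helper:

```python
import json

def to_import_payload(backup):
    """Reshape a GET /api/dashboards/uid/{uid} export into a
    POST /api/dashboards/db request body."""
    dash = dict(backup["dashboard"])
    dash["id"] = None  # drop the old internal id; the uid is preserved
    return {"dashboard": dash, "overwrite": True}

# Example: a minimal backed-up export as produced by the script above
backup = {"dashboard": {"id": 42, "uid": "abc123", "title": "CPU Usage"},
          "meta": {}}
payload = to_import_payload(backup)
print(json.dumps(payload))
```

POSTing that payload to the target instance recreates the dashboard under the same uid, so links and alert references keep working.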
Limit the time range and resolution for large datasets. Use query caching where supported, and avoid unanchored regex label matchers such as =~".*", which force the data source to scan every series. For teams managing multiple Grafana instances, implement GitOps workflows where dashboard JSON files are version-controlled and automatically deployed through CI/CD pipelines.
Grafana’s true power emerges when treated as code rather than a point-and-click tool. Embrace provisioning, automate deployments, and design dashboards that scale with your infrastructure.