Grafana: Dashboard and Visualization
Key Insights
- Grafana transforms raw metrics into actionable intelligence through flexible visualizations and supports over 80 data source integrations including Prometheus, InfluxDB, and PostgreSQL
- Dashboard-as-code approaches using JSON definitions and provisioning enable version control, automated deployments, and consistent environments across teams
- Query optimization and proper use of dashboard variables can substantially reduce data source load while making dashboards more maintainable and reusable
Introduction to Grafana
Grafana has become the de facto standard for metrics visualization in modern observability stacks. As an open-source analytics platform, it excels at transforming time-series data into meaningful dashboards that help teams understand system behavior, diagnose issues, and make data-driven decisions.
Unlike monolithic monitoring solutions, Grafana operates as a visualization layer that sits atop your existing data sources. This architectural choice makes it incredibly flexible—you can visualize Prometheus metrics alongside application logs from Elasticsearch and business metrics from PostgreSQL, all within a single pane of glass.
The platform’s strength lies in three core capabilities: real-time metric visualization, flexible alerting, and extensive data source support. Whether you’re monitoring Kubernetes clusters, tracking application performance, or analyzing business KPIs, Grafana provides the tools to build dashboards that matter.
Setting Up Grafana
The fastest path to a working Grafana instance is Docker. Here's a quick-start setup with Prometheus as a data source; change the default admin password before exposing it beyond localhost:
version: '3.8'

services:
  grafana:
    image: grafana/grafana:10.2.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:v2.47.0
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-storage:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    networks:
      - monitoring

volumes:
  grafana-storage:
  prometheus-storage:

networks:
  monitoring:
    driver: bridge
For production deployments, customize grafana.ini to enforce security policies and configure authentication:
[server]
protocol = https
cert_file = /etc/grafana/ssl/cert.pem
cert_key = /etc/grafana/ssl/key.pem
root_url = https://grafana.yourdomain.com
[security]
admin_user = admin
admin_password = ${GRAFANA_ADMIN_PASSWORD}
disable_gravatar = true
cookie_secure = true
strict_transport_security = true
[auth.anonymous]
enabled = false
[users]
allow_sign_up = false
auto_assign_org = true
auto_assign_org_role = Viewer
[log]
mode = console file
level = info
Add data sources programmatically using the API to ensure consistency across environments:
curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy",
    "isDefault": true,
    "jsonData": {
      "httpMethod": "POST",
      "timeInterval": "30s"
    }
  }'
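The same data source can also be declared in a provisioning file so it survives container rebuilds without an API call. A sketch for a file such as ./provisioning/datasources/prometheus.yml (matching the provisioning volume mounted in the compose file above; the filename itself is arbitrary):

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
    jsonData:
      httpMethod: POST
      timeInterval: 30s
```

Grafana reads every YAML file in provisioning/datasources/ at startup, which keeps environments consistent without scripting against the API.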
Data Sources and Queries
Grafana’s query editor adapts to each data source’s native query language. Understanding how to write efficient queries is critical for dashboard performance.
For Prometheus, use PromQL to aggregate metrics efficiently:
# CPU usage by container
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)
# Memory usage percentage
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
# Request rate with 95th percentile latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
InfluxDB queries using Flux provide powerful data transformation capabilities:
from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_percent")
  |> aggregateWindow(every: 1m, fn: mean)
  |> group(columns: ["host"])
  |> yield(name: "mean_cpu")
For relational data sources like PostgreSQL, leverage SQL for business metrics:
SELECT
  time_bucket('5 minutes', created_at) AS time,
  count(*) AS order_count,
  sum(total_amount) AS revenue,
  avg(total_amount) AS avg_order_value
FROM orders
WHERE created_at > NOW() - INTERVAL '24 hours'
  AND status = 'completed'
GROUP BY time_bucket('5 minutes', created_at)
ORDER BY time DESC;
Building Dashboards
Dashboards are JSON documents that define panels, layouts, and variables. While the UI is convenient for initial creation, managing dashboards as code enables version control and automated deployment.
Here’s a complete panel definition showing key configuration options:
{
  "id": 2,
  "title": "CPU Usage",
  "type": "timeseries",
  "datasource": {
    "type": "prometheus",
    "uid": "prometheus-uid"
  },
  "targets": [
    {
      "expr": "rate(process_cpu_seconds_total[5m]) * 100",
      "legendFormat": "{{instance}}",
      "refId": "A"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100,
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 70, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  },
  "options": {
    "tooltip": { "mode": "multi" },
    "legend": { "displayMode": "table", "placement": "right" }
  }
}
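Hand-editing panel JSON gets tedious at scale, so many teams generate it programmatically. A minimal Python sketch of that idea; the timeseries_panel helper is hypothetical, not part of any Grafana SDK:

```python
import json

def timeseries_panel(title, expr, unit="percent", panel_id=1):
    """Build a minimal timeseries panel definition (hypothetical helper)."""
    return {
        "id": panel_id,
        "title": title,
        "type": "timeseries",
        # Each target is one query; refId letters must be unique per panel.
        "targets": [{"expr": expr, "refId": "A"}],
        "fieldConfig": {"defaults": {"unit": unit}},
    }

panel = timeseries_panel(
    "CPU Usage", "rate(process_cpu_seconds_total[5m]) * 100"
)
print(json.dumps(panel, indent=2))
```

Generating panels from a function keeps naming, units, and thresholds consistent across dozens of dashboards, and the output drops straight into a provisioned dashboard file.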
Dashboard variables make dashboards reusable across environments. Configure query-based variables to dynamically populate dropdowns:
{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": true,
        "includeAll": true
      },
      {
        "name": "pod",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)",
        "refresh": 2,
        "multi": false
      }
    ]
  }
}
Panel transformations enable data manipulation before visualization:
{
  "transformations": [
    {
      "id": "merge",
      "options": {}
    },
    {
      "id": "organize",
      "options": {
        "excludeByName": {
          "Time": false
        },
        "renameByName": {
          "Value": "Request Count"
        }
      }
    },
    {
      "id": "calculateField",
      "options": {
        "mode": "reduceRow",
        "reduce": {
          "reducer": "sum"
        },
        "alias": "Total Requests"
      }
    }
  ]
}
Advanced Features
Alerting in Grafana has evolved significantly. The unified alerting system supports multi-dimensional rules with flexible routing:
apiVersion: 1
groups:
  - name: system_alerts
    interval: 1m
    rules:
      - uid: high_cpu_alert
        title: High CPU Usage
        condition: A
        data:
          - refId: A
            datasourceUid: prometheus-uid
            model:
              expr: avg(rate(cpu_usage_total[5m])) > 0.8
              intervalMs: 60000
        noDataState: NoData
        execErrState: Error
        for: 5m
        annotations:
          description: "CPU usage is above 80% for 5 minutes"
        labels:
          severity: warning
          team: platform
Provisioning enables infrastructure-as-code for Grafana configurations. Place this in /etc/grafana/provisioning/dashboards/:
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: 'Infrastructure'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true
Best Practices and Performance
Optimize queries to reduce data source load. Use recording rules in Prometheus for frequently accessed metrics:
# prometheus-rules.yml
groups:
  - name: aggregated_metrics
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      - record: instance:cpu_usage:avg
        expr: avg(rate(cpu_usage_seconds_total[5m])) by (instance)
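Dashboard panels can then query the pre-computed series instead of re-aggregating raw samples on every refresh:

```promql
# Instead of: sum(rate(http_requests_total[5m])) by (job)
job:http_requests:rate5m
```

The recording rule pays the aggregation cost once per evaluation interval rather than once per panel load.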
Structure dashboards logically using folders and tags. Create a backup script to preserve configurations:
#!/bin/bash
GRAFANA_URL="http://localhost:3000"
API_KEY="your-api-key"
BACKUP_DIR="./grafana-backup-$(date +%Y%m%d)"

mkdir -p "$BACKUP_DIR/dashboards" "$BACKUP_DIR/datasources"

# Backup all dashboards
curl -s -H "Authorization: Bearer $API_KEY" \
  "$GRAFANA_URL/api/search?type=dash-db" | \
  jq -r '.[] | .uid' | while read -r uid; do
    curl -s -H "Authorization: Bearer $API_KEY" \
      "$GRAFANA_URL/api/dashboards/uid/$uid" > \
      "$BACKUP_DIR/dashboards/$uid.json"
done

# Backup data sources
curl -s -H "Authorization: Bearer $API_KEY" \
  "$GRAFANA_URL/api/datasources" > \
  "$BACKUP_DIR/datasources/datasources.json"

echo "Backup completed: $BACKUP_DIR"
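Restoring from such a backup needs one reshaping step: GET /api/dashboards/uid/{uid} returns the dashboard wrapped in a meta envelope, while POST /api/dashboards/db expects the bare dashboard plus an overwrite flag, with the old internal id cleared. A Python sketch of that conversion; to_import_payload is a hypothetical helper:

```python
import json

def to_import_payload(backup):
    """Reshape a GET /api/dashboards/uid/{uid} export into a
    POST /api/dashboards/db request body."""
    dash = dict(backup["dashboard"])
    dash["id"] = None  # drop the old internal id; the uid is preserved
    return {"dashboard": dash, "overwrite": True}

# Example: a minimal backed-up export as produced by the script above
backup = {"dashboard": {"id": 42, "uid": "abc123", "title": "CPU Usage"},
          "meta": {}}
payload = to_import_payload(backup)
print(json.dumps(payload))
```

POSTing that payload to the target instance recreates the dashboard under the same uid, so links and alert references keep working.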
Limit the time range and resolution for large datasets. Use query caching where supported, and avoid unanchored regex label matchers such as =~".*", which force the data source to scan every series. For teams managing multiple Grafana instances, implement GitOps workflows where dashboard JSON files are version-controlled and automatically deployed through CI/CD pipelines.
Grafana’s true power emerges when treated as code rather than a point-and-click tool. Embrace provisioning, automate deployments, and design dashboards that scale with your infrastructure.