Linux yq: Command-Line YAML Processing

Key Insights

  • yq brings jq-style processing to YAML files, making it essential for managing Kubernetes manifests, CI/CD configs, and infrastructure-as-code
  • The tool supports in-place editing, multi-document processing, and complex transformations without writing custom scripts
  • Modern yq (v4+) uses a completely different syntax than v3, so always check your version first. The examples here spell out yq eval; since v4.18.1 eval is the default command and can be omitted

Introduction to yq

If you’ve worked with JSON on the command line, you’ve likely used jq. For YAML files, yq fills the same role—a lightweight, powerful processor for querying and manipulating structured data without opening an editor or writing Python scripts.

YAML dominates modern infrastructure tooling. Kubernetes manifests, Docker Compose files, Ansible playbooks, GitHub Actions workflows, and Helm charts all use YAML. Being able to programmatically read and modify these files is crucial for automation, CI/CD pipelines, and configuration management at scale.

Installation varies by distribution, but here’s what you need:

# Any Linux distribution (x86-64): download the release binary
sudo wget -qO /usr/local/bin/yq https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64
sudo chmod +x /usr/local/bin/yq

# Using snap
sudo snap install yq

# macOS
brew install yq

# Verify installation and check version
yq --version

Critical note: There are multiple tools named yq. This article covers mikefarah/yq (v4+), a Go-based tool distributed as a single binary. The Python-based yq (kislyuk/yq, a wrapper around jq) and the older v3 syntax are incompatible with the examples here.

Reading and Querying YAML Files

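Let’s start with a sample file, a simplified Kubernetes Deployment. (A real Deployment nests its containers under spec.template.spec; the shorter paths here keep the examples readable.)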

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web
spec:
  replicas: 3
  containers:
    - name: nginx
      image: nginx:1.21
      ports:
        - containerPort: 80
    - name: sidecar
      image: busybox:latest

Extract specific values using dot notation:

# Get deployment name
yq eval '.metadata.name' deployment.yaml
# Output: web-app

# Get namespace
yq eval '.metadata.namespace' deployment.yaml
# Output: production

# Access nested values
yq eval '.spec.replicas' deployment.yaml
# Output: 3

Array indexing works with square brackets:

# Get first container name
yq eval '.spec.containers[0].name' deployment.yaml
# Output: nginx

# Get second container image
yq eval '.spec.containers[1].image' deployment.yaml
# Output: busybox:latest

# Get all container names (one result per line, not a YAML array)
yq eval '.spec.containers[].name' deployment.yaml
# Output:
# nginx
# sidecar

Filter arrays with select conditions:

# Find containers using nginx image
yq eval '.spec.containers[] | select(.image == "nginx:1.21")' deployment.yaml

# Get names of containers exposing port 80
yq eval '.spec.containers[] | select(.ports[].containerPort == 80) | .name' deployment.yaml

The pipe operator | chains operations, similar to Unix pipes. This becomes powerful for complex queries.
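
As a small sketch of how chaining composes, the helper below wraps a stream of results in [ ... ] to collect them into an array, then pipes that array into join. The function name container_names_csv is hypothetical, and the expression assumes mikefarah/yq v4 and the simplified layout of the sample file above.

```shell
# Hypothetical helper: print all container names as one comma-separated line.
# Assumes mikefarah/yq v4 and containers listed under .spec.containers.
container_names_csv() {
  # [ ... ] collects the per-container results into a single array,
  # which join() then turns into one string.
  yq eval '[.spec.containers[].name] | join(",")' "$1"
}

# Usage: container_names_csv deployment.yaml
```

For the sample file this should print nginx,sidecar on a single line, which is often easier to consume in a script than one result per line.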

Modifying YAML Content

Reading data is useful, but yq shines when modifying configurations programmatically.

Update existing values:

# Change replica count
yq eval '.spec.replicas = 5' deployment.yaml

# Update container image
yq eval '.spec.containers[0].image = "nginx:1.22"' deployment.yaml

# Update with in-place editing (-i flag)
yq eval -i '.spec.replicas = 5' deployment.yaml

The -i flag modifies files directly. Without it, yq prints to stdout, letting you preview changes or redirect to new files.

Add new fields:

# Add a new label
yq eval '.metadata.labels.environment = "prod"' deployment.yaml

# Add an annotation (yq creates the annotations map if it doesn't exist)
yq eval '.metadata.annotations.deployed-by = "automation"' deployment.yaml

# Add resource limits to first container
yq eval '.spec.containers[0].resources.limits.memory = "512Mi"' deployment.yaml

Delete keys with the del() function:

# Remove a specific label
yq eval 'del(.metadata.labels.app)' deployment.yaml

# Remove entire annotations section
yq eval 'del(.metadata.annotations)' deployment.yaml

# Remove second container from array
yq eval 'del(.spec.containers[1])' deployment.yaml

Working with Multiple YAML Documents

Many YAML files contain multiple documents separated by ---. Kubernetes manifests often bundle related resources this way.

---
apiVersion: v1
kind: Service
metadata:
  name: web-service
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app

Process all documents:

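# Get all resource kinds (-N suppresses the --- separators yq prints
# between results that come from different documents)
yq eval -N '.kind' multi-resource.yaml
# Output:
# Service
# Deployment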

# Select specific document (0-indexed)
yq eval 'select(document_index == 0)' multi-resource.yaml

Filter documents by content:

# Get only Deployment resources
yq eval 'select(.kind == "Deployment")' multi-resource.yaml

# Extract names of all Services
yq eval 'select(.kind == "Service") | .metadata.name' multi-resource.yaml
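
The document filters above can be wrapped in a reusable function. This is a minimal sketch, assuming mikefarah/yq v4; the name names_of_kind and the KIND variable are made up for illustration.

```shell
# Hypothetical helper: list the names of all resources of one kind
# in a multi-document file. Assumes mikefarah/yq v4.
names_of_kind() {
  local kind="$1" file="$2"
  # strenv() reads KIND as a plain string; -N drops the --- separators
  # that yq would otherwise print between results from different documents.
  KIND="$kind" yq eval -N 'select(.kind == strenv(KIND)) | .metadata.name' "$file"
}

# Usage: names_of_kind Deployment multi-resource.yaml
```

Passing the kind through the environment keeps the expression in single quotes, so no shell escaping is needed inside it.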

Merge multiple YAML files:

# Combine files (maintains document separators)
yq eval-all '.' file1.yaml file2.yaml > combined.yaml

# Merge into single document
yq eval-all 'select(fileIndex == 0) * select(fileIndex == 1)' base.yaml override.yaml
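
To make the single-document merge concrete, here is a sketch of a wrapper around the same expression. It assumes mikefarah/yq v4; merge_yaml is a hypothetical name.

```shell
# Hypothetical helper: deep-merge an override file onto a base file.
# Assumes mikefarah/yq v4. Maps are merged key by key and values from the
# override win; arrays are replaced unless you switch to *+ (append).
merge_yaml() {
  local base="$1" override="$2"
  yq eval-all 'select(fileIndex == 0) * select(fileIndex == 1)' "$base" "$override"
}

# Usage: merge_yaml base.yaml override.yaml > merged.yaml
```

With a base containing image.repo and image.tag and an override containing only image.tag, the result keeps the base repo and takes the override's tag.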

Advanced Operations

Merge objects using the * operator:

# Merge labels from two sources
yq eval '.metadata.labels = .metadata.labels * {"new-label": "value"}' deployment.yaml

Conditional updates with select:

# Update only containers named "nginx"
yq eval '(.spec.containers[] | select(.name == "nginx") | .image) = "nginx:1.23"' deployment.yaml

# Add resource limits only to containers without them
yq eval '.spec.containers[] |= (select(.resources == null) | .resources.limits.cpu = "100m")' deployment.yaml

Map operations over arrays:

# Add a prefix to all container names
yq eval '.spec.containers[].name |= "prod-" + .' deployment.yaml

# Convert all image tags to use digest format (example transformation)
yq eval '.spec.containers[].image |= . + "@sha256:abc123"' deployment.yaml

String interpolation and variables:

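# Use environment variables (strenv reads the value as a string;
# env would parse it as YAML, so a tag like 1.10 becomes a number)
export NEW_TAG="v2.0.1"
yq eval '.spec.containers[0].image = "myapp:" + strenv(NEW_TAG)' deployment.yaml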

# Use internal variables
yq eval '.version as $v | .metadata.labels.version = $v' config.yaml
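
A sketch of passing a shell value into an expression without exporting it for the whole session. It uses strenv, which reads the variable as a plain string, whereas env parses the value as YAML. The function name set_image_tag is hypothetical, and the path assumes the simplified sample layout and mikefarah/yq v4.

```shell
# Hypothetical helper: print the manifest with the first container's image
# set to myapp:<tag>. Assumes mikefarah/yq v4 and the simplified sample layout.
set_image_tag() {
  local file="$1" tag="$2"
  # TAG is set only for this yq call. strenv() always yields a string,
  # so a tag such as 1.10 is not reinterpreted as a number.
  TAG="$tag" yq eval '.spec.containers[0].image = "myapp:" + strenv(TAG)' "$file"
}

# Usage: set_image_tag deployment.yaml v2.0.1 > updated.yaml
```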

Practical Real-World Scenarios

Update Kubernetes deployment image tags (common in CI/CD):

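#!/bin/bash
DEPLOYMENT="deployment.yaml"
# Export so yq can read the value through strenv()
export NEW_IMAGE="myregistry.io/app:${CI_COMMIT_SHA}"

# Passing the image via strenv avoids escaping quotes inside the expression
yq eval -i '(.spec.template.spec.containers[] | select(.name == "app") | .image) = strenv(NEW_IMAGE)' "$DEPLOYMENT"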

Extract all container images from a namespace export:

# Useful for security audits
yq eval '.items[].spec.template.spec.containers[].image' namespace-export.yaml | sort -u

Modify Helm values files:

# Update multiple values in values.yaml
yq eval -i '
  .image.tag = "v2.1.0" |
  .replicaCount = 5 |
  .ingress.enabled = true
' values.yaml

Validate and format YAML files:

# Pretty-print with consistent indentation
yq eval '.' messy.yaml > formatted.yaml

# Validate YAML syntax (exits with error if invalid)
yq eval '.' config.yaml > /dev/null && echo "Valid YAML"

# Convert YAML to JSON
yq eval -o=json '.' config.yaml > config.json
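
The single-file syntax check extends naturally to a directory. The following is a minimal sketch, assuming mikefarah/yq v4; validate_yaml_dir is a hypothetical name, and it only checks files ending in .yaml directly inside the given directory.

```shell
# Hypothetical helper: check every *.yaml file in a directory and report
# the ones yq cannot parse. Assumes mikefarah/yq v4.
validate_yaml_dir() {
  local dir="$1" status=0 file
  for file in "$dir"/*.yaml; do
    # Skip the unexpanded pattern when the directory has no .yaml files
    [ -e "$file" ] || continue
    if ! yq eval '.' "$file" >/dev/null 2>&1; then
      echo "Invalid YAML: $file" >&2
      status=1
    fi
  done
  # Non-zero if at least one file failed to parse
  return "$status"
}

# Usage: validate_yaml_dir k8s/ && echo "All files parse"
```

Note that this checks syntax only; it says nothing about whether a manifest is valid for Kubernetes or any other consumer.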

Bulk update across multiple files:

# Update image tag in all deployments
find k8s/ -name "*.yaml" -exec yq eval -i '
  (select(.kind == "Deployment") | .spec.template.spec.containers[0].image) |= 
  sub(":[^:]+$", ":v2.0.0")
' {} \;

Tips and Best Practices

Chain operations efficiently:

# Multiple updates in one command
yq eval -i '
  .metadata.labels.version = "2.0" |
  .spec.replicas = 3 |
  del(.metadata.annotations.old-annotation)
' deployment.yaml

Use // for default values:

# Set replicas to 3 if the key is missing or null
yq eval '.spec.replicas = (.spec.replicas // 3)' deployment.yaml

Output formatting options:

# JSON output
yq eval -o=json '.' file.yaml

# Pretty-print: expand flow-style maps and arrays into block style
# (add -M to disable colored output)
yq eval -P '.' file.yaml

# Custom indentation
yq eval --indent 4 '.' file.yaml

Test expressions before in-place edits:

# Always preview first
yq eval '.spec.replicas = 10' deployment.yaml

# Then apply if correct
yq eval -i '.spec.replicas = 10' deployment.yaml
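
For larger files, reading the whole preview is tedious, so a diff can help. This sketch assumes mikefarah/yq v4 and a POSIX diff; preview_change is a hypothetical name.

```shell
# Hypothetical helper: show what an expression would change, as a diff,
# without modifying the file. Assumes mikefarah/yq v4 and a POSIX diff.
preview_change() {
  local expr="$1" file="$2"
  # yq writes the modified document to stdout; diff compares it with the
  # file on disk. diff exits with status 1 when the two differ.
  yq eval "$expr" "$file" | diff "$file" -
}

# Usage: preview_change '.spec.replicas = 10' deployment.yaml
```

Because yq re-serializes the document, the diff may also show formatting differences (indentation, blank lines) in addition to the change you asked for.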

Handle errors gracefully in scripts:

#!/bin/bash
set -e  # Exit on error

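# -e (--exit-status) makes yq exit non-zero when the result is null or false,
# so a missing key is caught as well as a parse error
if ! yq eval -e '.metadata.name' deployment.yaml > /dev/null 2>&1; then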
    echo "Error: Invalid YAML or missing .metadata.name"
    exit 1
fi
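
The same check can be turned into a small guard function for scripts that depend on several keys. This is a sketch, assuming mikefarah/yq v4 and its -e (--exit-status) flag; require_key is a hypothetical name.

```shell
# Hypothetical helper: fail with a message when a key is missing, null,
# or false, or when the file does not parse. Assumes mikefarah/yq v4.
require_key() {
  local path="$1" file="$2"
  # -e sets a non-zero exit status when the result is null or false
  # or when nothing matches.
  if ! yq eval -e "$path" "$file" >/dev/null 2>&1; then
    echo "Error: $file is invalid YAML or is missing $path" >&2
    return 1
  fi
}

# Usage:
#   require_key '.metadata.name' deployment.yaml || exit 1
#   require_key '.spec.replicas' deployment.yaml || exit 1
```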

Use comments for complex expressions:

# Comments inside an expression may not be accepted by every yq version,
# so the portable choice is to document each step in the surrounding script:

# Update the image tag, then increase replicas
yq eval '
  .spec.containers[0].image = "nginx:1.23" |
  .spec.replicas = 5
' deployment.yaml

Master yq and you’ll handle YAML manipulation faster than opening an editor. It’s indispensable for GitOps workflows, automated deployments, and infrastructure automation. The learning curve is worth it—you’ll use these patterns daily in modern DevOps environments.
