Fluentd: Log Collection and Forwarding
Key Insights
- Fluentd creates a unified logging layer that decouples log sources from destinations, eliminating point-to-point integration complexity in distributed systems
- The tag-based routing system and plugin architecture enable powerful log transformation pipelines without writing custom code
- Production deployments require careful buffer tuning and aggregator/forwarder patterns to handle high throughput while maintaining reliability
Introduction to Fluentd and the Unified Logging Layer
In distributed systems, logs scatter across dozens or hundreds of services, containers, and hosts. Without centralized collection, debugging production issues becomes archaeological work—SSH-ing into servers, grepping files, correlating timestamps manually. Fluentd solves this by implementing a unified logging layer that sits between your applications and log storage systems.
The unified logging layer concept means applications don’t care where logs go. They write to stdout, files, or send to a local Fluentd agent. Fluentd handles collection, parsing, enrichment, and routing to multiple destinations simultaneously. Change your log storage from Elasticsearch to S3? Update Fluentd config, not application code.
Fluentd’s real power comes from its plugin ecosystem. Over 500 plugins cover virtually every input source and output destination. This flexibility makes it the de facto standard for Kubernetes logging and a core component of the EFK (Elasticsearch, Fluentd, Kibana) stack.
Core Architecture and Data Pipeline
Fluentd processes logs through a five-stage pipeline: Input → Parser → Filter → Buffer → Output. Each stage transforms data incrementally, keeping the architecture clean and debuggable.
Events flow through the system tagged with labels like app.nginx or system.syslog. Tags determine routing—different tags can go to different outputs, or the same tag can fan out to multiple destinations. This tag-based routing eliminates complex conditional logic.
Here’s a basic configuration showing the pipeline structure:
# Input: Tail application logs
<source>
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/fluentd/app.pos
  tag app.backend
  <parse>
    @type json
  </parse>
</source>

# Filter: Add metadata
<filter app.**>
  @type record_transformer
  <record>
    hostname ${hostname}
    environment production
  </record>
</filter>

# Output: Send to Elasticsearch
<match app.**>
  @type elasticsearch
  host elasticsearch.local
  port 9200
  index_name fluentd
  type_name _doc
</match>
This configuration tails JSON logs, adds hostname and environment fields, then sends everything to Elasticsearch. The ** wildcard in filter and match patterns covers the app tag itself and anything nested beneath it, such as app.backend.
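The two wildcards differ in depth: a single * matches exactly one tag part, while ** matches zero or more. Match blocks are also evaluated top to bottom, with the first matching block winning, as in this routing sketch (stdout outputs stand in for real destinations):

```
# Evaluated top to bottom; first matching block wins.
# "*" matches exactly one tag part: app.*.error matches app.backend.error
# "**" matches zero or more parts: app.** matches app, app.backend, app.backend.errors
<match app.*.error>
  @type stdout
</match>

<match app.**>
  @type stdout
</match>
```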
Input Plugins and Log Collection
Fluentd supports diverse input sources. The tail plugin monitors log files like tail -f, maintaining position files to survive restarts. The forward plugin receives logs from other Fluentd instances, creating aggregation hierarchies. HTTP inputs let applications POST logs directly.
Tailing application log files:
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/fluentd/nginx.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>
Receiving logs from Docker containers:
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
Configure Docker to use the Fluentd logging driver:
docker run --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag=docker.{{.Name}} \
  myapp
HTTP endpoint for application logging:
<source>
  @type http
  port 8888
  bind 0.0.0.0
  body_size_limit 32m
  keepalive_timeout 10s
  <parse>
    @type json
  </parse>
</source>
Applications can now POST JSON logs:
curl -X POST -H 'Content-Type: application/json' \
  -d '{"event":"user_login","user_id":12345}' \
  http://localhost:8888/app.events
Filters and Data Transformation
Filters transform events in-flight. The parser filter extracts structured data from unstructured logs. The record_transformer adds, removes, or modifies fields. The grep filter includes or excludes events based on patterns.
Parsing complex log formats:
<filter app.backend>
  @type parser
  key_name message
  <parse>
    @type regexp
    expression /^\[(?<timestamp>[^\]]+)\] (?<level>\w+): (?<message>.*)$/
    time_key timestamp
    time_format %Y-%m-%d %H:%M:%S
  </parse>
</filter>
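Given that expression, a hypothetical input record like the one below is split into level and message fields, with the original message replaced and the event time taken from the timestamp capture:

```
# Before: {"message": "[2024-03-01 14:22:05] ERROR: connection refused"}
# After:  {"level": "ERROR", "message": "connection refused"}
# Event time is set to 2024-03-01 14:22:05 via time_key/time_format
```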
Adding metadata and computed fields:
<filter app.**>
  @type record_transformer
  enable_ruby true
  <record>
    hostname "#{Socket.gethostname}"
    environment "#{ENV['APP_ENV']}"
    timestamp ${time.strftime('%Y-%m-%dT%H:%M:%S%z')}
    log_size ${record.to_json.bytesize}
  </record>
</filter>
Filtering sensitive information:
<filter app.**>
  @type grep
  <exclude>
    key message
    pattern /(password|credit_card|ssn)=/i
  </exclude>
</filter>

<filter app.**>
  @type record_modifier
  remove_keys password, api_key, secret
</filter>
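Dropping whole events is lossy; an alternative is masking sensitive values in place. A sketch using record_transformer with Ruby expressions enabled (the message field name and the redaction pattern are assumptions for illustration):

```
<filter app.**>
  @type record_transformer
  enable_ruby true
  <record>
    # Replace any password=... value with a placeholder instead of dropping the event
    message ${record["message"].to_s.gsub(/password=\S+/, "password=[REDACTED]")}
  </record>
</filter>
```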
Output Plugins and Destinations
Output plugins send processed logs to storage systems. Buffer configuration at the output level controls reliability versus performance tradeoffs.
Elasticsearch with daily indices:
<match app.**>
  @type elasticsearch
  host elasticsearch.local
  port 9200
  index_name app-logs-%Y.%m.%d
  type_name _doc
  <buffer tag, time>
    @type file
    path /var/log/fluentd/buffer/elasticsearch
    timekey 3600
    timekey_wait 10m
    flush_mode interval
    flush_interval 30s
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>
S3 with time-sliced partitioning:
<match app.**>
  @type s3
  aws_key_id YOUR_AWS_KEY
  aws_sec_key YOUR_AWS_SECRET
  s3_bucket my-app-logs
  s3_region us-east-1
  path logs/%Y/%m/%d/
  <buffer tag, time>
    @type file
    path /var/log/fluentd/buffer/s3
    timekey 3600
    timekey_wait 10m
    chunk_limit_size 256m
  </buffer>
  <format>
    @type json
  </format>
</match>
Multiple outputs for redundancy:
<match app.**>
  @type copy
  <store>
    @type elasticsearch
    host es-primary.local
    port 9200
  </store>
  <store>
    @type s3
    s3_bucket backup-logs
    s3_region us-east-1
  </store>
  <store>
    @type kafka2
    brokers kafka1:9092,kafka2:9092
    topic_key topic
    default_topic app-logs
  </store>
</match>
Production Deployment Patterns
Production Fluentd deployments typically use an aggregator/forwarder architecture. Lightweight forwarders run on each host, collecting logs locally with minimal resource usage. They forward to aggregator instances that handle heavy processing, filtering, and output to storage.
This pattern provides several benefits: forwarders restart quickly without losing data, aggregators can scale independently, and you centralize expensive operations like parsing and enrichment.
Docker Compose aggregator setup:
version: '3'
services:
  fluentd-aggregator:
    image: fluent/fluentd:v1.16-1
    ports:
      - "24224:24224"
      - "24224:24224/udp"
    volumes:
      - ./fluentd/aggregator.conf:/fluentd/etc/fluent.conf
      - ./fluentd/buffer:/var/log/fluentd/buffer
    environment:
      - FLUENTD_CONF=fluent.conf
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
  fluentd-forwarder:
    image: fluent/fluentd:v1.16-1
    volumes:
      - ./fluentd/forwarder.conf:/fluentd/etc/fluent.conf
      - /var/log:/var/log:ro
    environment:
      - FLUENTD_CONF=fluent.conf
      - AGGREGATOR_HOST=fluentd-aggregator
    depends_on:
      - fluentd-aggregator
Forwarder configuration:
<source>
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/fluentd/app.pos
  tag app.logs
  <parse>
    @type json
  </parse>
</source>

<match **>
  @type forward
  <server>
    host fluentd-aggregator
    port 24224
  </server>
  <buffer>
    @type file
    path /var/log/fluentd/buffer/forward
    flush_interval 5s
    retry_max_interval 30
  </buffer>
</match>
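The aggregator.conf mounted in the Compose file above is not shown; a minimal sketch, assuming the same Elasticsearch host used earlier, would receive forwarded events and centralize the expensive processing:

```
# Receive events from forwarders
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# Heavy enrichment happens here, not on the forwarders
<filter app.**>
  @type record_transformer
  <record>
    environment production
  </record>
</filter>

# Durable, buffered output to storage
<match app.**>
  @type elasticsearch
  host elasticsearch.local
  port 9200
  index_name fluentd
  <buffer>
    @type file
    path /var/log/fluentd/buffer/elasticsearch
    flush_interval 30s
  </buffer>
</match>
```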
Performance Tuning and Best Practices
Fluentd performance depends primarily on buffer configuration and worker threads. Buffers absorb traffic spikes and enable batch processing. File-based buffers survive restarts but add I/O overhead. Memory buffers are faster but risk data loss.
High-throughput configuration:
<system>
  workers 4
  root_dir /var/log/fluentd
</system>

<source>
  @type forward
  port 24224
</source>

<match app.**>
  @type elasticsearch
  host elasticsearch.local
  port 9200
  <buffer tag>
    @type file
    path /var/log/fluentd/buffer/es
    # Buffer sizing
    chunk_limit_size 8MB
    queue_limit_length 512
    # Flush behavior
    flush_mode interval
    flush_interval 10s
    flush_thread_count 4
    # Retry behavior
    retry_type exponential_backoff
    retry_timeout 72h
    retry_max_interval 30
    retry_forever false
    # Overflow behavior
    overflow_action drop_oldest_chunk
  </buffer>
</match>
Key tuning parameters:
- workers: Multiplies throughput by running parallel pipelines. Set to the number of CPU cores.
- chunk_limit_size: Larger chunks reduce overhead but increase memory usage; 8-16MB works well.
- queue_limit_length: How many chunks to buffer. Calculate from expected traffic and the acceptable data-loss window. With the settings above, each buffer holds at most chunk_limit_size × queue_limit_length = 8 MB × 512 = 4 GB.
- flush_thread_count: Parallel output threads. Match to the downstream system's concurrency limits.
Common pitfalls:
- Forgetting pos_file in tail inputs causes logs to be re-processed after a restart
- Insufficient buffer space causes data loss under load
- Not monitoring Fluentd’s own metrics (buffer queue length, retry count)
- Complex regex parsing in high-volume pipelines kills performance—parse at aggregator, not forwarder
Monitor Fluentd using its built-in metrics:
<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>
Access metrics at http://localhost:24220/api/plugins.json to track buffer usage, retry counts, and throughput.
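Beyond the HTTP endpoint, in_monitor_agent can also emit its metrics as ordinary events on a tag, so buffer and retry statistics can be routed to storage like any other log stream. A sketch (the 60-second interval and the fluentd.monitor tag are arbitrary choices; stdout stands in for a real destination):

```
<source>
  @type monitor_agent
  tag fluentd.monitor
  emit_interval 60
</source>

<match fluentd.monitor>
  @type stdout
</match>
```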
Fluentd transforms log chaos into structured, routable data streams. Master its pipeline architecture and buffer tuning, and you’ll build logging infrastructure that scales from dozens to thousands of hosts.