Linux Text Processing with sed: Stream Editor

Key Insights

• sed processes text as a stream, making it memory-efficient for files of any size and a natural fit for pipelines where you transform data on the fly without creating intermediate files
• Master the substitution command syntax s/pattern/replacement/flags first: it covers most everyday text-processing tasks and forms the foundation for understanding sed’s pattern space concept
• sed excels at single-purpose text transformations in scripts and pipelines; switch to awk for complex field processing, or to Python when you need maintainable logic with multiple conditions

Introduction to sed

sed (Stream EDitor) emerged from Bell Labs in the 1970s as a non-interactive version of the ed line editor. Unlike text editors that load entire files into memory and allow interactive manipulation, sed processes text as a continuous stream—reading input line by line, applying transformations, and writing output. This design makes sed exceptionally efficient for processing large files and ideal for automated text manipulation in shell scripts.

The core concept is simple: sed reads a line into its “pattern space” (a buffer), executes commands on that space, prints the result (unless -n is given), and moves to the next line. Because sed keeps only the current line (plus anything you deliberately store in the hold space) in memory, it can process gigabyte-sized log files without its memory use growing with file size.
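A quick way to watch this cycle is to print the pattern space explicitly with p. Without -n, each line appears twice (once from p, once from the automatic end-of-cycle print); with -n, only the explicit print remains. The function names below are illustrative only.

```shell
#!/usr/bin/env bash
# Demonstrates the read / execute / auto-print cycle.

# 'p' prints the pattern space; sed then auto-prints it again at the end
# of each cycle, so every input line is output twice.
with_autoprint() {
  printf 'one\ntwo\n' | sed 'p'
}

# -n disables the automatic print, leaving only the explicit 'p'.
without_autoprint() {
  printf 'one\ntwo\n' | sed -n 'p'
}

with_autoprint      # one, one, two, two
without_autoprint   # one, two
```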

Here’s the basic syntax structure:

sed 's/pattern/replacement/' file.txt

This substitutes the first occurrence of “pattern” with “replacement” on each line and writes the result to standard output, leaving file.txt unchanged. Let’s see it in action:

echo "Hello World" | sed 's/World/sed/'
# Output: Hello sed

Basic Text Substitution and Search

The substitution command s/pattern/replacement/flags is sed’s workhorse. Understanding its flags transforms it from a simple find-replace tool into a powerful text processor.

Global substitution with the g flag replaces all occurrences on a line, not just the first:

echo "foo bar foo baz" | sed 's/foo/qux/'
# Output: qux bar foo baz (only first foo replaced)

echo "foo bar foo baz" | sed 's/foo/qux/g'
# Output: qux bar qux baz (all foo replaced)

Case-insensitive matching uses the I flag, a GNU extension (GNU sed also accepts lowercase i; support in BSD and macOS sed varies by version):

echo "Hello HELLO hello" | sed 's/hello/hi/gI'
# Output: hi hi hi
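Flags work per line and can be combined. This sketch, assuming GNU sed and a made-up two-line input, runs the same substitution with no flag, with g, and with gI:

```shell
#!/usr/bin/env bash
# Two made-up input lines, each containing "foo" in different cases.
input() { printf 'foo Foo foo\nFOO foo\n'; }

input | sed 's/foo/qux/'     # first lowercase "foo" on each line
input | sed 's/foo/qux/g'    # every lowercase "foo" on each line
input | sed 's/foo/qux/gI'   # every "foo" in any case on each line
```

Note that without g, the first match is replaced on every line, not just the first match in the whole input.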

Custom delimiters prevent escaping hell when working with paths or URLs. Any single character other than backslash or newline can delimit the pattern:

# Ugly: escaping forward slashes
sed 's/\/usr\/local\/bin/\/opt\/bin/' file.txt

# Clean: using # as delimiter
sed 's#/usr/local/bin#/opt/bin#' file.txt

# Also valid: using | or :
sed 's|/usr/local/bin|/opt/bin|' file.txt
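The delimiter only changes how the command is written, not what it does. A small check on a made-up PATH-style string that the escaped, # and | forms all give the same result:

```shell
#!/usr/bin/env bash
line='PATH=/usr/local/bin:/usr/bin'   # made-up sample input

escaped=$(printf '%s\n' "$line" | sed 's/\/usr\/local\/bin/\/opt\/bin/')
hashed=$(printf '%s\n' "$line" | sed 's#/usr/local/bin#/opt/bin#')
piped=$(printf '%s\n' "$line" | sed 's|/usr/local/bin|/opt/bin|')

# All three print PATH=/opt/bin:/usr/bin
printf '%s\n' "$escaped" "$hashed" "$piped"
```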

The p flag prints lines where substitution occurred, typically used with -n to suppress default output:

# Only show lines that changed
sed -n 's/error/ERROR/p' logfile.txt
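Why pair p with -n? Without -n, the changed line is printed by p and then again by the automatic end-of-cycle print. A sketch using a hypothetical three-line log:

```shell
#!/usr/bin/env bash
# Hypothetical log; only the middle line contains "error".
log() { printf 'ok: started\nerror: disk full\nok: done\n'; }

# -n suppresses auto-print; the p flag prints only lines where s succeeded.
log | sed -n 's/error/ERROR/p'   # ERROR: disk full

# Without -n: all three lines are printed, and the changed one twice.
log | sed 's/error/ERROR/p'
```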

Address Ranges and Line Selection

sed’s addressing system lets you target specific lines or patterns, making transformations surgical rather than blanket operations.

Line number addressing operates on specific lines:

# Replace only on line 3
sed '3s/foo/bar/' file.txt

# Delete line 5
sed '5d' file.txt

Range operations use comma notation:

# Replace in lines 5-10
sed '5,10s/foo/bar/' file.txt

# Delete lines 1-3
sed '1,3d' file.txt

# From line 5 to end of file
sed '5,$s/foo/bar/' file.txt
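To try line addresses without a real file, seq makes convenient numbered input. A short sketch:

```shell
#!/usr/bin/env bash
# seq 1 6 prints the numbers 1..6, one per line.
seq 1 6 | sed '3s/3/three/'   # only line 3 changes
seq 1 6 | sed '2,4d'          # deletes lines 2-4, leaving 1 5 6
seq 1 6 | sed -n '5,$p'       # prints line 5 through the last line: 5 6
```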

Pattern-based addressing selects lines matching a regex:

# Replace only on lines containing "error"
sed '/error/s/WARNING/CRITICAL/' logfile.txt

# Delete all comment lines
sed '/^#/d' config.conf

# Delete every block from a line matching "START" through the next line matching "END"
sed '/START/,/END/d' file.txt
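A pattern range is not one-shot: after END closes it, the next START line opens a new one, and if no END follows, the range runs to the end of input. A sketch with made-up input:

```shell
#!/usr/bin/env bash
# Two START..END blocks separated by ordinary lines.
blocks() { printf 'a\nSTART\nx\nEND\nb\nSTART\ny\nEND\nc\n'; }
blocks | sed '/START/,/END/d'                  # a b c: both blocks removed

# No END after START: everything from START onward is deleted.
printf 'a\nSTART\nx\n' | sed '/START/,/END/d'  # a
```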

Negation with ! inverts the selection:

# Replace on all lines EXCEPT those containing "skip"
sed '/skip/!s/foo/bar/' file.txt

# Delete everything except lines 10-20
sed '10,20!d' file.txt
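Negated addresses on generated input, as a quick sanity check (the sample lines are made up):

```shell
#!/usr/bin/env bash
# Keep only lines 10-20 of a 30-line input: '!' deletes everything else.
seq 1 30 | sed '10,20!d'

# Same idea with a pattern: change "foo" except on lines containing "skip".
printf 'foo\nskip foo\n' | sed '/skip/!s/foo/bar/'   # bar, then skip foo
```

For the first case, sed -n '10,20p' gives the same output; the negated form is handy when the command is something other than d or p.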

Advanced sed Commands

Beyond substitution, sed offers commands for deletion, insertion, and complex transformations.

Delete with d removes matching lines entirely:

# Remove blank lines
sed '/^$/d' file.txt

# Remove lines containing "debug"
sed '/debug/d' logfile.txt
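One subtlety: /^$/ matches only truly empty lines, so lines containing just spaces or tabs survive. A sketch comparing it with a POSIX character class, on made-up input:

```shell
#!/usr/bin/env bash
# Input: an empty line and a spaces-only line between "a" and "b".
sample() { printf 'a\n\n   \nb\n'; }

sample | sed '/^$/d'               # removes only the empty line
sample | sed '/^[[:space:]]*$/d'   # also removes the whitespace-only line
```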

Insert (i) and append (a) add text before or after lines. These one-line forms are GNU extensions; POSIX and BSD sed expect a newline right after i\ or a\:

# Insert header before line 1
sed '1i\# Configuration File' config.txt

# Append footer after last line
sed '$a\# End of File' config.txt

# Add text after lines matching pattern
sed '/\[section\]/a\new_option=value' config.ini
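For comparison, here is a sketch of the GNU one-line form next to the POSIX form, where the text goes on the line after the backslash. The function names are hypothetical, and both were written with GNU sed in mind; the POSIX variant is the one to reach for on BSD or macOS.

```shell
#!/usr/bin/env bash
# GNU one-line form: the text follows i\ or a\ on the same line.
add_header_footer_gnu() {
  sed -e '1i\# Configuration File' -e '$a\# End of File'
}

# POSIX form: i\ or a\ ends the line, and the text is on the next line.
add_header_footer_posix() {
  sed -e '1i\
# Configuration File' -e '$a\
# End of File'
}

printf 'key=value\n' | add_header_footer_gnu
printf 'key=value\n' | add_header_footer_posix
```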

Multiple commands can be chained with -e or separated by semicolons:

# Apply multiple transformations
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt

# Or using semicolons
sed 's/foo/bar/g; s/baz/qux/g' file.txt
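Chained commands run in order on the same pattern space, so a later command sees what an earlier one produced. That makes the order significant, as this small sketch shows:

```shell
#!/usr/bin/env bash
echo 'foo baz' | sed -e 's/foo/bar/g' -e 's/baz/qux/g'   # bar qux

echo 'foo' | sed 's/foo/bar/; s/bar/baz/'   # baz: the second s sees "bar"
echo 'foo' | sed 's/bar/baz/; s/foo/bar/'   # bar: "bar" appears too late
```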

Hold space is sed’s secondary buffer for complex operations like swapping lines:

# Swap every two lines (1~2 is a GNU address; with an odd line count the last line is dropped)
sed -n '1~2{h;n;p;g;p}' file.txt

# Reverse a file like tac (not suited to large files: the hold space grows to contain the whole file)
sed '1!G;h;$!d' file.txt
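Both hold-space idioms are easier to follow on numbered input. This sketch assumes GNU sed. The last command is an alternative swap that uses N instead of the hold space; it is included because, unlike the hold-space version, it keeps a trailing unpaired line.

```shell
#!/usr/bin/env bash
# Reverse: G appends the hold space (earlier lines) to the current line,
# h saves the result back, and $!d suppresses output until the last line.
seq 1 3 | sed '1!G;h;$!d'                       # 3 2 1

# Pairwise swap: h saves the odd line, n loads the even one, p prints it,
# g restores the odd line, p prints it.
seq 1 4 | sed -n '1~2{h;n;p;g;p}'               # 2 1 4 3

# N joins two lines and s swaps them (\n in the replacement is GNU).
# A lone last line has no newline, so s fails and it prints unchanged.
seq 1 3 | sed '$!N;s/\(.*\)\n\(.*\)/\2\n\1/'    # 2 1 3
```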

Regular Expressions in sed

sed supports extended regular expressions with -E (accepted by both GNU and BSD sed; GNU also accepts -r), which lets you write ( ) { } + ? and | as operators without backslashes.

Capture groups and backreferences extract and rearrange data:

# Swap first and last name (\w is a GNU extension; [[:alnum:]_] is the portable form)
echo "John Doe" | sed -E 's/(\w+) (\w+)/\2, \1/'
# Output: Doe, John

# Extract domain from email
echo "user@example.com" | sed -E 's/.*@(.*)/\1/'
# Output: example.com

# Reformat dates from MM/DD/YYYY to YYYY-MM-DD
echo "12/31/2023" | sed -E 's|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3-\1-\2|'
# Output: 2023-12-31
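The same rearrangements also work without -E, in basic regular expression syntax, where groups and intervals need backslashes. This sketch uses POSIX character classes in place of \w; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# BRE: \( \) for groups, \{ \} for intervals; [[:alnum:]_] replaces \w.
swap_name() {
  sed 's/\([[:alnum:]_]\{1,\}\) \([[:alnum:]_]\{1,\}\)/\2, \1/'
}
us_to_iso_date() {
  sed 's|\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)|\3-\1-\2|'
}

echo "John Doe"   | swap_name        # Doe, John
echo "12/31/2023" | us_to_iso_date   # 2023-12-31
```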

Masking and extraction:

# Mask credit card numbers (show last 4 digits)
echo "1234-5678-9012-3456" | sed -E 's/[0-9]{4}-[0-9]{4}-[0-9]{4}-([0-9]{4})/****-****-****-\1/'
# Output: ****-****-****-3456

# Extract a quoted string (assumes one quoted string per line)
echo 'He said "Hello World" to me' | sed -E 's/.*"(.*)".*/\1/'
# Output: Hello World
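With .* on both sides, the group's contents depend on how the greedy matches settle, which gets unpredictable once a line holds more than one quoted string. Replacing .* with [^"]* pins the result to the first quoted string. A sketch with a hypothetical helper and made-up input:

```shell
#!/usr/bin/env bash
# [^"]* cannot cross a quote, so \1 is always the first quoted string.
# Lines without quotes do not match and pass through unchanged.
first_quoted() { sed -E 's/[^"]*"([^"]*)".*/\1/'; }

echo 'He said "Hello World" to me' | first_quoted   # Hello World
echo 'a "first" and "second" here' | first_quoted   # first
```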

Remove consecutive duplicate lines, emulating uniq with the N, P and D commands:

sed '$!N; /^\(.*\)\n\1$/!P; D' file.txt
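Here is the same one-liner wrapped in a hypothetical function, with comments on each step and a comparison against uniq on made-up input:

```shell
#!/usr/bin/env bash
# $!N appends the next line (except on the last line), giving "prev\ncur".
# If both halves are identical, the regex matches and P is skipped;
# otherwise P prints the first half. D deletes up to the newline and
# restarts the cycle on the remaining line without reading new input.
dedupe_adjacent() { sed '$!N; /^\(.*\)\n\1$/!P; D'; }

printf 'a\na\nb\nb\na\n' | dedupe_adjacent   # a b a: only adjacent repeats collapse
```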

Practical Real-World Use Cases

Log file processing - extract errors and anonymize IPs:

# Extract error lines and mask IP addresses
sed -n '/ERROR/p' app.log | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/XXX.XXX.XXX.XXX/g'

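# Keep lines from the first 2024-01-15 match through the next 2024-01-16 match, then keep ERROR or CRITICAL
# (ERE alternation; \| in a basic regex is a GNU extension)
sed -n '/2024-01-15/,/2024-01-16/p' app.log | sed -n -E '/ERROR|CRITICAL/p'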
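Filtering and masking can also happen in a single sed process by grouping commands in braces under the /ERROR/ address. A sketch assuming GNU sed; the log lines and function names are made up:

```shell
#!/usr/bin/env bash
# Hypothetical log lines for illustration.
sample_log() {
  printf '%s\n' \
    '2024-01-15 ERROR login failed from 10.0.0.12' \
    '2024-01-15 INFO login ok from 10.0.0.13'
}

# On ERROR lines only: mask every IPv4-looking address, then print.
mask_error_ips() {
  sed -n -E '/ERROR/{s/[0-9]{1,3}(\.[0-9]{1,3}){3}/XXX.XXX.XXX.XXX/g;p}'
}

sample_log | mask_error_ips   # 2024-01-15 ERROR login failed from XXX.XXX.XXX.XXX
```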

Configuration file updates in deployment scripts:

# Update database host in config
sed -i 's/db_host=localhost/db_host=prod-db-01.internal/' app.conf

# Enable feature flags
sed -i '/feature_x_enabled/s/false/true/' settings.ini

# Update multiple environment variables (^ and $ stop ENV=dev from also matching NODE_ENV=development)
sed -i -e 's/^ENV=dev$/ENV=prod/' -e 's/^DEBUG=true$/DEBUG=false/' .env
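Anchoring matters because an unanchored pattern also hits longer keys that merely contain it. This sketch assumes GNU sed -i, uses a hypothetical update_env helper, and edits only a throwaway temp file:

```shell
#!/usr/bin/env bash
# Unanchored: the pattern matches inside a longer key and corrupts it.
echo 'NODE_ENV=development' | sed 's/ENV=dev/ENV=prod/'   # NODE_ENV=prodelopment

# Anchored, in-place edit of the file given as $1.
update_env() {
  sed -i -e 's/^ENV=dev$/ENV=prod/' -e 's/^DEBUG=true$/DEBUG=false/' "$1"
}

env_file=$(mktemp)
printf 'ENV=dev\nNODE_ENV=development\nDEBUG=true\n' > "$env_file"
update_env "$env_file"
cat "$env_file"   # ENV=prod, NODE_ENV=development, DEBUG=false
rm -f "$env_file"
```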

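CSV/TSV data transformation (these one-liners assume simple CSV with no quoted fields that contain commas):

# Convert CSV to TSV (\t in the replacement is a GNU extension)
sed 's/,/\t/g' data.csv > data.tsv

# Remove quotes from CSV fields
sed 's/"//g' data.csv

# Extract the 3rd field (lines with fewer than three fields pass through unchanged)
sed -E 's/([^,]*,){2}([^,]*).*/\2/' data.csv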
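The two main transformations can be wrapped as filters to try them on made-up rows, including one that shows where the simple-CSV assumption breaks. The function names are hypothetical; GNU sed is assumed for \t.

```shell
#!/usr/bin/env bash
csv_to_tsv()  { sed 's/,/\t/g'; }
third_field() { sed -E 's/([^,]*,){2}([^,]*).*/\2/'; }

printf 'a,b,c\n' | csv_to_tsv                                # a<TAB>b<TAB>c
printf 'id,name,city,zip\n1,Ada,London,N1\n' | third_field   # city, London
printf 'a,b\n' | third_field                                 # a,b: too few fields, unchanged

# A comma inside quotes shifts the count: this prints ' Ada"', not London.
printf '1,"Smith, Ada",London\n' | third_field
```

For real-world CSV with quoting, a CSV-aware tool is the safer choice.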

Batch file renaming preparation:

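# Generate mv commands for batch renaming; review rename.sh before running it
# (names containing quotes, $, backslashes or backticks still break the generated commands)
printf '%s\n' *.txt | sed -E 's/(.*)\.txt$/mv "\1.txt" "\1.md"/' > rename.sh
chmod +x rename.sh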
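The generator step can be checked on its own by feeding it names on stdin instead of a glob. A sketch with a hypothetical gen_renames filter and made-up file names:

```shell
#!/usr/bin/env bash
# Reads file names on stdin and writes one mv command per .txt name.
# \. makes the dot literal; an unescaped . would match any character.
gen_renames() { sed -E 's/(.*)\.txt$/mv "\1.txt" "\1.md"/'; }

printf '%s\n' 'notes.txt' 'meeting notes.txt' | gen_renames
# mv "notes.txt" "notes.md"
# mv "meeting notes.txt" "meeting notes.md"
```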

Best Practices and Gotchas

Always back up before in-place editing. The -i flag modifies files directly; add a suffix to create backups:

# DANGEROUS: No backup
sed -i 's/foo/bar/' important.txt

# SAFE: Creates important.txt.bak
sed -i.bak 's/foo/bar/' important.txt

# BSD/macOS sed requires a suffix argument; an empty '' means no backup
sed -i '' 's/foo/bar/' file.txt  # BSD
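A quick way to confirm what -i.bak does is to run it in a throwaway directory and look at both files. This is a sketch; the directory comes from mktemp, so nothing real is touched.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is modified.
workdir=$(mktemp -d)
printf 'foo\n' > "$workdir/important.txt"

# The suffix is attached directly to -i; GNU and BSD sed both accept -i.bak.
sed -i.bak 's/foo/bar/' "$workdir/important.txt"

cat "$workdir/important.txt"       # bar (edited)
cat "$workdir/important.txt.bak"   # foo (original)
# Clean up afterwards with: rm -rf "$workdir"
```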

Test commands before applying them, using echo or a sample file:

# Test on sample input
echo "test line" | sed 's/test/production/'

# Preview changes without modifying file
sed 's/foo/bar/g' file.txt | head -20

# Or use diff to see changes (the <(...) process substitution needs bash or zsh)
diff file.txt <(sed 's/foo/bar/g' file.txt)
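If you preview often, the diff trick fits in a small helper. The sed_preview name is hypothetical; the helper needs bash for process substitution and relies on diff exiting 0 when nothing would change and 1 when something would.

```shell
#!/usr/bin/env bash
# Show a unified diff of what a sed script would change, without writing.
sed_preview() {
  local script=$1 file=$2
  diff -u "$file" <(sed "$script" "$file")
}

sample=$(mktemp)
printf 'foo one\nbar two\n' > "$sample"
sed_preview 's/foo/baz/' "$sample" || true   # diff exits 1 here because a line changes
rm -f "$sample"
```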

Know when to use alternatives:

Use grep for simple filtering without transformation
Use awk when processing field-based data (columns)
Use perl or python for complex logic or multi-line processing
Use tr for simple character-level transformations

Combine sed in pipelines for powerful data processing:

# Count ERROR lines per bracketed date such as [2024-01-15]
# (-n with the p flag drops ERROR lines that have no bracketed date)
grep "ERROR" app.log | \
  sed -n -E 's/.*\[([0-9-]+)\].*/\1/p' | \
  sort | uniq -c | sort -rn
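The pipeline is easier to check against a few made-up log lines. This sketch wraps the stages in a hypothetical errors_per_day filter:

```shell
#!/usr/bin/env bash
# Hypothetical log lines with a bracketed date.
sample_log() {
  printf '%s\n' \
    '[2024-01-15] ERROR db timeout' \
    '[2024-01-15] ERROR db timeout' \
    '[2024-01-16] ERROR disk full' \
    '[2024-01-16] INFO started'
}

# Keep ERROR lines, reduce each to its date, then count per date.
errors_per_day() {
  grep "ERROR" | sed -n -E 's/.*\[([0-9-]+)\].*/\1/p' | sort | uniq -c | sort -rn
}

sample_log | errors_per_day   # 2 for 2024-01-15, then 1 for 2024-01-16 (uniq -c pads counts)
```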

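Escape characters that are special in regular expressions (. * [ ] ^ $ \) plus the delimiter itself (usually /). In the replacement, & (the whole match) and \ are special too: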

# Match literal dot
sed 's/example\.com/example.org/' file.txt

# Match literal brackets
sed 's/\[old\]/[new]/' file.txt
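What goes wrong without escaping is easy to see on contrived input: an unescaped dot matches any character, and an unescaped & in the replacement expands to the match. A sketch:

```shell
#!/usr/bin/env bash
line='exampleXcom example.com'   # made-up input

echo "$line" | sed 's/example.com/example.org/g'    # both words change: . matched X
echo "$line" | sed 's/example\.com/example.org/g'   # only the real domain changes

# In the replacement, & stands for the whole match; \& is a literal ampersand.
echo 'cat' | sed 's/cat/& dog/'   # cat dog
echo 'RD'  | sed 's/RD/R\&D/'     # R&D
```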

sed’s quantifiers are always greedy (there is no lazy *? as in Perl), so avoid over-matching by restricting which characters a match may contain:

# Greedy: removes too much
echo "<tag>keep</tag>" | sed 's/<.*>//'
# Output: (empty)

# Non-greedy workaround: match specific characters
echo "<tag>keep</tag>" | sed 's/<[^>]*>//g'
# Output: keep

sed remains indispensable in the Unix toolkit precisely because it does one thing exceptionally well: stream-oriented text transformation. Master its core commands, understand when to reach for it versus alternatives, and you’ll handle text processing tasks with efficiency and precision.