Bash String Manipulation: Substring, Replace, Length

Key Insights

Bash’s built-in string manipulation is significantly faster than spawning external processes like sed or awk for simple operations, making it ideal for performance-critical scripts
Understanding the difference between # (prefix removal) and % (suffix removal) operators, along with their greedy counterparts ## and %%, unlocks powerful pattern-based text processing
Combining substring extraction with parameter expansion enables you to parse structured data without external dependencies, reducing script complexity and improving portability

Why Bash String Manipulation Matters

Bash provides robust built-in string manipulation capabilities that many developers overlook in favor of external tools. While sed, awk, and grep are powerful, spawning external processes for simple string operations introduces unnecessary overhead. For scripts that process thousands of strings in loops, the performance difference becomes substantial.

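Built-in string operations also improve script portability. A script relying solely on Bash features avoids the differences between GNU and BSD versions of external tools, as well as missing utilities. Two caveats apply: several of these expansions (substring extraction and replacement, for example) are Bash features rather than POSIX sh, and a few, such as ${var,,} and associative arrays, need Bash 4.0 or later, while macOS still ships Bash 3.2 as /bin/bash.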

String Length Operations

The simplest string operation is determining length using ${#variable} syntax:

#!/bin/bash

text="Hello, World!"
length=${#text}
echo "Length: $length"  # Output: Length: 13

This becomes particularly useful for input validation:

#!/bin/bash

validate_password() {
    local password="$1"
    local min_length=8
    local max_length=64
    
    if (( ${#password} < min_length )); then
        echo "Error: Password must be at least $min_length characters"
        return 1
    elif (( ${#password} > max_length )); then
        echo "Error: Password exceeds maximum length of $max_length"
        return 1
    fi
    
    echo "Password length valid"
    return 0
}

validate_password "short"      # Error: too short
validate_password "validpassword123"  # Valid
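
The same # prefix also applies to arrays, where it is easy to confuse the length of one element with the number of elements. The sketch below uses made-up sample values to show the difference, along with the empty-string case that validation code usually has to handle:

```shell
#!/bin/bash

empty=""
words=("alpha" "be" "gamma")   # sample data for illustration

echo "Empty string length: ${#empty}"       # Output: Empty string length: 0
echo "Number of elements: ${#words[@]}"     # Output: Number of elements: 3
echo "Length of first element: ${#words[0]}"  # Output: Length of first element: 5
```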

Substring Extraction

Bash uses ${variable:offset:length} for substring extraction. The offset is zero-indexed:

#!/bin/bash

text="Application Architect"

# Extract from position 12, take 9 characters
echo "${text:12:9}"  # Output: Architect

# Extract from position 0, take 11 characters
echo "${text:0:11}"  # Output: Application

# Omit length to extract until end
echo "${text:12}"    # Output: Architect

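Negative offsets count from the end of the string. They work in Bash versions older than 4.2; only a negative length requires Bash 4.2 or later. Put a space before the minus sign, or wrap the offset in parentheses, so that Bash does not parse :- as the default-value operator: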

#!/bin/bash

filename="report_2024_03_15.csv"

# Extract last 4 characters (file extension)
echo "${filename: -4}"   # Output: .csv

# Start 14 characters from the end, take 10 (the date portion)
echo "${filename: -14:10}"  # Output: 2024_03_15
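
To make the space requirement concrete, here is a short sketch that reuses the same filename. It shows what happens when the space is omitted, the parenthesized alternative, and a negative length (which assumes Bash 4.2 or later):

```shell
#!/bin/bash

filename="report_2024_03_15.csv"

# Without the space, ":-" means "use this default if unset or empty",
# so the variable's own value comes back unchanged.
echo "${filename:-4}"     # Output: report_2024_03_15.csv

# Parentheses are an alternative to the space.
echo "${filename:(-4)}"   # Output: .csv

# Negative length (Bash 4.2+): start at 7, stop 4 characters before the end.
echo "${filename:7:-4}"   # Output: 2024_03_15
```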

Here’s a practical example parsing date components from a log filename:

#!/bin/bash

parse_log_date() {
    local logfile="$1"
    # Format: app_YYYY_MM_DD.log
    
    local year="${logfile:4:4}"
    local month="${logfile:9:2}"
    local day="${logfile:12:2}"
    
    echo "Year: $year, Month: $month, Day: $day"
}

parse_log_date "app_2024_03_15.log"
# Output: Year: 2024, Month: 03, Day: 15
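
Fixed offsets silently return the wrong characters if the filename does not follow the expected layout. One possible safeguard, sketched here with a hypothetical parse_log_date_checked helper, is to check the name against the app_YYYY_MM_DD.log format before slicing:

```shell
#!/bin/bash

# Hypothetical helper: verify the layout first, then slice by position.
parse_log_date_checked() {
    local logfile="${1##*/}"   # tolerate a leading directory path
    local pattern='^app_[0-9]{4}_[0-9]{2}_[0-9]{2}\.log$'

    if [[ ! $logfile =~ $pattern ]]; then
        echo "Error: unexpected log name: $logfile" >&2
        return 1
    fi

    echo "Year: ${logfile:4:4}, Month: ${logfile:9:2}, Day: ${logfile:12:2}"
}

parse_log_date_checked "/var/log/app_2024_03_15.log"
# Output: Year: 2024, Month: 03, Day: 15

parse_log_date_checked "application.log" || echo "rejected"
# Output: rejected   (the error message itself goes to stderr)
```

The regular expression is kept in a variable because that form behaves consistently across Bash versions.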

Pattern-Based Substring Removal

Bash provides four operators for pattern-based removal:

${variable#pattern} - Remove shortest match from beginning
${variable##pattern} - Remove longest match from beginning
${variable%pattern} - Remove shortest match from end
${variable%%pattern} - Remove longest match from end

Think of # as the beginning (like a comment at the start of a line) and % as the end.

#!/bin/bash

filepath="/home/user/documents/report.tar.gz"

# Remove shortest match from beginning (remove up to first /)
echo "${filepath#*/}"     # Output: home/user/documents/report.tar.gz

# Remove longest match from beginning (remove up to last /)
echo "${filepath##*/}"    # Output: report.tar.gz

# Remove shortest match from end (remove .gz)
echo "${filepath%.*}"     # Output: /home/user/documents/report.tar

# Remove longest match from end (remove .tar.gz)
echo "${filepath%%.*}"    # Output: /home/user/documents/report
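
One caveat with %%.* on a full path: it cuts at the first dot anywhere in the string, including dots in directory names. The following sketch uses a made-up path with a dotted directory to show the problem and one way around it, which is to split the directory from the filename first:

```shell
#!/bin/bash

filepath="/home/user.name/documents/report.tar.gz"

# Applied to the whole path, %%.* stops at the dot in "user.name".
echo "${filepath%%.*}"    # Output: /home/user

# Splitting first keeps the cut inside the filename.
dir="${filepath%/*}"      # /home/user.name/documents
file="${filepath##*/}"    # report.tar.gz
echo "$dir/${file%%.*}"   # Output: /home/user.name/documents/report
```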

Here’s a practical function for extracting filename components:

#!/bin/bash

get_filename_parts() {
    local fullpath="$1"
    
    local filename="${fullpath##*/}"        # backup.tar.gz
    local basename="${filename%%.*}"        # backup
    local extension="${filename#*.}"        # tar.gz
    local last_ext="${filename##*.}"        # gz
    
    echo "Full path: $fullpath"
    echo "Filename: $filename"
    echo "Basename: $basename"
    echo "Full extension: $extension"
    echo "Last extension: $last_ext"
}

get_filename_parts "/var/log/app/backup.tar.gz"
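
When a filename contains no dot, the patterns *. and .* never match, so the extension expansions return the whole filename unchanged rather than an empty string. A guarded variant might look like the sketch below; last_extension is a hypothetical helper, and treating a leading dot as "hidden file, no extension" is an assumption about the desired behavior:

```shell
#!/bin/bash

# Hypothetical helper: print the last extension, or nothing if there is none.
last_extension() {
    local filename="${1##*/}"
    # Drop one leading dot so ".bashrc" is not reported as an extension.
    local rest="${filename#.}"
    if [[ $rest == *.* ]]; then
        echo "${rest##*.}"
    fi
}

last_extension "/var/log/app/backup.tar.gz"   # Output: gz
last_extension "/usr/local/bin/README"        # prints nothing
last_extension "/home/user/.bashrc"           # prints nothing
```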

String Replacement

Bash supports pattern replacement with two variants:

${variable/pattern/replacement} - Replace first occurrence
${variable//pattern/replacement} - Replace all occurrences

#!/bin/bash

text="foo bar foo baz"

# Replace first occurrence
echo "${text/foo/qux}"   # Output: qux bar foo baz

# Replace all occurrences
echo "${text//foo/qux}"  # Output: qux bar qux baz

# Remove characters (empty replacement)
echo "${text//foo/}"     # Output:  bar  baz
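
Besides the first-match and all-matches forms, Bash also lets you anchor a replacement to the start or end of the string by prefixing the pattern with # or %, mirroring the removal operators. A minimal sketch with a sample string:

```shell
#!/bin/bash

text="foo bar foo"

# A leading # anchors the pattern to the start of the string
echo "${text/#foo/qux}"   # Output: qux bar foo

# A leading % anchors the pattern to the end of the string
echo "${text/%foo/qux}"   # Output: foo bar qux
```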

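The pattern is a glob, not a regular expression, so wildcards such as * and ? and bracket expressions such as [0-9] are supported. Literal characters work too: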

#!/bin/bash

# Remove all spaces (tabs and newlines are left alone)
data="  spaced   out  text  "
echo "${data// /}"  # Output: spacedouttext

# Replace multiple spaces with single space
text="too    many     spaces"
while [[ "$text" == *"  "* ]]; do
    text="${text//  / }"
done
echo "$text"  # Output: too many spaces
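
If enabling a shell option is acceptable, the loop can be avoided. With extglob turned on, the extended pattern +( ) matches one or more spaces, so a single expansion collapses every run. This is a sketch of that alternative, not a replacement for the portable loop above:

```shell
#!/bin/bash

shopt -s extglob   # enables +(...) and the other extended patterns

text="too    many     spaces"
collapsed="${text//+( )/ }"
echo "$collapsed"  # Output: too many spaces
```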

Here’s a practical example sanitizing user input:

#!/bin/bash

sanitize_input() {
    local input="$1"
    
    # Strip a few shell metacharacters (; | &); not a complete safeguard
    input="${input//[;|&]/}"
    
    # Replace spaces with underscores
    input="${input// /_}"
    
    # Convert to lowercase (${var,,} requires Bash 4.0+)
    input="${input,,}"
    
    # Collapse runs of underscores
    while [[ "$input" == *__* ]]; do
        input="${input//__/_}"
    done
    
    echo "$input"
}

sanitize_input "My File; Name & Stuff"
# Output: my_file_name_stuff
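
Removing a short list of known-bad characters leaves everything else untouched, including $, backticks, and parentheses. An allowlist is often the safer direction: delete every character that is not explicitly permitted. The sketch below uses a hypothetical sanitize_allowlist helper, and the permitted set (letters, digits, dot, underscore, hyphen) is an assumption to adjust for your own use case:

```shell
#!/bin/bash

# Hypothetical helper: keep only characters on an explicit allowlist.
sanitize_allowlist() {
    local input="$1"
    # [!...] matches any character NOT in the set, so this deletes
    # everything except letters, digits, dot, underscore, and hyphen.
    input="${input//[!A-Za-z0-9._-]/}"
    echo "$input"
}

sanitize_allowlist 'report $(rm -rf /) 2024.txt'
# Output: reportrm-rf2024.txt
```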

Practical Applications

Let’s combine these techniques in a simple CSV line parser. It splits on every comma, so it does not handle commas embedded inside quoted fields:

#!/bin/bash

parse_csv_line() {
    local line="$1"
    local -a fields
    local i field

    # Split on commas; unlike fields=($line), read -a does not glob-expand
    IFS=',' read -r -a fields <<< "$line"

    for i in "${!fields[@]}"; do
        field="${fields[$i]}"

        # Trim leading/trailing whitespace
        field="${field#"${field%%[![:space:]]*}"}"  # Trim leading
        field="${field%"${field##*[![:space:]]}"}"  # Trim trailing

        # Remove surrounding quotes if present (needs at least two characters)
        if [[ ${#field} -ge 2 && "${field:0:1}" == '"' && "${field: -1}" == '"' ]]; then
            field="${field:1:${#field}-2}"
        fi

        fields[$i]="$field"
    done

    printf '%s\n' "${fields[@]}"
}

# Test
parse_csv_line '"John Doe", 42 , "New York" '
# Output:
# John Doe
# 42
# New York
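
The nested trim expansions are compact but hard to read. Unpacked into named steps, the idiom looks like this; trim is a hypothetical helper name, and the intermediate variables exist only to show what each inner expansion produces:

```shell
#!/bin/bash

# Hypothetical helper that wraps the two-step trim idiom.
trim() {
    local s="$1"
    local leading="${s%%[![:space:]]*}"   # the run of whitespace at the start
    s="${s#"$leading"}"                   # strip that run from the front
    local trailing="${s##*[![:space:]]}"  # the run of whitespace at the end
    s="${s%"$trailing"}"                  # strip that run from the back
    printf '%s\n' "$s"
}

trim "   padded value   "   # Output: padded value
```

The inner expansion removes everything from the first (or last) non-whitespace character onward, leaving only the whitespace run, which the outer expansion then strips as a literal string.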

Here’s a configuration file parser:

#!/bin/bash

declare -A config   # Associative arrays require Bash 4.0+

parse_config() {
    local file="$1"
    local line key value

    # The || clause keeps a final line that lacks a trailing newline
    while IFS= read -r line || [[ -n "$line" ]]; do
        # Skip comments and blank lines (including tab-only lines)
        [[ "$line" =~ ^[[:space:]]*# ]] && continue
        [[ "$line" =~ ^[[:space:]]*$ ]] && continue

        # Split on the first = sign
        if [[ "$line" =~ ^([^=]+)=(.+)$ ]]; then
            key="${BASH_REMATCH[1]}"
            value="${BASH_REMATCH[2]}"

            # Trim whitespace
            key="${key#"${key%%[![:space:]]*}"}"
            key="${key%"${key##*[![:space:]]}"}"
            value="${value#"${value%%[![:space:]]*}"}"
            value="${value%"${value##*[![:space:]]}"}"

            # Remove all double quotes from the value
            value="${value//\"/}"

            # An empty key is not a valid array subscript, so skip it
            [[ -n "$key" ]] && config["$key"]="$value"
        fi
    done < "$file"
}

# Usage
echo 'database_host = "localhost"' > test.conf
echo 'port = 5432' >> test.conf
echo '# This is a comment' >> test.conf
echo 'username = "admin"' >> test.conf

parse_config "test.conf"

echo "Host: ${config[database_host]}"  # Output: Host: localhost
echo "Port: ${config[port]}"           # Output: Port: 5432
echo "User: ${config[username]}"       # Output: User: admin
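
The key/value split does not strictly need a regular expression. As a sketch of an alternative, the prefix and suffix removal operators can split on the first = sign, which also keeps any later = characters in the value intact. The sample line and URL are made up for illustration:

```shell
#!/bin/bash

line='url = "https://example.com/?a=1&b=2"'

key="${line%%=*}"     # remove the longest suffix starting at an =
value="${line#*=}"    # remove the shortest prefix ending at an =

echo "[$key]"    # Output: [url ]
echo "[$value]"  # Output: [ "https://example.com/?a=1&b=2"]
```

The brackets in the output show that the surrounding whitespace is still present, so the same trimming step is needed afterwards.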

Performance Considerations

Built-in string operations are substantially faster than external tools for simple operations:

#!/bin/bash

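filepath="/home/user/documents/report.tar.gz"

# Slow: spawns external process
filename=$(basename "$filepath")

# Fast: built-in operation
filename="${filepath##*/}"

# Note: the two differ for paths ending in a slash
# (basename ignores the trailing slash; ##*/ yields an empty string).

# In loops over thousands of items, avoiding a process spawn per
# iteration typically makes the built-in form many times faster.
# The exact ratio depends on the system, so measure before relying on it.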
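
To check the difference on your own machine, a rough benchmark along these lines may help. The function names and the iteration count are arbitrary choices for this sketch, and the timings it prints will vary by system:

```shell
#!/bin/bash

filepath="/home/user/documents/report.tar.gz"
iterations=200   # arbitrary; raise it for a clearer difference

time_external() {
    local i name
    for ((i = 0; i < iterations; i++)); do
        name=$(basename "$filepath")   # one subprocess per iteration
    done
    echo "$name"
}

time_builtin() {
    local i name
    for ((i = 0; i < iterations; i++)); do
        name="${filepath##*/}"         # no subprocess
    done
    echo "$name"
}

# The time keyword reports elapsed time on stderr for each function.
time time_external
time time_builtin
```

Both functions print the same filename, so any difference in the reported times comes from the per-iteration process spawn.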

However, for complex pattern matching or transformations, sed and awk remain more appropriate. Use built-in operations when:

Processing simple patterns
Working in tight loops
Portability is critical
The operation is straightforward (length, simple extraction, basic replacement)

Use external tools when:

Complex regular expressions are needed
Multi-line processing is required
Advanced text transformations are necessary
Readability trumps performance

Conclusion

Bash string manipulation provides powerful, efficient tools for common text processing tasks. Master these core patterns: ${#var} for length, ${var:offset:length} for extraction, # and % for pattern removal, and / for replacement. These built-in operations eliminate dependencies on external tools, improve script performance, and enhance portability. For production scripts that process significant amounts of text data, preferring built-in string operations over external processes can yield measurable performance improvements while keeping your code clean and maintainable.