Bash String Manipulation: Substring, Replace, Length
Key Insights
- Bash’s built-in string manipulation is significantly faster than spawning external processes like `sed` or `awk` for simple operations, making it ideal for performance-critical scripts
- Understanding the difference between `#` (prefix removal) and `%` (suffix removal) operators, along with their greedy counterparts `##` and `%%`, unlocks powerful pattern-based text processing
- Combining substring extraction with parameter expansion enables you to parse structured data without external dependencies, reducing script complexity and improving portability
Why Bash String Manipulation Matters
Bash provides robust built-in string manipulation capabilities that many developers overlook in favor of external tools. While sed, awk, and grep are powerful, spawning external processes for simple string operations introduces unnecessary overhead. For scripts that process thousands of strings in loops, the performance difference becomes substantial.
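Built-in string operations also improve script portability. A script that relies only on Bash features runs on any system with a sufficiently recent Bash, regardless of whether the installed tools are GNU or BSD variants or whether a utility is missing. The Bash version still matters: some features used below, such as ${var,,} and associative arrays, need Bash 4.0 or later, while macOS still ships Bash 3.2 as /bin/bash.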
String Length Operations
The simplest string operation is determining length using ${#variable} syntax:
#!/bin/bash
text="Hello, World!"
length=${#text}
echo "Length: $length" # Output: Length: 13
This becomes particularly useful for input validation:
#!/bin/bash
validate_password() {
local password="$1"
local min_length=8
local max_length=64
if [ ${#password} -lt $min_length ]; then
echo "Error: Password must be at least $min_length characters"
return 1
elif [ ${#password} -gt $max_length ]; then
echo "Error: Password exceeds maximum length of $max_length"
return 1
fi
echo "Password length valid"
return 0
}
validate_password "short" # Error: too short
validate_password "validpassword123" # Valid
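The same range check can also be written with arithmetic evaluation, which keeps the comparison operators readable and avoids the quoting pitfalls of `[`. The sketch below is only illustrative; `length_in_range` is a hypothetical helper name, not something from the original script.

```bash
#!/bin/bash
# Hypothetical helper (not from the article): succeeds when the length of
# $1 lies between $2 and $3 inclusive. It prints nothing.
length_in_range() {
    local value="$1" min="$2" max="$3"
    # (( ... )) returns success when the expression evaluates to true
    (( ${#value} >= min && ${#value} <= max ))
}

if length_in_range "validpassword123" 8 64; then
    echo "ok"
else
    echo "rejected"
fi
```

Because the function reports its result only through the exit status, the caller decides what message, if any, to print.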
Substring Extraction
Bash uses ${variable:offset:length} for substring extraction. The offset is zero-indexed:
#!/bin/bash
text="Application Architect"
# Extract from position 12, take 9 characters
echo "${text:12:9}" # Output: Architect
# Extract from position 0, take 11 characters
echo "${text:0:11}" # Output: Application
# Omit length to extract until end
echo "${text:12}" # Output: Architect
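Length and substring extraction combine naturally. As a small sketch, the hypothetical helper `truncate_text` below shortens a string to a maximum width and marks the cut with three dots. It assumes the width is at least 3.

```bash
#!/bin/bash
# Hypothetical helper: shorten $1 to at most $2 characters, ending in "..."
# when text was cut. Assumes $2 is at least 3.
truncate_text() {
    local text="$1" max="$2"
    if (( ${#text} <= max )); then
        printf '%s\n' "$text"
    else
        # The length field is an arithmetic expression, so max-3 is evaluated
        printf '%s...\n' "${text:0:max-3}"
    fi
}

truncate_text "Application Architect" 14   # Output: Application...
truncate_text "Architect" 14               # Output: Architect
```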
Negative offsets count from the end of the string. Put a space before the minus sign, or wrap the offset in parentheses, so that Bash does not read :- as the default-value syntax. Negative lengths, by contrast, require Bash 4.2 or later:
#!/bin/bash
filename="report_2024_03_15.csv"
# Extract last 4 characters (file extension)
echo "${filename: -4}" # Output: .csv
# Start 14 characters from the end, take 10 (the date)
echo "${filename: -14:10}" # Output: 2024_03_15
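The difference between the substring form and the default-value form is easy to miss, so here is a small side-by-side sketch. The variable `missing` is a hypothetical name that is deliberately left unset.

```bash
#!/bin/bash
filename="report_2024_03_15.csv"
unset missing   # hypothetical variable, made sure to be unset

with_space="${filename: -4}"     # substring: last 4 characters
with_parens="${filename:(-4)}"   # same substring, parentheses instead of a space
no_space="${filename:-4}"        # default-value syntax: filename is set, so its value is used
unset_default="${missing:-4}"    # missing is unset, so the default "4" applies

echo "$with_space"      # Output: .csv
echo "$with_parens"     # Output: .csv
echo "$no_space"        # Output: report_2024_03_15.csv
echo "$unset_default"   # Output: 4
```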
Here’s a practical example parsing date components from a log filename:
#!/bin/bash
parse_log_date() {
local logfile="$1"
# Format: app_YYYY_MM_DD.log
local year="${logfile:4:4}"
local month="${logfile:9:2}"
local day="${logfile:12:2}"
echo "Year: $year, Month: $month, Day: $day"
}
parse_log_date "app_2024_03_15.log"
# Output: Year: 2024, Month: 03, Day: 15
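Fixed offsets from the start only work while the prefix stays exactly `app_`. One possible variant, sketched below, counts from the end instead. It assumes every name ends in `YYYY_MM_DD.log`, and `parse_log_date_from_end` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical variant: assumes the name ends in YYYY_MM_DD.log,
# whatever comes before it.
parse_log_date_from_end() {
    local logfile="$1"
    # The last 14 characters are "YYYY_MM_DD.log"; keep the first 10 of them
    local date="${logfile: -14:10}"
    local year="${date:0:4}"
    local month="${date:5:2}"
    local day="${date:8:2}"
    echo "Year: $year, Month: $month, Day: $day"
}

parse_log_date_from_end "app_2024_03_15.log"
# Output: Year: 2024, Month: 03, Day: 15
parse_log_date_from_end "webserver_2024_03_15.log"
# Output: Year: 2024, Month: 03, Day: 15
```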
Pattern-Based Substring Removal
Bash provides four operators for pattern-based removal:
- ${variable#pattern} - Remove shortest match from beginning
- ${variable##pattern} - Remove longest match from beginning
- ${variable%pattern} - Remove shortest match from end
- ${variable%%pattern} - Remove longest match from end
Think of # as the beginning (like a comment at the start of a line) and % as the end.
#!/bin/bash
filepath="/home/user/documents/report.tar.gz"
# Remove shortest match from beginning (remove up to first /)
echo "${filepath#*/}" # Output: home/user/documents/report.tar.gz
# Remove longest match from beginning (remove up to last /)
echo "${filepath##*/}" # Output: report.tar.gz
# Remove shortest match from end (remove .gz)
echo "${filepath%.*}" # Output: /home/user/documents/report.tar
# Remove longest match from end (remove .tar.gz)
echo "${filepath%%.*}" # Output: /home/user/documents/report
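The same operators cover the usual directory and filename split, and they also accept literal patterns when the exact prefix or suffix is known. This is a short sketch using the path from the example above.

```bash
#!/bin/bash
filepath="/home/user/documents/report.tar.gz"

dir="${filepath%/*}"            # shortest "/*" from the end: drops the filename
file="${filepath##*/}"          # longest "*/" from the start: drops the directory
no_prefix="${filepath#/home/}"  # a literal pattern removes an exact prefix
no_suffix="${file%.tar.gz}"     # a literal pattern removes an exact suffix

echo "$dir"        # Output: /home/user/documents
echo "$file"       # Output: report.tar.gz
echo "$no_prefix"  # Output: user/documents/report.tar.gz
echo "$no_suffix"  # Output: report
```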
Here’s a practical function for extracting filename components:
#!/bin/bash
get_filename_parts() {
    local fullpath="$1"
    local filename="${fullpath##*/}" # backup.tar.gz
    local basename="${filename%%.*}" # backup
    local extension="${filename#*.}" # tar.gz
    local last_ext="${filename##*.}" # gz
    echo "Full path: $fullpath"
    echo "Filename: $filename"
    echo "Basename: $basename"
    echo "Full extension: $extension"
    echo "Last extension: $last_ext"
}
get_filename_parts "/var/log/app/backup.tar.gz"
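One caveat with these patterns: when a filename contains no dot, nothing matches and the expansion returns the whole name unchanged. Here is a minimal sketch of guarding against that, using a hypothetical helper `last_extension`.

```bash
#!/bin/bash
# Hypothetical helper: prints the last extension of a path, or an empty
# line when the filename contains no dot. Dotfiles such as ".bashrc" are
# not special-cased here and would report "bashrc".
last_extension() {
    # Strip the directory first so dots in directory names are ignored
    local filename="${1##*/}"
    if [[ "$filename" == *.* ]]; then
        printf '%s\n' "${filename##*.}"
    else
        printf '\n'
    fi
}

last_extension "/var/log/app/backup.tar.gz"   # Output: gz
last_extension "/etc/conf.d/README"           # Output: (empty line)
```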
String Replacement
Bash supports pattern replacement with two variants:
- ${variable/pattern/replacement} - Replace first occurrence
- ${variable//pattern/replacement} - Replace all occurrences
#!/bin/bash
text="foo bar foo baz"
# Replace first occurrence
echo "${text/foo/qux}" # Output: qux bar foo baz
# Replace all occurrences
echo "${text//foo/qux}" # Output: qux bar qux baz
# Remove characters (empty replacement)
echo "${text//foo/}" # Output: bar baz
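Bash also has two anchored forms of the single replacement: a pattern starting with `#` must match at the beginning of the value, and one starting with `%` must match at the end. Here is a brief sketch with illustrative variable names.

```bash
#!/bin/bash
text="foo bar foo"

at_start="${text/#foo/qux}"   # only a match at the very start is replaced
at_end="${text/%foo/qux}"     # only a match at the very end is replaced
echo "$at_start"              # Output: qux bar foo
echo "$at_end"                # Output: foo bar qux

name="notes.txt"
renamed="${name/%.txt/.md}"   # swap a known suffix
echo "$renamed"               # Output: notes.md
```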
Replacement patterns are glob patterns rather than regular expressions. A literal space is the simplest case:
#!/bin/bash
# Remove all spaces
data="  spaced  out  text  "
echo "${data// /}" # Output: spacedouttext
# Collapse runs of spaces into a single space
text="too    many    spaces"
while [[ "$text" == *"  "* ]]; do
    # Replace each pair of spaces with one space until no pair remains
    text="${text//  / }"
done
echo "$text" # Output: too many spaces
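To illustrate the glob syntax itself, the sketch below uses a bracket expression, a POSIX character class, and `*`. The sample values are made up for the example.

```bash
#!/bin/bash
phone="555-123-4567"
masked="${phone//[0-9]/X}"          # bracket expression: any digit
echo "$masked"                      # Output: XXX-XXX-XXXX

messy=$'a \t b\n c'
compact="${messy//[[:space:]]/}"    # character class: spaces, tabs, newlines
echo "$compact"                     # Output: abc

text="foo bar baz"
greedy="${text/b*/X}"               # * takes the longest match, here to the end
echo "$greedy"                      # Output: foo X
```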
Here’s a practical example sanitizing user input:
#!/bin/bash
sanitize_input() {
local input="$1"
# Remove potentially dangerous characters
input="${input//[;|&]/}"
# Replace spaces with underscores
input="${input// /_}"
# Convert to lowercase (the ,, operator requires Bash 4.0+)
input="${input,,}"
# Collapse repeated underscores into one
while [[ "$input" =~ "__" ]]; do
input="${input//__/_}"
done
echo "$input"
}
sanitize_input "My File; Name & Stuff"
# Output: my_file_name_stuff
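Removing a list of known-bad characters leaves everything else untouched. An alternative sketch is to keep only an allowlist of characters instead. The function name `sanitize_allowlist` and the chosen character set (alphanumerics, underscore, dot, hyphen) are assumptions for illustration, and `[:alnum:]` may include non-ASCII letters depending on the locale.

```bash
#!/bin/bash
# Hypothetical allowlist variant of the sanitizer above.
sanitize_allowlist() {
    local input="$1"
    # Replace spaces with underscores
    input="${input// /_}"
    # Delete every character that is NOT alphanumeric, "_", "." or "-"
    input="${input//[![:alnum:]_.-]/}"
    # Collapse repeated underscores into one
    while [[ "$input" == *__* ]]; do
        input="${input//__/_}"
    done
    printf '%s\n' "$input"
}

sanitize_allowlist "My File; Name & Stuff"
# Output: My_File_Name_Stuff
```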
Practical Applications
Let’s combine these techniques in a simple CSV line parser. It splits on every comma, so it does not handle commas inside quoted fields:
#!/bin/bash
parse_csv_line() {
    local line="$1"
    local -a fields
    local i field
    # Split on commas; read -ra avoids the filename expansion that an
    # unquoted fields=($line) would apply to characters such as *
    IFS=',' read -ra fields <<< "$line"
    for i in "${!fields[@]}"; do
        field="${fields[$i]}"
        # Trim leading/trailing whitespace
        field="${field#"${field%%[![:space:]]*}"}" # Trim leading
        field="${field%"${field##*[![:space:]]}"}" # Trim trailing
        # Remove surrounding quotes if present
        if [[ ${#field} -ge 2 && "${field:0:1}" == '"' && "${field: -1}" == '"' ]]; then
            field="${field:1:${#field}-2}"
        fi
        fields[$i]="$field"
    done
    printf '%s\n' "${fields[@]}"
}
# Test
parse_csv_line '"John Doe", 42 , "New York" '
# Output:
# John Doe
# 42
# New York
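The two trimming lines are dense, so it can help to see them isolated in a function with each step explained. `trim` here is a hypothetical helper wrapping the same idiom used above.

```bash
#!/bin/bash
# Hypothetical helper wrapping the trimming idiom used above.
trim() {
    local s="$1"
    # ${s%%[![:space:]]*} deletes everything from the first non-space
    # character onward, leaving only the leading whitespace. Removing
    # that as a prefix with # trims the left side.
    s="${s#"${s%%[![:space:]]*}"}"
    # ${s##*[![:space:]]} deletes everything up to the last non-space
    # character, leaving only the trailing whitespace. Removing that as
    # a suffix with % trims the right side.
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s\n' "$s"
}

trim "   hello world   "   # Output: hello world
```

The inner expansions are quoted so that the whitespace they produce is treated as a literal string rather than as a pattern.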
Here’s a configuration file parser:
#!/bin/bash
declare -A config # Associative arrays require Bash 4.0+
parse_config() {
local file="$1"
while IFS= read -r line; do
# Skip comments and blank lines (including lines of only spaces or tabs)
[[ "$line" =~ ^[[:space:]]*# ]] && continue
[[ -z "${line//[[:space:]]/}" ]] && continue
# Split on = sign
if [[ "$line" =~ ^([^=]+)=(.+)$ ]]; then
local key="${BASH_REMATCH[1]}"
local value="${BASH_REMATCH[2]}"
# Trim whitespace
key="${key#"${key%%[![:space:]]*}"}"
key="${key%"${key##*[![:space:]]}"}"
value="${value#"${value%%[![:space:]]*}"}"
value="${value%"${value##*[![:space:]]}"}"
# Remove every double quote from the value, including any embedded ones
value="${value//\"/}"
config["$key"]="$value"
fi
done < "$file"
}
# Usage
echo 'database_host = "localhost"' > test.conf
echo 'port = 5432' >> test.conf
echo '# This is a comment' >> test.conf
echo 'username = "admin"' >> test.conf
parse_config "test.conf"
echo "Host: ${config[database_host]}"
echo "Port: ${config[port]}"
echo "User: ${config[username]}"
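The parser above deletes every double quote in a value. If embedded quotes should survive, one option is to strip only a matching outer pair, as in this sketch. `strip_outer_quotes` is a hypothetical helper, not part of the parser above.

```bash
#!/bin/bash
# Hypothetical helper: removes one pair of surrounding double quotes and
# leaves any quotes inside the value alone.
strip_outer_quotes() {
    local value="$1"
    if (( ${#value} >= 2 )) && [[ "${value:0:1}" == '"' && "${value: -1}" == '"' ]]; then
        # Skip the first character and drop the last one
        value="${value:1:${#value}-2}"
    fi
    printf '%s\n' "$value"
}

strip_outer_quotes '"localhost"'     # Output: localhost
strip_outer_quotes '"say "hi""'      # Output: say "hi"
strip_outer_quotes '5432'            # Output: 5432
```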
Performance Considerations
Built-in string operations are substantially faster than external tools for simple operations:
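#!/bin/bash
filepath="/home/user/documents/report.tar.gz"
# Slow: spawns an external process
filename=$(basename "$filepath")
# Fast: built-in operation (unlike basename, this yields an empty
# string when the path ends with a slash)
filename="${filepath##*/}"
# In loops over thousands of items, the built-in form avoids a fork and
# exec per iteration and is often an order of magnitude or more faster.
# The exact ratio depends on the system, so measure before relying on it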
However, for complex pattern matching or transformations, sed and awk remain more appropriate. Use built-in operations when:
- Processing simple patterns
- Working in tight loops
- Portability is critical
- The operation is straightforward (length, simple extraction, basic replacement)
Use external tools when:
- Complex regular expressions are needed
- Multi-line processing is required
- Advanced text transformations are necessary
- Readability trumps performance
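To see how large the gap is on a particular machine, a rough timing sketch like the following may help. The iteration count and the function names `run_builtin` and `run_external` are arbitrary choices, and the numbers reported will vary between systems.

```bash
#!/bin/bash
# Rough benchmark sketch; the loop count and path are arbitrary choices.
filepath="/home/user/documents/report.tar.gz"
iterations=500

run_builtin() {
    local i
    for ((i = 0; i < iterations; i++)); do
        builtin_result="${filepath##*/}"
    done
}

run_external() {
    local i
    for ((i = 0; i < iterations; i++)); do
        external_result=$(basename "$filepath")
    done
}

# "time" is a Bash keyword; it reports real/user/sys times on stderr
time run_builtin
time run_external

# Both approaches produce the same filename for this path
echo "$builtin_result"
echo "$external_result"
```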
Conclusion
Bash string manipulation provides powerful, efficient tools for common text processing tasks. Master these core patterns: ${#var} for length, ${var:offset:length} for extraction, # and % for pattern removal, and / for replacement. These built-in operations eliminate dependencies on external tools, improve script performance, and enhance portability. For production scripts that process significant amounts of text data, preferring built-in string operations over external processes can yield measurable performance improvements while keeping your code clean and maintainable.