Linux tar and gzip: Archive and Compression
Key Insights
• tar bundles files into a single archive without compression, while gzip compresses data—combining them gives you both space savings and organizational benefits
• The -z flag seamlessly integrates gzip compression into tar operations, making tar -czf and tar -xzf your go-to commands for most archiving tasks
• Understanding when to compress (backups, transfers) versus when to skip compression (already-compressed media files) prevents wasted CPU cycles and actually saves time
Introduction to tar and gzip
Archiving and compression are distinct operations that serve different purposes. Archiving with tar (tape archive) combines multiple files and directories into a single file, preserving directory structure and file metadata. Compression with gzip reduces file size by encoding data more efficiently. Neither operation requires the other, but they complement each other perfectly.
The reason they’re so frequently used together is practical: tar gives you a single file to work with instead of hundreds or thousands, while gzip shrinks that file for faster transfers and less storage consumption. A directory containing 1000 small files might be 50MB uncompressed, 45MB as a tar archive (slight overhead from metadata), and 15MB as a compressed tar.gz file.
Let’s see this in action:
# Create a sample directory with files
mkdir sample_project
echo "console.log('Hello');" > sample_project/app.js
echo "body { margin: 0; }" > sample_project/style.css
echo "# Project README" > sample_project/README.md
# Check original size
du -sh sample_project/
# Output: 12K sample_project/
# Create tar archive
tar -cf project.tar sample_project/
ls -lh project.tar
# Output: -rw-r--r-- 1 user user 10K Nov 15 10:30 project.tar
# Create compressed tar archive
tar -czf project.tar.gz sample_project/
ls -lh project.tar.gz
# Output: -rw-r--r-- 1 user user 1.2K Nov 15 10:31 project.tar.gz
The compressed version is typically 70-90% smaller for text-heavy content like source code and logs.
Basic tar Operations
The tar command uses single-letter flags that you’ll memorize quickly through repetition. The most essential are:
- -c: Create a new archive
- -x: Extract files from an archive
- -t: List contents without extracting
- -v: Verbose output (show files being processed)
- -f: Specify the archive filename (always required)
The -f flag must be followed immediately by the filename, so it typically comes last in the flag sequence.
# Create an archive
tar -cvf backup.tar /var/www/html/
# -c: create
# -v: verbose (shows each file)
# -f: filename follows
# Extract an archive
tar -xvf backup.tar
# -x: extract
# -v: verbose
# -f: filename follows
# List contents without extracting
tar -tvf backup.tar
# -t: list (table of contents)
# -v: verbose (shows permissions, dates)
# -f: filename follows
The verbose flag -v is optional but recommended while learning. It shows exactly what tar is doing, which helps you verify operations and catch mistakes before they become problems.
One critical detail: when extracting, tar preserves the full path structure from when the archive was created. If you ran tar -cf backup.tar /var/www/html/, extracting will create a var/www/html/ directory structure in your current location. Plan accordingly.
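Two GNU tar options give you control over where extracted files land: -C switches into a target directory before extracting, and --strip-components drops leading path elements. A minimal sketch (the /tmp paths are illustrative):

```shell
# Build an archive with nested paths, then extract it elsewhere
# without recreating the leading var/www/ directories.
rm -rf /tmp/demo /tmp/restore
mkdir -p /tmp/demo/var/www/html
echo "hello" > /tmp/demo/var/www/html/index.html
tar -cf /tmp/demo.tar -C /tmp/demo var/www/html

mkdir -p /tmp/restore
# -C: extract into /tmp/restore; --strip-components=2 drops var/www/
tar -xf /tmp/demo.tar -C /tmp/restore --strip-components=2
ls /tmp/restore
# html/ now sits directly under /tmp/restore
```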
Combining tar with gzip
The -z flag tells tar to pipe its output through gzip for compression (when creating) or to decompress through gzip (when extracting). This integration is so seamless that most users never run gzip separately for archives.
# Create compressed archive
tar -czf website-backup.tar.gz /var/www/html/
# Extract compressed archive
tar -xzf website-backup.tar.gz
# List contents of compressed archive
tar -tzf website-backup.tar.gz
Both .tar.gz and .tgz extensions are valid and equivalent—use whichever your team prefers. I recommend .tar.gz because it’s explicit about the two-stage process, making it clearer for beginners.
When should you use compression? Always, with three exceptions:
- Already-compressed files: Archiving JPEGs, MP4s, or ZIP files won’t shrink them further
- CPU-constrained environments: Compression trades CPU time for disk space
- Temporary archives: If you’re creating and extracting within seconds, skip compression
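You can see the first exception for yourself. The sketch below (Linux-specific; /dev/urandom stands in for already-compressed media) shows gzip making no headway on incompressible data:

```shell
# Incompressible vs. compressible input, same size, same command.
rm -f /tmp/random.bin* /tmp/zeros.bin*
head -c 100000 /dev/urandom > /tmp/random.bin   # stand-in for a JPEG/MP4
head -c 100000 /dev/zero   > /tmp/zeros.bin     # pathologically compressible
gzip -k /tmp/random.bin /tmp/zeros.bin
ls -l /tmp/random.bin.gz /tmp/zeros.bin.gz
# random.bin.gz stays around 100K; zeros.bin.gz shrinks to a few hundred bytes
```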
Here’s a practical backup script you might run daily:
#!/bin/bash
# Daily website backup with compression
BACKUP_DIR="/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SOURCE="/var/www/html"
tar -czf "${BACKUP_DIR}/website_${TIMESTAMP}.tar.gz" "${SOURCE}"
echo "Backup completed: website_${TIMESTAMP}.tar.gz"
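A backup you have never tested is a hope, not a backup. One way to extend the script (a sketch; the paths are illustrative) is to verify the archive immediately after writing it: gzip -t checks the compressed stream, and tar -tzf walks the full member list.

```shell
# Create a small archive, then verify it before trusting it.
rm -rf /tmp/site
mkdir -p /tmp/site
echo "ok" > /tmp/site/index.html
tar -czf /tmp/site.tar.gz -C /tmp site

if gzip -t /tmp/site.tar.gz && tar -tzf /tmp/site.tar.gz > /dev/null; then
    echo "archive verified"
else
    echo "archive corrupt" >&2
    exit 1
fi
```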
Advanced tar Options
Real-world archiving requires more control than basic create and extract operations. Here are the options you’ll need regularly.
Selective Extraction: Extract specific files or directories without unpacking everything:
# Extract only one file
tar -xzf backup.tar.gz var/www/html/config.php
# Extract an entire subdirectory
tar -xzf backup.tar.gz var/www/html/uploads/
# Extract multiple specific files
tar -xzf backup.tar.gz var/www/html/config.php var/www/html/.env
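Member paths must match exactly what the archive stores, or tar reports "Not found in archive". A safe habit (sketched below with illustrative paths) is to grep the listing first and copy the path verbatim:

```shell
rm -rf /tmp/app /tmp/out
mkdir -p /tmp/app/conf
echo "key=1" > /tmp/app/conf/settings.ini
tar -czf /tmp/app.tar.gz -C /tmp app

# Find the exact stored path, then extract just that member.
tar -tzf /tmp/app.tar.gz | grep settings
mkdir -p /tmp/out
tar -xzf /tmp/app.tar.gz -C /tmp/out app/conf/settings.ini
```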
Excluding Files: Keep logs, caches, and temporary files out of your archives:
# Exclude by pattern
tar --exclude='*.log' --exclude='*.tmp' -czf app.tar.gz /opt/application/
# Exclude specific directories
tar --exclude='node_modules' --exclude='.git' -czf project.tar.gz ./my-project/
# Exclude using a file list
echo "*.log" > exclude.txt
echo "cache/" >> exclude.txt
tar --exclude-from=exclude.txt -czf clean-backup.tar.gz /var/www/
Preserving Permissions: The -p flag maintains file ownership and permissions, critical for system backups:
# Preserve permissions (requires root for full effect)
sudo tar -czpf system-backup.tar.gz /etc/
# When extracting as root, permissions are restored
sudo tar -xzpf system-backup.tar.gz
Without -p, tar applies your umask to the stored permissions when you extract as a regular user (when extracting as root, GNU tar preserves permissions by default). For personal archives this is fine, but for system configurations or web applications with specific permission requirements, always use -p.
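The effect is easy to demonstrate as a regular user. A sketch with illustrative paths; a restrictive umask makes the difference visible (when run as root, both extractions preserve the stored mode):

```shell
umask 077                       # restrictive umask to expose the difference
rm -rf /tmp/perm /tmp/noP /tmp/withP
mkdir -p /tmp/perm /tmp/noP /tmp/withP
touch /tmp/perm/shared.txt
chmod 664 /tmp/perm/shared.txt  # group/other can read
tar -cf /tmp/perm.tar -C /tmp/perm shared.txt

tar -xf  /tmp/perm.tar -C /tmp/noP    # as non-root: umask filters the mode
tar -xpf /tmp/perm.tar -C /tmp/withP  # stored mode 664 restored
stat -c '%a %n' /tmp/noP/shared.txt /tmp/withP/shared.txt
```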
Incremental Backups: Use --listed-incremental for snapshots that only include changed files:
# Full backup (creates snapshot metadata)
tar -czf full-backup.tar.gz --listed-incremental=backup.snar /data/
# Incremental backup (only changed files since last backup)
tar -czf incremental-backup.tar.gz --listed-incremental=backup.snar /data/
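Restoring is the part people forget: extract the full backup first, then each incremental in order. GNU tar still expects a snapshot argument on extraction; passing /dev/null is the conventional way to say "restore only". A sketch with illustrative paths:

```shell
rm -rf /tmp/data /tmp/restore2 /tmp/snap.snar
mkdir -p /tmp/data /tmp/restore2
echo "v1" > /tmp/data/a.txt
tar -czf /tmp/full.tar.gz --listed-incremental=/tmp/snap.snar -C /tmp data
echo "v2" > /tmp/data/b.txt
tar -czf /tmp/incr.tar.gz --listed-incremental=/tmp/snap.snar -C /tmp data

# Restore: full first, then increments, oldest to newest.
tar -xzf /tmp/full.tar.gz --listed-incremental=/dev/null -C /tmp/restore2
tar -xzf /tmp/incr.tar.gz --listed-incremental=/dev/null -C /tmp/restore2
```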
Using gzip Independently
While tar -z handles most compression needs, understanding standalone gzip is valuable for compressing individual files, especially logs.
# Compress a single file (replaces original with .gz version)
gzip access.log
# Creates: access.log.gz (original deleted)
# Decompress
gunzip access.log.gz
# Restores: access.log (compressed version deleted)
# Keep original file while compressing
gzip -k access.log
# Creates: access.log.gz (original kept)
# Decompress and keep compressed version
gunzip -k access.log.gz
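You rarely need to decompress a log just to read it. The z-prefixed companions (zcat, zgrep, zless) operate on .gz files in place. A sketch with a throwaway log file:

```shell
rm -f /tmp/access.log /tmp/access.log.gz
printf 'GET /home 200\nGET /admin 403\n' > /tmp/access.log
gzip /tmp/access.log

zcat /tmp/access.log.gz            # stream the contents to stdout
zgrep '403' /tmp/access.log.gz     # grep without unpacking
```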
Compression levels range from -1 (fastest, least compression) to -9 (slowest, maximum compression). Default is -6:
# Fast compression for quick operations
gzip -1 large-file.txt
# Maximum compression for archival storage
gzip -9 archive-data.txt
# View compression ratio without decompressing
gzip -l file.txt.gz
For most purposes, the default level -6 provides the best balance. Level -9 might save 5-10% more space but takes 2-3x longer. Use it for archival storage where compression happens once but the file is stored for years.
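If you want numbers for your own data rather than rules of thumb, a quick sweep across levels takes seconds. A sketch (seq output is a stand-in for your real files):

```shell
rm -f /tmp/nums.txt /tmp/nums.*.gz
seq 1 200000 > /tmp/nums.txt        # repetitive, text-like input
for level in 1 6 9; do
    gzip -c "-${level}" /tmp/nums.txt > "/tmp/nums.${level}.gz"
done
ls -l /tmp/nums.*.gz                # sizes shrink as the level rises
```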
Practical Use Cases and Best Practices
Automated Backup Script: Here’s a production-ready backup solution with rotation:
#!/bin/bash
# backup-rotation.sh - Keep last 7 days of backups
BACKUP_ROOT="/backups/daily"
SOURCE="/var/www/html"
DATE=$(date +%Y%m%d)
BACKUP_FILE="${BACKUP_ROOT}/website-${DATE}.tar.gz"
# Create backup with compression
tar -czf "${BACKUP_FILE}" \
--exclude='*.log' \
--exclude='cache/*' \
"${SOURCE}"
# Delete backups older than 7 days
find "${BACKUP_ROOT}" -name "website-*.tar.gz" -mtime +7 -delete
# Log completion
echo "[$(date)] Backup completed: ${BACKUP_FILE}" >> /var/log/backup.log
Log Rotation: Compress old logs to save space while keeping them accessible:
# Compress logs older than 1 day
find /var/log/application/ -name "*.log" -mtime +1 -exec gzip {} \;
# Or using a more explicit loop
for log in /var/log/application/*.log; do
    if [ -f "$log" ] && [ -n "$(find "$log" -mtime +1)" ]; then
        gzip "$log"
    fi
done
Deployment Packages: Create reproducible application releases:
#!/bin/bash
# create-release.sh
VERSION="1.2.3"
APP_NAME="myapp"
tar -czf "${APP_NAME}-${VERSION}.tar.gz" \
--exclude='.git' \
--exclude='node_modules' \
--exclude='*.log' \
--exclude='.env' \
--transform "s,^,${APP_NAME}-${VERSION}/," \
./
# The --transform flag adds a version-specific root directory
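You can confirm the transform worked by listing the archive: every member should carry the versioned prefix. A sketch with a hypothetical myapp layout:

```shell
rm -rf /tmp/myapp
mkdir -p /tmp/myapp
echo "print('hi')" > /tmp/myapp/main.py
# Rewrite the leading myapp/ to a versioned root directory.
tar -czf /tmp/rel.tar.gz --transform 's,^myapp/,myapp-1.2.3/,' -C /tmp myapp
tar -tzf /tmp/rel.tar.gz
# Members are listed under myapp-1.2.3/
```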
Best Practices Summary:
- Use consistent naming: Include dates in ISO format (YYYYMMDD) for sortable filenames
- Always test extraction: After creating critical archives, verify them with tar -tzf
- Document your excludes: Keep an exclude.txt file in your repository
- Monitor backup sizes: Sudden size changes indicate problems worth investigating
- Compress once: Don’t compress already-compressed formats (media files, PDFs)
- Use absolute paths sparingly: Relative paths make archives more portable
The combination of tar and gzip has survived decades of Unix evolution because it’s simple, reliable, and composable. Master these tools and you’ll handle everything from quick file bundles to enterprise backup systems with the same straightforward commands.