Path Traversal: Directory Traversal Prevention

Key Insights

Path traversal attacks exploit insufficient input validation to escape intended directories, potentially exposing sensitive files like /etc/passwd, configuration files, or application source code.
The most reliable defense combines canonicalization (resolving the actual path) with base directory validation—never trust input validation alone, as encoding tricks can bypass filters.
Every language has specific secure file-handling patterns; using os.path.realpath() in Python, Path.toRealPath() in Java, or fs.realpath() in Node.js (path.resolve() alone does not follow symlinks) with proper base path checks eliminates most vulnerabilities.

Introduction to Path Traversal Attacks

Path traversal, also called directory traversal, is a vulnerability that allows attackers to access files outside the intended directory by manipulating file path inputs. When your application takes a filename from user input and uses it to read or write files, an attacker can inject sequences like ../ to “climb” out of the designated folder.

Consider a simple file download endpoint: /download?file=report.pdf. If the server naively concatenates this with a base path like /var/www/files/, an attacker can request /download?file=../../../etc/passwd to read the system’s password file. The resulting path /var/www/files/../../../etc/passwd resolves to /etc/passwd.
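
To make the resolution step concrete, here is a minimal sketch that mimics the naive concatenation and then normalizes the result. It uses posixpath so the output is the same on any OS; BASE_DIR and naive_path are illustrative names, not part of any framework.

```python
import posixpath

BASE_DIR = "/var/www/files/"  # assumed base path, taken from the example above


def naive_path(user_file: str) -> str:
    """Mimic a server that concatenates user input onto the base path."""
    return BASE_DIR + user_file


requested = naive_path("../../../etc/passwd")
resolved = posixpath.normpath(requested)  # collapses the ../ segments lexically

print(requested)  # /var/www/files/../../../etc/passwd
print(resolved)   # /etc/passwd
```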

This vulnerability class (CWE-22) is covered by OWASP’s Top 10, where the 2021 edition groups it under Broken Access Control, and it remains prevalent because developers often underestimate how many ways attackers can encode traversal sequences. The impact ranges from information disclosure to complete system compromise when attackers access configuration files containing database credentials or API keys.

How Path Traversal Attacks Work

The attack mechanics are straightforward. Attackers inject relative path components to navigate the filesystem hierarchy. Common payloads include:

Basic traversal: ../, ..\ (Windows)
URL encoding: ..%2f, ..%5c
Double encoding: ..%252f
Unicode/overlong UTF-8: ..%c0%af, ..%c1%9c
Null byte injection (legacy): ../../../etc/passwd%00.png
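
Encoded payloads work when a filter inspects the raw string but a later layer decodes it. The small sketch below uses the standard library's urllib.parse.unquote to show how each decoding pass changes what the filter would have seen; contains_traversal is a hypothetical naive check, not a recommended defense.

```python
from urllib.parse import unquote


def contains_traversal(value: str) -> bool:
    """Hypothetical naive filter: looks for literal dot-dot-separator only."""
    return "../" in value or "..\\" in value


single = "..%2f..%2fetc%2fpasswd"
double = "..%252f..%252fetc%252fpasswd"

# The raw strings contain no literal "../", so the naive filter passes them.
print(contains_traversal(single))                    # False
print(contains_traversal(double))                    # False

# One decoding pass exposes the single-encoded payload...
print(unquote(single))                               # ../../etc/passwd
# ...while the double-encoded one needs two passes.
print(unquote(double))                               # ..%2f..%2fetc%2fpasswd
print(contains_traversal(unquote(unquote(double))))  # True
```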

Here’s a vulnerable Python endpoint that demonstrates the problem:

from flask import Flask, request, send_file
import os

app = Flask(__name__)
UPLOAD_DIR = "/var/www/uploads"

@app.route("/download")
def download_file():
    filename = request.args.get("file")
    # VULNERABLE: Direct concatenation with user input
    filepath = os.path.join(UPLOAD_DIR, filename)
    return send_file(filepath)

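The os.path.join() function doesn’t prevent traversal; it simply combines path segments, and an absolute segment such as /etc/passwd replaces the base entirely. An attacker requesting ?file=../../../etc/passwd can read any file the server process has permission to open.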
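
The join behavior is easy to confirm. The sketch below uses posixpath (the POSIX implementation behind os.path on Linux and macOS) so the results do not depend on the host OS. It shows a relative traversal surviving the join, and an absolute input discarding the base directory.

```python
import posixpath

UPLOAD_DIR = "/var/www/uploads"

# Relative traversal: join keeps the ../ segments; the OS resolves them later.
joined = posixpath.join(UPLOAD_DIR, "../../../etc/passwd")
print(joined)                      # /var/www/uploads/../../../etc/passwd
print(posixpath.normpath(joined))  # /etc/passwd

# Absolute input: join drops everything that came before it.
absolute = posixpath.join(UPLOAD_DIR, "/etc/passwd")
print(absolute)                    # /etc/passwd
```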

The Node.js equivalent is equally dangerous:

const express = require('express');
const path = require('path');
const fs = require('fs');

const app = express();
const UPLOAD_DIR = '/var/www/uploads';

// VULNERABLE: No validation on user input
app.get('/download', (req, res) => {
    const filename = req.query.file;
    const filepath = path.join(UPLOAD_DIR, filename);
    res.sendFile(filepath);
});

Both examples trust user input implicitly. The fix isn’t simply filtering out ../ strings—attackers have dozens of encoding variants to bypass naive filters.
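
As an illustration of why stripping ../ is not enough, consider a hypothetical single-pass filter (strip_traversal is an invented name). Removing the sequence once can reassemble a new one from the surrounding characters:

```python
def strip_traversal(value: str) -> str:
    """Hypothetical naive filter: remove every literal ../ in one pass."""
    return value.replace("../", "")


print(strip_traversal("../../etc/passwd"))        # etc/passwd
print(strip_traversal("....//....//etc/passwd"))  # ../../etc/passwd
```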

Real-World Impact and Common Vulnerable Patterns

Path traversal vulnerabilities have caused significant breaches. Attackers commonly target:

Configuration files: .env, config.yaml, web.config containing database credentials
Source code: Application logic, hardcoded secrets, internal API endpoints
System files: /etc/passwd, SSH keys, cloud credentials stored on disk (for example ~/.aws/credentials)
Log files: Often contain session tokens, user data, or internal IP addresses

Several application patterns are particularly susceptible:

File upload handlers that store files with user-controlled names:

@app.route("/upload", methods=["POST"])
def upload_file():
    file = request.files["document"]
    # VULNERABLE: User controls the filename
    filename = file.filename
    filepath = os.path.join(UPLOAD_DIR, filename)
    file.save(filepath)
    return "Uploaded"

An attacker uploading a file named ../../../var/www/html/shell.php could place a webshell in the document root.
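
One way to take the filename out of the attacker's hands is to generate the stored name server-side and keep only an allowlisted extension. The sketch below assumes a small set of permitted extensions; ALLOWED_EXTENSIONS and generated_upload_name are illustrative names, and the extension list should be adapted to the application.

```python
import posixpath
import uuid
from typing import Optional

ALLOWED_EXTENSIONS = {".pdf", ".png", ".txt"}  # assumption: adjust per application


def generated_upload_name(original_name: str) -> Optional[str]:
    """Return a random storage name, or None if the extension is not allowed."""
    # Treat backslashes as separators too, then keep only the final component.
    leaf = posixpath.basename(original_name.replace("\\", "/"))
    ext = posixpath.splitext(leaf)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None
    # Apart from the allowlisted extension, nothing user-controlled is kept.
    return f"{uuid.uuid4().hex}{ext}"


print(generated_upload_name("../../../var/www/html/shell.php"))  # None
```

The original filename can still be stored in a database column for display; it just never reaches the filesystem.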

Dynamic template loading where template names come from user input:

from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("/app/templates"))

@app.route("/page/<template_name>")
def render_page(template_name):
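    # RISKY: Template name from URL. Jinja2's FileSystemLoader rejects ".."
    # segments, but any template under /app/templates can still be rendered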
    template = env.get_template(f"{template_name}.html")
    return template.render()

Log viewers and admin panels that display files based on parameters:

app.get('/admin/logs', (req, res) => {
    const logFile = req.query.log || 'app.log';
    // VULNERABLE: Reading arbitrary files
    const content = fs.readFileSync(path.join('/var/log/app', logFile));
    res.send(`<pre>${content}</pre>`);
});
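
For viewers like this, one option is to never build a path from the parameter at all and instead match it against the names actually present in the log directory. The following is a minimal sketch in Python, assuming a flat directory of log files; find_log is a hypothetical helper.

```python
import os
from typing import Optional


def find_log(log_dir: str, requested: str) -> Optional[str]:
    """Return the path of a regular file in log_dir whose name equals requested."""
    with os.scandir(log_dir) as entries:
        for entry in entries:
            # Entry names never contain separators, so "../x" cannot match.
            if entry.name == requested and entry.is_file(follow_symlinks=False):
                return entry.path
    return None
```

Because the comparison is against directory entries rather than a constructed path, traversal sequences simply fail to match anything.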

Prevention Techniques

Effective path traversal prevention requires multiple layers. Here’s the hierarchy of defenses:

1. Avoid user input in file paths entirely. The safest approach is mapping user input to predefined options:

ALLOWED_FILES = {
    "report": "/var/www/files/quarterly_report.pdf",
    "manual": "/var/www/files/user_manual.pdf",
    "terms": "/var/www/files/terms_of_service.pdf"
}

@app.route("/download")
def download_file():
    file_key = request.args.get("file")
    if file_key not in ALLOWED_FILES:
        return "File not found", 404
    return send_file(ALLOWED_FILES[file_key])

2. Canonicalize and validate against base directory. When you must use user input, resolve the actual path and verify it’s within bounds:

import os

UPLOAD_DIR = "/var/www/uploads"

def secure_file_path(base_dir: str, user_input: str) -> str | None:
    """Return safe absolute path or None if traversal detected."""
    # Resolve to absolute path, following symlinks
    base = os.path.realpath(base_dir)
    requested = os.path.realpath(os.path.join(base_dir, user_input))
    
    # Verify the resolved path starts with base directory
    if not requested.startswith(base + os.sep):
        return None
    
    return requested

@app.route("/download")
def download_file():
    # Default to "" so a missing parameter is rejected instead of raising TypeError
    filename = request.args.get("file", "")
    filepath = secure_file_path(UPLOAD_DIR, filename)
    
    if filepath is None:
        return "Access denied", 403
    
    if not os.path.isfile(filepath):
        return "File not found", 404
    
    return send_file(filepath)

The key insight: os.path.realpath() resolves all symbolic links and relative components, giving you the true filesystem location. Then you verify this resolved path is still within your intended directory.
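
The trailing separator in the startswith check matters. Without it, a sibling directory that merely shares a prefix with the base would pass. The short sketch below uses a made-up sibling directory, /var/www/uploads_private, and also shows posixpath.commonpath (what os.path.commonpath resolves to on POSIX systems) as an alternative containment test on already-resolved paths.

```python
import posixpath

base = "/var/www/uploads"
sibling = "/var/www/uploads_private/secret.txt"  # hypothetical path outside base

print(sibling.startswith(base))        # True  (a bare prefix match is fooled)
print(sibling.startswith(base + "/"))  # False (the separator closes the gap)


def is_within(base_dir: str, candidate: str) -> bool:
    """Containment test for absolute paths that have already been resolved."""
    return posixpath.commonpath([base_dir, candidate]) == base_dir


print(is_within(base, sibling))                        # False
print(is_within(base, "/var/www/uploads/report.pdf"))  # True
```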

3. Input validation as defense in depth. Reject obviously malicious input, but never rely on this alone:

import re

def validate_filename(filename: str) -> bool:
    """Basic validation - use WITH canonicalization, not instead of."""
    if not filename:
        return False
    # Reject path separators and traversal sequences
    if re.search(r'[/\\]|\.\.', filename):
        return False
    # Allowlist safe characters
    # fullmatch: with re.match, "$" would also accept a trailing newline
    if not re.fullmatch(r'[\w\-. ]+', filename):
        return False
    return True
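
One subtlety when anchoring allowlist patterns in Python: $ also matches just before a trailing newline, so a name ending in a newline can slip past a check written with re.match and $. re.fullmatch requires the whole string to match. A quick demonstration:

```python
import re

PATTERN = r"[\w\-. ]+"
name = "report.pdf\n"  # trailing newline smuggled into the filename

anchored = bool(re.match(r"^" + PATTERN + r"$", name))
full = bool(re.fullmatch(PATTERN, name))

print(anchored)                                   # True
print(full)                                       # False
print(bool(re.fullmatch(PATTERN, "report.pdf")))  # True
```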

Language-Specific Secure File Handling

Each platform has idiomatic approaches to secure file access.

Python:

from pathlib import Path

def safe_read_file(base_dir: str, filename: str) -> bytes | None:
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    
    if not target.is_relative_to(base):  # Python 3.9+
        return None
    
    return target.read_bytes() if target.is_file() else None
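
To see why resolving symlinks matters, the following sketch builds a throwaway directory tree in which a symlink inside the base points outside it. It assumes a platform where the process may create symlinks (typical on Linux and macOS) and Python 3.9+ for is_relative_to; all file names are invented for the demo.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    base = root / "uploads"
    base.mkdir()
    secret = root / "secret.txt"
    secret.write_text("top secret")

    # A symlink that lives inside the base but points outside it.
    os.symlink(secret, base / "innocent.txt")

    lexical = base / "innocent.txt"  # looks fine without resolution
    resolved = lexical.resolve()     # follows the link to root/secret.txt

    lexical_ok = lexical.is_relative_to(base)
    resolved_ok = resolved.is_relative_to(base)

print(lexical_ok)   # True
print(resolved_ok)  # False
```

A purely lexical check accepts the link; only the resolved path reveals that the target sits outside the base directory.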

Java:

import java.nio.file.Path;
import java.nio.file.Paths;
import java.io.IOException;

public class SecureFileAccess {
    private final Path baseDir;
    
    public SecureFileAccess(String basePath) throws IOException {
        this.baseDir = Paths.get(basePath).toRealPath();
    }
    
    public Path resolveSafely(String userInput) throws IOException {
        Path resolved = baseDir.resolve(userInput).normalize().toRealPath();
        
        if (!resolved.startsWith(baseDir)) {
            throw new SecurityException("Path traversal attempt detected");
        }
        
        return resolved;
    }
}

Node.js:

const path = require('path');
const fs = require('fs').promises;

async function safeReadFile(baseDir, filename) {
    const base = await fs.realpath(baseDir);
    const target = path.resolve(base, filename);
    
    // Resolve symlinks in target path
    let realTarget;
    try {
        realTarget = await fs.realpath(target);
    } catch (e) {
        return null; // File doesn't exist
    }
    
    if (!realTarget.startsWith(base + path.sep)) {
        throw new Error('Path traversal detected');
    }
    
    return fs.readFile(realTarget);
}

PHP:

function safeFilePath(string $baseDir, string $userInput): ?string {
    $base = realpath($baseDir);
    $target = realpath($baseDir . DIRECTORY_SEPARATOR . $userInput);
    
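    // realpath() returns false if a path doesn't exist. Guard the base too:
    // false . DIRECTORY_SEPARATOR would collapse to "/" and match any path
    if ($base === false || $target === false) {
        return null;
    }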
    
    if (strpos($target, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    
    return $target;
}

Testing and Detection

Identifying path traversal vulnerabilities requires systematic testing. Here’s a practical approach:

Manual testing with curl:

# Basic traversal
curl "https://target.com/download?file=../../../etc/passwd"

# URL encoded
curl "https://target.com/download?file=..%2f..%2f..%2fetc%2fpasswd"

# Double encoded
curl "https://target.com/download?file=..%252f..%252f..%252fetc%252fpasswd"

# Windows paths (if applicable)
curl "https://target.com/download?file=..\\..\\..\\windows\\win.ini"

Automated testing script:

import requests

BASE_URL = "http://localhost:5000/download"
PAYLOADS = [
    "../../../etc/passwd",
    "..%2f..%2f..%2fetc%2fpasswd",
    "....//....//....//etc/passwd",
    "..%252f..%252f..%252fetc%252fpasswd",
    "/etc/passwd",
    "....\\....\\....\\etc\\passwd",
]

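def test_traversal():
    for payload in PAYLOADS:
        # Build the URL by hand: params= would percent-encode the "%" in
        # already-encoded payloads and change what the server receives
        resp = requests.get(f"{BASE_URL}?file={payload}")
        # A bare 200 may be an error page; require /etc/passwd content as evidence
        if resp.status_code == 200 and "root:" in resp.text:
            print(f"[VULNERABLE] Payload: {payload}")
        else:
            print(f"[BLOCKED] Payload: {payload}")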

if __name__ == "__main__":
    test_traversal()
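
The same kind of payloads can also be exercised as unit tests against the path-checking helper itself, with no server running. The sketch below re-declares a helper equivalent to secure_file_path so that it is self-contained; the test class name and temporary directory layout are illustrative.

```python
import os
import tempfile
import unittest
from typing import Optional


def secure_file_path(base_dir: str, user_input: str) -> Optional[str]:
    """Return the resolved path if it stays inside base_dir, else None."""
    base = os.path.realpath(base_dir)
    requested = os.path.realpath(os.path.join(base, user_input))
    return requested if requested.startswith(base + os.sep) else None


class TraversalTests(unittest.TestCase):
    def setUp(self):
        # Fresh base directory with one legitimate file for each test.
        self.tmp = tempfile.TemporaryDirectory()
        self.base = os.path.join(self.tmp.name, "uploads")
        os.mkdir(self.base)
        with open(os.path.join(self.base, "report.pdf"), "wb") as fh:
            fh.write(b"%PDF")

    def tearDown(self):
        self.tmp.cleanup()

    def test_rejects_traversal_payloads(self):
        for payload in ["../../../etc/passwd", "/etc/passwd", "..", ""]:
            with self.subTest(payload=payload):
                self.assertIsNone(secure_file_path(self.base, payload))

    def test_accepts_plain_filename(self):
        expected = os.path.join(os.path.realpath(self.base), "report.pdf")
        self.assertEqual(secure_file_path(self.base, "report.pdf"), expected)
```

Saved as a module, this can be run with python -m unittest; URL-encoded variants belong in the HTTP-level tests, since decoding happens in the web framework rather than in the helper.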

For code review, grep for patterns that combine user input with file operations:

# Python
grep -rn "open.*request\." --include="*.py"
grep -rn "send_file.*request\." --include="*.py"

# Node.js
grep -rn "readFile.*req\." --include="*.js"
grep -rn "sendFile.*req\." --include="*.js"

Static analysis tools like Semgrep, Bandit (Python), and ESLint security plugins catch many of these patterns automatically.

Summary and Security Checklist

Path traversal remains dangerous because it’s easy to introduce and often overlooked. Use this checklist when auditing file-handling code:

User input is never directly used in file paths without validation
All file paths are canonicalized using realpath() or equivalent before use
Resolved paths are verified to start with the intended base directory
File operations use allowlists where possible instead of user-controlled names
Uploaded filenames are replaced with generated UUIDs or sanitized aggressively
Error messages don’t reveal filesystem structure or existence of files
Static analysis tools are configured to flag dangerous file operation patterns
Integration tests include path traversal payloads for file-handling endpoints

The canonical pattern is simple: resolve the full path, then verify it’s where you expect it to be. Apply this consistently, alongside allowlists and input validation, and you close off the most common path traversal routes in your applications.