Path Traversal: Directory Traversal Prevention
Key Insights
- Path traversal attacks exploit insufficient input validation to escape intended directories, potentially exposing sensitive files like /etc/passwd, configuration files, or application source code.
- The most reliable defense combines canonicalization (resolving the actual path) with base directory validation. Never trust input validation alone, as encoding tricks can bypass filters.
- Every language has specific secure file-handling patterns; using os.path.realpath() in Python, Path.toRealPath() in Java, or fs.realpath() in Node.js (path.resolve() alone does not follow symlinks) with proper base path checks eliminates most vulnerabilities.
Introduction to Path Traversal Attacks
Path traversal, also called directory traversal, is a vulnerability that allows attackers to access files outside the intended directory by manipulating file path inputs. When your application takes a filename from user input and uses it to read or write files, an attacker can inject sequences like ../ to “climb” out of the designated folder.
Consider a simple file download endpoint: /download?file=report.pdf. If the server naively concatenates this with a base path like /var/www/files/, an attacker can request /download?file=../../../etc/passwd to read the system’s password file. The resulting path /var/www/files/../../../etc/passwd resolves to /etc/passwd.
Path traversal (CWE-22) falls under Broken Access Control, the top category in the 2021 edition of the OWASP Top 10, and it remains prevalent because developers often underestimate how many ways attackers can encode traversal sequences. The impact ranges from information disclosure to complete system compromise when attackers access configuration files containing database credentials or API keys.
How Path Traversal Attacks Work
The attack mechanics are straightforward. Attackers inject relative path components to navigate the filesystem hierarchy. Common payloads include:
- Basic traversal: ../, ..\ (Windows)
- URL encoding: ..%2f, ..%5c
- Double encoding: ..%252f
- Unicode/overlong UTF-8: ..%c0%af, ..%c1%9c
- Null byte injection (legacy): ../../../etc/passwd%00.png
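To see why the encoded variants matter, the following minimal sketch decodes a few of the payloads above with the standard library's urllib.parse.unquote. The helper name decode_layers is hypothetical, and real deployments may decode at different layers (proxy, framework, application), so treat this as an illustration rather than a model of any particular server.

```python
from urllib.parse import unquote


def decode_layers(payload: str, max_rounds: int = 3) -> list[str]:
    """Repeatedly URL-decode a payload, recording each distinct stage."""
    stages = [payload]
    for _ in range(max_rounds):
        decoded = unquote(stages[-1])
        if decoded == stages[-1]:
            break  # nothing left to decode
        stages.append(decoded)
    return stages


if __name__ == "__main__":
    for payload in ["..%2f", "..%5c", "..%252f"]:
        print(payload, "->", decode_layers(payload))
```

The double-encoded payload needs two decoding rounds before it turns into ../, which is why a filter that runs after only one round of decoding never sees the traversal sequence.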
Here’s a vulnerable Python endpoint that demonstrates the problem:
from flask import Flask, request, send_file
import os

app = Flask(__name__)
UPLOAD_DIR = "/var/www/uploads"

@app.route("/download")
def download_file():
    filename = request.args.get("file")
    # VULNERABLE: Direct concatenation with user input
    filepath = os.path.join(UPLOAD_DIR, filename)
    return send_file(filepath)
The os.path.join() function doesn’t prevent traversal—it simply combines paths. An attacker requesting ?file=../../../etc/shadow gets access to sensitive system files.
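The behavior can be checked without a web server. The sketch below uses posixpath (the POSIX flavor of os.path) so the results are the same on any platform; naive_join is a hypothetical helper, and normpath is only a lexical stand-in for what the operating system does when it opens the path (it ignores symlinks).

```python
import posixpath

UPLOAD_DIR = "/var/www/uploads"


def naive_join(user_input: str) -> str:
    """Mimic the vulnerable endpoint: join, then collapse '..' components."""
    return posixpath.normpath(posixpath.join(UPLOAD_DIR, user_input))


if __name__ == "__main__":
    print(naive_join("report.pdf"))           # /var/www/uploads/report.pdf
    print(naive_join("../../../etc/shadow"))  # /etc/shadow
    # An absolute path makes join() discard the base directory entirely
    print(naive_join("/etc/shadow"))          # /etc/shadow
```

The last case is easy to miss: no ../ is needed at all when the input is an absolute path.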
The Node.js equivalent is equally dangerous:
const express = require('express');
const path = require('path');
const fs = require('fs');

const app = express();
const UPLOAD_DIR = '/var/www/uploads';

// VULNERABLE: No validation on user input
app.get('/download', (req, res) => {
  const filename = req.query.file;
  const filepath = path.join(UPLOAD_DIR, filename);
  res.sendFile(filepath);
});
Both examples trust user input implicitly. The fix isn’t simply filtering out ../ strings—attackers have dozens of encoding variants to bypass naive filters.
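As a concrete illustration, here is a deliberately flawed sanitizer of the kind that often appears in practice. The function naive_filter is hypothetical; it strips ../ in a single pass, and the examples show two inputs that survive it.

```python
def naive_filter(filename: str) -> str:
    """A deliberately flawed sanitizer: removes '../' in a single pass."""
    return filename.replace("../", "")


if __name__ == "__main__":
    # The obvious payload is neutralized...
    print(naive_filter("../secret"))               # secret
    # ...but nested sequences reassemble into '../' after the removal
    print(naive_filter("....//....//etc/passwd"))  # ../../etc/passwd
    # ...and encoded input passes untouched if decoding happens later
    print(naive_filter("..%2f..%2fetc%2fpasswd"))  # ..%2f..%2fetc%2fpasswd
```

Looping the replacement until the string stops changing closes the first gap but not the second, which is why canonicalization is the more dependable control.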
Real-World Impact and Common Vulnerable Patterns
Path traversal vulnerabilities have caused significant breaches. Attackers commonly target:
- Configuration files: .env, config.yaml, web.config containing database credentials
- Source code: Application logic, hardcoded secrets, internal API endpoints
- System files: /etc/passwd, SSH keys, cloud provider credential files
- Log files: Often contain session tokens, user data, or internal IP addresses
Several application patterns are particularly susceptible:
File upload handlers that store files with user-controlled names:
@app.route("/upload", methods=["POST"])
def upload_file():
    file = request.files["document"]
    # VULNERABLE: User controls the filename
    filename = file.filename
    filepath = os.path.join(UPLOAD_DIR, filename)
    file.save(filepath)
    return "Uploaded"
An attacker uploading a file named ../../../var/www/html/shell.php could place a webshell in the document root.
Dynamic template loading where template names come from user input:
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("/app/templates"))

@app.route("/page/<template_name>")
def render_page(template_name):
    # VULNERABLE: Template name from URL
    # (Jinja2's FileSystemLoader rejects ".." segments, but any template
    # under /app/templates can still be rendered by name.)
    template = env.get_template(f"{template_name}.html")
    return template.render()
Log viewers and admin panels that display files based on parameters:
app.get('/admin/logs', (req, res) => {
  const logFile = req.query.log || 'app.log';
  // VULNERABLE: Reading arbitrary files
  const content = fs.readFileSync(path.join('/var/log/app', logFile));
  res.send(`<pre>${content}</pre>`);
});
Prevention Techniques
Effective path traversal prevention requires multiple layers. Here’s the hierarchy of defenses:
1. Avoid user input in file paths entirely. The safest approach is mapping user input to predefined options:
ALLOWED_FILES = {
    "report": "/var/www/files/quarterly_report.pdf",
    "manual": "/var/www/files/user_manual.pdf",
    "terms": "/var/www/files/terms_of_service.pdf",
}

@app.route("/download")
def download_file():
    file_key = request.args.get("file")
    if file_key not in ALLOWED_FILES:
        return "File not found", 404
    return send_file(ALLOWED_FILES[file_key])
2. Canonicalize and validate against base directory. When you must use user input, resolve the actual path and verify it’s within bounds:
import os

UPLOAD_DIR = "/var/www/uploads"

def secure_file_path(base_dir: str, user_input: str) -> str | None:
    """Return safe absolute path or None if traversal detected."""
    # Resolve to absolute paths, following symlinks
    base = os.path.realpath(base_dir)
    requested = os.path.realpath(os.path.join(base, user_input))
    # Verify the resolved path is strictly inside the base directory
    if not requested.startswith(base + os.sep):
        return None
    return requested

@app.route("/download")
def download_file():
    filename = request.args.get("file")
    if not filename:
        # Missing or empty parameter: nothing to resolve
        return "File not found", 404
    filepath = secure_file_path(UPLOAD_DIR, filename)
    if filepath is None:
        return "Access denied", 403
    if not os.path.isfile(filepath):
        return "File not found", 404
    return send_file(filepath)
The key insight: os.path.realpath() resolves all symbolic links and relative components, giving you the true filesystem location. Then you verify this resolved path is still within your intended directory.
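The trailing separator in the prefix check is easy to drop by accident, and doing so lets sibling directories with a shared name prefix slip through. The sketch below contrasts a flawed string-prefix check with a component-wise comparison using os.path.commonpath. Both helper names are hypothetical, the example paths are illustrative, and POSIX-style paths are assumed.

```python
import os


def naive_prefix_check(base_dir: str, candidate: str) -> bool:
    """Flawed variant: plain string prefix without a trailing separator."""
    return os.path.realpath(candidate).startswith(os.path.realpath(base_dir))


def is_within(base_dir: str, candidate: str) -> bool:
    """True if candidate resolves to base_dir itself or something beneath it."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(candidate)
    # commonpath compares whole path components, not raw string prefixes
    return os.path.commonpath([base, target]) == base


if __name__ == "__main__":
    base = "/var/www/uploads"
    sibling = "/var/www/uploads_private/key.pem"
    print(naive_prefix_check(base, sibling))  # True: the string prefix matches
    print(is_within(base, sibling))           # False: different directory
    print(is_within(base, "/var/www/uploads/a/../b.txt"))  # True
```

Note that is_within also accepts the base directory itself, whereas the startswith(base + os.sep) form accepts only paths strictly beneath it; which behavior you want depends on whether the endpoint should ever serve the directory.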
3. Input validation as defense in depth. Reject obviously malicious input, but never rely on this alone:
import re

def validate_filename(filename: str) -> bool:
    """Basic validation - use WITH canonicalization, not instead of."""
    if not filename:
        return False
    # Reject path separators and traversal sequences
    if re.search(r'[/\\]|\.\.', filename):
        return False
    # Allowlist safe characters; fullmatch avoids '$' accepting a trailing newline
    if not re.fullmatch(r'[\w\-. ]+', filename):
        return False
    return True
Language-Specific Secure File Handling
Each platform has idiomatic approaches to secure file access.
Python:
from pathlib import Path

def safe_read_file(base_dir: str, filename: str) -> bytes | None:
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        return None
    return target.read_bytes() if target.is_file() else None
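Resolving before comparing also covers symlinks that live inside the base directory but point outside it. The following sketch builds a throwaway directory tree to show this; resolves_inside and demo are hypothetical names, and it assumes a platform where unprivileged symlink creation works (Linux or macOS).

```python
import os
import tempfile
from pathlib import Path


def resolves_inside(base_dir: str, filename: str) -> bool:
    """Resolve first, then compare against the resolved base directory."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    return target.is_relative_to(base)  # Python 3.9+


def demo() -> dict[str, bool]:
    """Create uploads/ with a symlink that escapes it, then test three inputs."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "uploads"
        outside = Path(tmp) / "secrets"
        base.mkdir()
        outside.mkdir()
        (base / "ok.txt").write_text("public")
        (outside / "key.txt").write_text("private")
        # A symlink inside the base directory that points outside it
        os.symlink(outside, base / "link")
        inputs = ["ok.txt", "link/key.txt", "../secrets/key.txt"]
        return {name: resolves_inside(str(base), name) for name in inputs}


if __name__ == "__main__":
    for name, allowed in demo().items():
        print(f"{name}: {'allowed' if allowed else 'rejected'}")
```

A purely lexical check would accept link/key.txt because the string contains no .. at all; only the resolved path reveals that it leaves the base directory.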
Java:
import java.nio.file.Path;
import java.nio.file.Paths;
import java.io.IOException;

public class SecureFileAccess {
    private final Path baseDir;

    public SecureFileAccess(String basePath) throws IOException {
        this.baseDir = Paths.get(basePath).toRealPath();
    }

    public Path resolveSafely(String userInput) throws IOException {
        Path resolved = baseDir.resolve(userInput).normalize().toRealPath();
        if (!resolved.startsWith(baseDir)) {
            throw new SecurityException("Path traversal attempt detected");
        }
        return resolved;
    }
}
Node.js:
const path = require('path');
const fs = require('fs').promises;

async function safeReadFile(baseDir, filename) {
  const base = await fs.realpath(baseDir);
  const target = path.resolve(base, filename);
  // Resolve symlinks in target path
  let realTarget;
  try {
    realTarget = await fs.realpath(target);
  } catch (e) {
    return null; // File doesn't exist
  }
  if (!realTarget.startsWith(base + path.sep)) {
    throw new Error('Path traversal detected');
  }
  return fs.readFile(realTarget);
}
PHP:
function safeFilePath(string $baseDir, string $userInput): ?string {
    $base = realpath($baseDir);
    $target = realpath($baseDir . DIRECTORY_SEPARATOR . $userInput);
    // realpath returns false if the path doesn't exist
    if ($base === false || $target === false) {
        return null;
    }
    // The resolved target must sit strictly inside the base directory
    if (strpos($target, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $target;
}
Testing and Detection
Identifying path traversal vulnerabilities requires systematic testing. Here’s a practical approach:
Manual testing with curl:
# Basic traversal
curl "https://target.com/download?file=../../../etc/passwd"
# URL encoded
curl "https://target.com/download?file=..%2f..%2f..%2fetc%2fpasswd"
# Double encoded
curl "https://target.com/download?file=..%252f..%252f..%252fetc%252fpasswd"
# Windows paths (if applicable)
curl "https://target.com/download?file=..\\..\\..\\windows\\win.ini"
Automated testing script:
import requests

BASE_URL = "http://localhost:5000/download"

PAYLOADS = [
    "../../../etc/passwd",
    "..%2f..%2f..%2fetc%2fpasswd",
    "....//....//....//etc/passwd",
    "..%252f..%252f..%252fetc%252fpasswd",
    "/etc/passwd",
    "....\\....\\....\\etc\\passwd",
]

def test_traversal():
    for payload in PAYLOADS:
        # Build the URL by hand: params= would re-encode "%" and turn
        # "..%2f" into "..%252f", so encoded payloads would not arrive as written.
        resp = requests.get(f"{BASE_URL}?file={payload}", timeout=5)
        # Judge by content; a 200 alone may just be an error page
        if "root:" in resp.text:
            print(f"[VULNERABLE] Payload: {payload}")
        else:
            print(f"[NO LEAK] ({resp.status_code}) Payload: {payload}")

if __name__ == "__main__":
    test_traversal()
For code review, grep for patterns that combine user input with file operations:
# Python
grep -rn "open.*request\." --include="*.py"
grep -rn "send_file.*request\." --include="*.py"
# Node.js
grep -rn "readFile.*req\." --include="*.js"
grep -rn "sendFile.*req\." --include="*.js"
Static analysis tools like Semgrep, Bandit (Python), and ESLint security plugins catch many of these patterns automatically.
Summary and Security Checklist
Path traversal remains dangerous because it’s easy to introduce and often overlooked. Use this checklist when auditing file-handling code:
- User input is never directly used in file paths without validation
- All file paths are canonicalized using realpath() or equivalent before use
- Resolved paths are verified to start with the intended base directory
- File operations use allowlists where possible instead of user-controlled names
- Uploaded filenames are replaced with generated UUIDs or sanitized aggressively
- Error messages don’t reveal filesystem structure or existence of files
- Static analysis tools are configured to flag dangerous file operation patterns
- Integration tests include path traversal payloads for file-handling endpoints
The canonical pattern is simple: resolve the full path, then verify it’s where you expect it to be. Apply this consistently, and path traversal becomes a non-issue in your applications.