Hashing: SHA-256, MD5, and Use Cases
Key Insights
- MD5 is cryptographically broken and should never be used for security purposes, but remains acceptable for non-adversarial checksums and cache keys
- SHA-256 is the current industry standard for cryptographic hashing, offering strong collision resistance with reasonable performance
- Never use raw hashing for passwords—always use purpose-built algorithms like bcrypt, scrypt, or Argon2 with proper salting
What is Hashing?
A hash function takes arbitrary input and produces a fixed-size output, called a digest or hash. Cryptographic hash functions have four defining properties: they’re deterministic (same input always yields same output), they produce fixed-length output regardless of input size, they’re one-way (you can’t reverse a hash to get the original input), and they’re collision-resistant (it’s computationally infeasible to find two different inputs with the same hash).
This is fundamentally different from encryption. Encryption is reversible with the right key—hashing is not. You hash data when you need to verify it without storing or transmitting the original. You encrypt data when you need to recover it later.
```python
import hashlib

# Hashing is deterministic - same input, same output
message = "Hello, World!"
hash1 = hashlib.sha256(message.encode()).hexdigest()
hash2 = hashlib.sha256(message.encode()).hexdigest()
print(f"Hash 1: {hash1}")
print(f"Hash 2: {hash2}")
print(f"Identical: {hash1 == hash2}")
# Output:
# Hash 1: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
# Hash 2: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
# Identical: True
```
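The fixed-length property can be checked the same way. This small sketch hashes inputs ranging from zero bytes to one megabyte; every SHA-256 digest comes out as 32 bytes, or 64 hexadecimal characters.

```python
import hashlib

# Inputs from 0 bytes to 1 MB all hash to a 32-byte (64 hex character) digest
lengths = {}
for size in (0, 1, 1_000, 1_000_000):
    digest = hashlib.sha256(b"x" * size).hexdigest()
    lengths[size] = len(digest)
    print(f"{size:>9} input bytes -> {len(digest)} hex characters")
```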
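Changing even a single bit of the input produces a completely different hash, with roughly half of the output bits flipping; this is called the avalanche effect. It’s why hashing works for integrity verification: any modification, however small, shows up as an unrelated digest.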
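To see the avalanche effect concretely, here is a quick sketch that flips exactly one input bit and counts how many of SHA-256's 256 output bits change. The helper `bits_differing` is illustrative; expect a count somewhere near 128, though the exact number depends on the input.

```python
import hashlib

def bits_differing(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Hello, World!"
modified = bytearray(original)
modified[0] ^= 0x01  # flip the lowest bit of the first byte: 'H' (0x48) becomes 'I' (0x49)
modified = bytes(modified)

input_diff = bits_differing(original, modified)
output_diff = bits_differing(
    hashlib.sha256(original).digest(),
    hashlib.sha256(modified).digest(),
)
print(f"Input bits changed:  {input_diff} of {len(original) * 8}")
print(f"Output bits changed: {output_diff} of 256")
```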
MD5: The Legacy Algorithm
Ronald Rivest designed MD5 in 1991 as a replacement for MD4. It produces a 128-bit (16-byte) hash, typically represented as a 32-character hexadecimal string. For over a decade, MD5 was the go-to algorithm for checksums and basic cryptographic needs.
Then it broke. In 2004, researchers demonstrated practical collision attacks. By 2008, security researchers created a rogue CA certificate using MD5 collisions. The algorithm is now considered cryptographically dead for any security-sensitive application.
But “cryptographically broken” doesn’t mean “useless.” MD5 remains fast and widely supported. For non-adversarial use cases—where nobody is actively trying to exploit collisions—it’s still practical:
```python
import hashlib

def calculate_md5(filepath: str) -> str:
    """Calculate MD5 checksum for file integrity verification."""
    hash_md5 = hashlib.md5()
    with open(filepath, "rb") as f:
        # Read in 8 KiB chunks so large files never have to fit in memory
        for chunk in iter(lambda: f.read(8192), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

def verify_download(filepath: str, expected_hash: str) -> bool:
    """Verify downloaded file matches expected checksum.

    This catches accidental corruption only; MD5 cannot detect deliberate tampering.
    """
    actual_hash = calculate_md5(filepath)
    return actual_hash.lower() == expected_hash.lower()

# Usage for verifying a downloaded file
downloaded_file = "package-1.2.3.tar.gz"
# Placeholder: this particular value is the MD5 of an empty file.
# Use the checksum published alongside the download instead.
expected = "d41d8cd98f00b204e9800998ecf8427e"
if verify_download(downloaded_file, expected):
    print("File integrity verified")
else:
    print("WARNING: File may be corrupted")
```
Acceptable MD5 uses: cache key generation, deduplication checks, non-security checksums, legacy system compatibility. Unacceptable: password hashing, digital signatures, any security context.
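As an example of the first acceptable use, here is a minimal cache-key sketch. The function name `cache_key` and the `get_user_orders` parameters are hypothetical. It relies on `json.dumps(..., sort_keys=True)` to get a canonical string and passes `usedforsecurity=False` (available since Python 3.9), which marks the call as non-security use and lets it run in restricted environments, such as some FIPS-mode builds, that otherwise block MD5.

```python
import hashlib
import json

def cache_key(func_name: str, params: dict) -> str:
    """Build a deterministic cache key from a function name and its parameters."""
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically
    canonical = json.dumps({"fn": func_name, "params": params}, sort_keys=True)
    # usedforsecurity=False declares this a non-security use of MD5
    return hashlib.md5(canonical.encode("utf-8"), usedforsecurity=False).hexdigest()

key1 = cache_key("get_user_orders", {"user_id": 42, "status": "shipped"})
key2 = cache_key("get_user_orders", {"status": "shipped", "user_id": 42})
print(key1 == key2)  # True
```

Because nobody gains anything by engineering a collision between two cache entries, MD5's broken collision resistance is irrelevant here; only speed and determinism matter.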
SHA-256: The Modern Standard
SHA-256 belongs to the SHA-2 family, designed by the NSA and published in 2001. It produces a 256-bit (32-byte) hash. Despite its NSA origins, the algorithm has been extensively analyzed by the cryptographic community and remains secure.
SHA-256 offers strong collision resistance: no practical attacks exist. No known attack on full SHA-256 does better than the generic birthday attack, which needs about 2^128 hash evaluations to find a collision, far beyond current technology. It’s the algorithm behind Bitcoin’s proof-of-work (applied twice, as double SHA-256) and is required or recommended by many compliance standards.
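The 2^128 figure is the generic birthday bound, which applies to any hash function. Using the standard approximation, hashing $k$ random inputs into an $n$-bit output space produces at least one collision with probability

$$
p(k) \approx 1 - e^{-k^2 / (2 \cdot 2^{n})}
$$

Setting $p(k) = \tfrac{1}{2}$ and solving for $k$:

$$
\frac{k^2}{2 \cdot 2^{n}} = \ln 2 \quad\Longrightarrow\quad k = \sqrt{2 \ln 2}\cdot 2^{n/2} \approx 1.18 \cdot 2^{n/2}
$$

For SHA-256 ($n = 256$) that is about $1.18 \cdot 2^{128}$ hashes. For MD5 ($n = 128$) the generic bound would be about $2^{64}$, but known attacks find MD5 collisions far more cheaply than that, which is what "broken" means in practice.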
Here’s how to compute SHA-256 in a few common languages:
```python
# Python
import hashlib

def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

print(sha256_hash("Hello, World!"))
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
```

```javascript
// Node.js
const crypto = require('crypto');

function sha256Hash(data) {
  return crypto.createHash('sha256').update(data, 'utf8').digest('hex');
}

console.log(sha256Hash("Hello, World!"));
// dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
```

```go
// Go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func sha256Hash(data string) string {
	hash := sha256.Sum256([]byte(data))
	return hex.EncodeToString(hash[:])
}

func main() {
	fmt.Println(sha256Hash("Hello, World!"))
	// dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
}
```
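In pure software, SHA-256 is typically slower than MD5, but on modern hardware this rarely matters. The size of the gap depends heavily on the CPU: many recent x86 and ARM processors include dedicated SHA-256 instructions (SHA-NI on x86), and with them SHA-256 can match or even outpace MD5. If performance is critical and you’re processing gigabytes of data, make sure your library uses that hardware acceleration, or evaluate BLAKE3.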
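Because relative speed varies by CPU and by how your Python and OpenSSL were built, measuring on your own hardware beats any rule of thumb. A rough best-of-N timing sketch using only the standard library follows; `throughput_mb_per_s` is a hypothetical helper, and since BLAKE3 is not in `hashlib` (it needs the third-party `blake3` package), its predecessor BLAKE2b is measured instead.

```python
import hashlib
import time

def throughput_mb_per_s(algorithm: str, data: bytes, repeats: int = 3) -> float:
    """Best-of-`repeats` hashing throughput in MB/s for a hashlib algorithm name."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1_000_000

buffer = bytes(16 * 1024 * 1024)  # 16 MiB of zeros; content does not affect speed
for name in ("md5", "sha256", "sha512", "blake2b", "sha3_256"):
    print(f"{name:>9}: {throughput_mb_per_s(name, buffer):8.0f} MB/s")
```

Numbers from a single run are noisy; treat them as a relative comparison on one machine, not as general benchmarks.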
Common Use Cases
Password Storage
Never store passwords in plaintext. Never use a single fast hash either: attackers with rainbow tables (precomputed hash-to-password lookups) can reverse common password hashes instantly.
Proper password storage requires salting: adding random data to each password before hashing. This defeats rainbow tables because each password has a unique salt, so precomputed tables no longer match, and two users with the same password end up with different hashes.
```python
import hashlib
import secrets

def hash_password(password: str) -> tuple[str, str]:
    """Hash password with random salt. Returns (hash, salt)."""
    salt = secrets.token_hex(32)  # 256-bit salt
    salted = salt + password
    password_hash = hashlib.sha256(salted.encode()).hexdigest()
    return password_hash, salt

def verify_password(password: str, stored_hash: str, salt: str) -> bool:
    """Verify password against stored hash and salt."""
    salted = salt + password
    computed_hash = hashlib.sha256(salted.encode()).hexdigest()
    return secrets.compare_digest(computed_hash, stored_hash)

# Registration
password_hash, salt = hash_password("user_password_123")
# Store both hash and salt in database

# Login verification
is_valid = verify_password("user_password_123", password_hash, salt)
```
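Important caveat: the example above demonstrates the concept, but a salted SHA-256 is still fast enough for attackers to test enormous numbers of guesses per second. For production password storage, use bcrypt, scrypt, or Argon2. These algorithms are intentionally slow (scrypt and Argon2 are also memory-hard), and their standard encodings store the salt alongside the hash, making brute-force attacks far more expensive.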
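Python's standard library ships one of these algorithms as `hashlib.scrypt`. A minimal sketch follows, assuming cost parameters N=2**14, r=8, p=1 (roughly 16 MiB of memory per hash); these are illustrative values, and production systems should tune N upward, which also requires passing a larger `maxmem`. The helper names are hypothetical. bcrypt and Argon2 need third-party packages such as `bcrypt` or `argon2-cffi`.

```python
import hashlib
import hmac
import secrets

# Assumed cost parameters; memory use is about 128 * r * N bytes (16 MiB here)
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1
KEY_LEN = 32

def hash_password_scrypt(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=KEY_LEN,
    )
    return salt, key

def verify_password_scrypt(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=KEY_LEN,
    )
    return hmac.compare_digest(key, stored_key)

salt, key = hash_password_scrypt("user_password_123")
print(verify_password_scrypt("user_password_123", salt, key))  # True
print(verify_password_scrypt("wrong_password", salt, key))     # False
```

Storing the cost parameters next to the salt and key is a sensible addition: it lets you raise N later and rehash each password the next time its owner logs in.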
Data Integrity Verification
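Hashes verify that data hasn’t been modified during transmission or storage. A plain hash only catches accidental changes, since anyone who alters the data can simply recompute it; mixing in a secret key, as below, means only holders of the key can produce a matching hash: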
```python
import hashlib
import json
import secrets

def create_signed_payload(data: dict, secret_key: str) -> dict:
    """Create payload with integrity hash (a keyed hash, not a public-key signature)."""
    # sort_keys yields a canonical JSON string, so both sides hash identical text
    payload_json = json.dumps(data, sort_keys=True)
    signature = hashlib.sha256(
        (payload_json + secret_key).encode()
    ).hexdigest()
    return {
        "data": data,
        "signature": signature
    }

def verify_payload(payload: dict, secret_key: str) -> bool:
    """Verify payload integrity."""
    data_json = json.dumps(payload["data"], sort_keys=True)
    expected_sig = hashlib.sha256(
        (data_json + secret_key).encode()
    ).hexdigest()
    # Constant-time comparison of the two hex digests
    return secrets.compare_digest(expected_sig, payload["signature"])
```
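Concatenating a secret key with the data illustrates the idea, but the standard construction for a keyed integrity check is HMAC, which Python provides in the `hmac` module. A sketch of the same round trip using HMAC-SHA256 follows; `sign_payload`, `verify_signed_payload`, and the key value are hypothetical names for illustration.

```python
import hashlib
import hmac
import json

def sign_payload(data: dict, secret_key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag computed over the canonical JSON of `data`."""
    message = json.dumps(data, sort_keys=True).encode("utf-8")
    tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return {"data": data, "signature": tag}

def verify_signed_payload(payload: dict, secret_key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    message = json.dumps(payload["data"], sort_keys=True).encode("utf-8")
    expected = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, payload["signature"])

key = b"example-secret-key"  # hypothetical; load real keys from secret storage
signed = sign_payload({"user_id": 42, "amount": 100}, key)
print(verify_signed_payload(signed, key))  # True

# Changing the data without the key cannot produce a matching tag
tampered = {"data": {"user_id": 42, "amount": 1000}, "signature": signed["signature"]}
print(verify_signed_payload(tampered, key))  # False
```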
Digital Signatures and Certificates
TLS certificates, code signing, and document signatures all rely on hashing. The signer hashes the document and signs that hash with their private key. Recipients hash the document themselves and use the signer’s public key to check the signature against their hash, which verifies both authenticity and integrity.
Blockchain
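Bitcoin and several other cryptocurrencies use SHA-256 extensively. Each block contains the hash of the previous block’s header, so altering any earlier block changes its hash and breaks every link after it, which makes the chain tamper-evident. Proof-of-work mining means repeatedly changing a nonce until the block header’s double SHA-256 hash falls below a target value.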
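A toy sketch of both ideas, hash-linked blocks plus a miniature proof-of-work, is shown below. Everything here is hypothetical and simplified: real Bitcoin hashes an 80-byte binary header with double SHA-256 and compares the result numerically against a target, whereas this sketch hashes a JSON encoding once and requires `DIFFICULTY` leading zero hex digits.

```python
import hashlib
import json

DIFFICULTY = 3  # hypothetical: required number of leading zero hex digits

def block_hash(block: dict) -> str:
    """SHA-256 of the block's canonical JSON encoding."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode("utf-8")).hexdigest()

def mine_block(prev_hash: str, data: str) -> dict:
    """Try nonces until the block's hash starts with DIFFICULTY zero hex digits."""
    nonce = 0
    while True:
        block = {"prev_hash": prev_hash, "data": data, "nonce": nonce}
        if block_hash(block).startswith("0" * DIFFICULTY):
            return block
        nonce += 1

def chain_is_valid(chain: list[dict]) -> bool:
    """Each block must meet the difficulty and point at its predecessor's hash."""
    if not all(block_hash(b).startswith("0" * DIFFICULTY) for b in chain):
        return False
    return all(
        block["prev_hash"] == block_hash(prev)
        for prev, block in zip(chain, chain[1:])
    )

genesis = mine_block("0" * 64, "genesis")
second = mine_block(block_hash(genesis), "Alice pays Bob 5")
chain = [genesis, second]
print(chain_is_valid(chain))  # True

# Editing an earlier block changes its hash, so the next block's link breaks
tampered = [dict(genesis, data="genesis (edited)"), second]
print(chain_is_valid(tampered))  # False
```

Each extra zero hex digit multiplies the expected mining work by 16, which is the lever real networks adjust to control how hard blocks are to produce.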
Choosing the Right Algorithm
Use this decision framework:
Use MD5 when:
- Generating cache keys or ETags
- Checking file integrity in trusted environments
- Deduplicating data where security isn’t a concern
- Maintaining compatibility with legacy systems
Use SHA-256 when:
- Any security-sensitive application
- Digital signatures or certificates
- Data integrity in adversarial environments
- Compliance requirements mandate it
- You’re unsure which to choose
Use specialized algorithms when:
- Storing passwords → bcrypt, scrypt, or Argon2
- Need maximum speed → BLAKE3
- Future-proofing → SHA-3
Implementation Best Practices
Always use established cryptographic libraries. Never implement hash algorithms yourself—subtle bugs create exploitable vulnerabilities.
For password hashing, add a “pepper”: a secret value stored separately from the database (often in application configuration). Even if attackers dump your database, they can’t crack the passwords without also obtaining the pepper.
```python
import hashlib
import hmac
import os
import secrets

# Application secret - store in environment variable, not code.
# The empty-string fallback silently disables the pepper; consider
# failing at startup instead if the variable is missing.
PEPPER = os.environ.get("PASSWORD_PEPPER", "")

def secure_hash_password(password: str) -> tuple[str, str]:
    """Salted and peppered password hashing.

    Demonstrates the salt + pepper pattern; in production, pass the
    peppered value to bcrypt, scrypt, or Argon2 instead of one SHA-256 call.
    """
    salt = secrets.token_hex(32)
    # Use HMAC for combining pepper - more secure than concatenation
    peppered = hmac.new(
        PEPPER.encode(),
        password.encode(),
        hashlib.sha256
    ).hexdigest()
    # Combine with salt
    final_hash = hashlib.sha256(
        (salt + peppered).encode()
    ).hexdigest()
    return final_hash, salt

def secure_verify_password(
    password: str,
    stored_hash: str,
    salt: str
) -> bool:
    """Constant-time password verification."""
    peppered = hmac.new(
        PEPPER.encode(),
        password.encode(),
        hashlib.sha256
    ).hexdigest()
    computed_hash = hashlib.sha256(
        (salt + peppered).encode()
    ).hexdigest()
    # Constant-time comparison prevents timing attacks
    return secrets.compare_digest(computed_hash, stored_hash)
```
The secrets.compare_digest() function performs constant-time comparison, preventing timing attacks where attackers measure response times to guess hash values character by character.
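The difference is easiest to see side by side. The sketch below contrasts a hypothetical early-exit comparison, whose running time grows with the length of the matching prefix, with a wrapper around `hmac.compare_digest` (the same function that `secrets` re-exports as `secrets.compare_digest`), which is designed to avoid content-based short-circuiting.

```python
import hmac

def naive_equals(a: str, b: str) -> bool:
    """Early-exit comparison: running time reveals how many leading characters match."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner when the mismatch comes earlier
    return True

def constant_time_equals(a: str, b: str) -> bool:
    """Delegates to hmac.compare_digest, which avoids content-based early exits."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))
```

Both functions return the same answers; only their timing behavior differs, which is exactly what an attacker measuring response times exploits.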
Quick Reference
| Algorithm | Output Size | Speed | Security Status | Use Case |
|---|---|---|---|---|
| MD5 | 128-bit | Fast | Broken | Checksums, caching |
| SHA-256 | 256-bit | Moderate | Secure | General cryptographic use |
| SHA-512 | 512-bit | Moderate | Secure | Higher security margins |
| SHA-3 | Variable | Slower | Secure | Future-proofing |
| BLAKE3 | Variable | Very Fast | Secure | High-performance needs |
| bcrypt | 184-bit | Intentionally Slow | Secure | Password storage |
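The output sizes for the `hashlib`-backed rows can be confirmed directly; bcrypt and BLAKE3 are third-party and not included. The SHAKE functions in the SHA-3 family, which the "Variable" entry covers, return however many bytes you request.

```python
import hashlib

# Fixed digest sizes (in bits) reported by hashlib
sizes = {}
for name in ("md5", "sha256", "sha512", "sha3_256", "sha3_512"):
    sizes[name] = hashlib.new(name).digest_size * 8
    print(f"{name:>8}: {sizes[name]} bits")

# SHAKE256 (SHA-3 family) produces output of any requested length
shake_bits = len(hashlib.shake_256(b"data").digest(64)) * 8
print(f"shake_256 with 64 requested bytes: {shake_bits} bits")
```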
TL;DR Recommendations:
- Default choice: SHA-256
- Passwords: bcrypt or Argon2 (never raw SHA-256)
- Non-security checksums: MD5 is fine
- Maximum performance: BLAKE3
- Compliance/government: Check specific requirements, often SHA-256 minimum
Hash algorithms are foundational to modern security. Choose appropriately based on your threat model, and always prefer established libraries over custom implementations.