Hashing: SHA-256, MD5, and Use Cases


Key Insights

  • MD5 is cryptographically broken and should never be used for security purposes, but remains acceptable for non-adversarial checksums and cache keys
  • SHA-256 is the current industry standard for cryptographic hashing, offering strong collision resistance with reasonable performance
  • Never use raw hashing for passwords—always use purpose-built algorithms like bcrypt, scrypt, or Argon2 with proper salting

What is Hashing?

A hash function takes arbitrary input and produces a fixed-size output, called a digest or hash. Cryptographic hash functions are deterministic (the same input always yields the same output), produce fixed-length output regardless of input size, and are one-way (recovering an input from its hash is computationally infeasible). They are also collision-resistant: it should be infeasible to find two different inputs that produce the same hash.

This is fundamentally different from encryption. Encryption is reversible with the right key—hashing is not. You hash data when you need to verify it without storing or transmitting the original. You encrypt data when you need to recover it later.

import hashlib

# Hashing is deterministic - same input, same output
message = "Hello, World!"

hash1 = hashlib.sha256(message.encode()).hexdigest()
hash2 = hashlib.sha256(message.encode()).hexdigest()

print(f"Hash 1: {hash1}")
print(f"Hash 2: {hash2}")
print(f"Identical: {hash1 == hash2}")

# Output:
# Hash 1: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
# Hash 2: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
# Identical: True

A single bit change in the input produces a completely different hash—this is called the avalanche effect. It’s why hashing works for integrity verification.
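
The avalanche effect can be observed directly. The sketch below flips one bit of the input and counts how many of the 256 output bits change; the helper name differing_bits is an illustrative assumption, not part of hashlib. For a well-behaved hash, roughly half the bits should differ.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Hello, World!"
# Flip the lowest bit of the first byte: "H" (0x48) becomes "I" (0x49)
flipped = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(flipped).digest()

print(digest_a.hex())
print(digest_b.hex())
print(f"{differing_bits(digest_a, digest_b)} of 256 bits differ")
```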

MD5: The Legacy Algorithm

Ronald Rivest designed MD5 in 1991 as a replacement for MD4. It produces a 128-bit (16-byte) hash, typically represented as a 32-character hexadecimal string. For over a decade, MD5 was the go-to algorithm for checksums and basic cryptographic needs.

Then it broke. In 2004, researchers demonstrated practical collision attacks. By 2008, security researchers created a rogue CA certificate using MD5 collisions. The algorithm is now considered cryptographically dead for any security-sensitive application.

But “cryptographically broken” doesn’t mean “useless.” MD5 remains fast and widely supported. For non-adversarial use cases—where nobody is actively trying to exploit collisions—it’s still practical:

import hashlib

def calculate_md5(filepath: str) -> str:
    """Calculate MD5 checksum for file integrity verification."""
    hash_md5 = hashlib.md5()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

def verify_download(filepath: str, expected_hash: str) -> bool:
    """Verify downloaded file matches expected checksum."""
    actual_hash = calculate_md5(filepath)
    return actual_hash.lower() == expected_hash.lower()

# Usage for verifying a downloaded file
downloaded_file = "package-1.2.3.tar.gz"
expected = "d41d8cd98f00b204e9800998ecf8427e"  # placeholder (MD5 of empty input); use the publisher's checksum

if verify_download(downloaded_file, expected):
    print("File integrity verified")
else:
    print("WARNING: File may be corrupted")

Acceptable MD5 uses: cache key generation, deduplication checks, non-security checksums, legacy system compatibility. Unacceptable: password hashing, digital signatures, any security context.
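
Cache key generation is a typical non-adversarial use. The following is a minimal sketch, assuming Python 3.9+ for the usedforsecurity flag; the cache_key helper and the "search" namespace are hypothetical names chosen for illustration.

```python
import hashlib
import json

def cache_key(namespace: str, params: dict) -> str:
    """Build a short, stable cache key from request parameters."""
    # sort_keys makes the key independent of dict insertion order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # usedforsecurity=False (Python 3.9+) marks this as a non-security use
    digest = hashlib.md5(canonical.encode("utf-8"), usedforsecurity=False).hexdigest()
    return f"{namespace}:{digest}"

key_a = cache_key("search", {"q": "hashing", "page": 2})
key_b = cache_key("search", {"page": 2, "q": "hashing"})
print(key_a)
print(key_a == key_b)  # same parameters in a different order give the same key
```

If cache keys can be influenced by untrusted users in a way that matters, swapping hashlib.md5 for hashlib.sha256 is a one-line change.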

SHA-256: The Modern Standard

SHA-256 belongs to the SHA-2 family, designed by the NSA and published in 2001. It produces a 256-bit (32-byte) hash. Despite its NSA origins, the algorithm has been extensively analyzed by the cryptographic community and remains secure.

SHA-256 offers strong collision resistance, and no practical attacks on it are known. Finding a collision by brute force takes on the order of 2^128 operations (the birthday bound for a 256-bit digest), which is computationally infeasible with current technology. It’s the algorithm behind Bitcoin’s proof-of-work and is mandated by many compliance standards.

Here’s how to compute a SHA-256 hash in common languages:

# Python
import hashlib

def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

print(sha256_hash("Hello, World!"))
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

// Node.js
const crypto = require('crypto');

function sha256Hash(data) {
    return crypto.createHash('sha256').update(data, 'utf8').digest('hex');
}

console.log(sha256Hash("Hello, World!"));
// dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

// Go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

func sha256Hash(data string) string {
    hash := sha256.Sum256([]byte(data))
    return hex.EncodeToString(hash[:])
}

func main() {
    fmt.Println(sha256Hash("Hello, World!"))
    // dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
}

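SHA-256 is typically somewhat slower than MD5 in pure software (often quoted as roughly 20-30%, though the gap depends heavily on the CPU and library), but on modern hardware this rarely matters. If performance is critical and you’re processing gigabytes of data, consider SHA-256 with hardware acceleration (many recent x86 CPUs provide SHA extensions, often called SHA-NI, and ARMv8 has comparable instructions) or evaluate BLAKE3.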
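
Because the speed gap varies by machine, it is worth measuring rather than assuming. This is a rough sketch, not a rigorous benchmark; the throughput_mib_per_s helper and the 8 MiB payload size are illustrative assumptions.

```python
import hashlib
import time

def throughput_mib_per_s(algorithm: str, data: bytes, rounds: int = 5) -> float:
    """Return the best observed hashing throughput in MiB/s for one algorithm."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / (1024 * 1024) / best

payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes
results = {name: throughput_mib_per_s(name, payload) for name in ("md5", "sha256")}
for name, rate in results.items():
    print(f"{name}: {rate:.0f} MiB/s")
```

On CPUs with SHA extensions, SHA-256 may match or beat MD5, so no fixed output is shown here.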

Common Use Cases

Password Storage

Never store passwords in plaintext. Never use simple hashing either. Attackers with rainbow tables (precomputed hash-to-password lookups) can reverse unsalted hashes of common passwords almost instantly.

Proper password storage requires salting—adding random data to each password before hashing. This defeats rainbow tables because each password has a unique salt.

import hashlib
import secrets

def hash_password(password: str) -> tuple[str, str]:
    """Hash password with random salt. Returns (hash, salt)."""
    salt = secrets.token_hex(32)  # 256-bit salt
    salted = salt + password
    password_hash = hashlib.sha256(salted.encode()).hexdigest()
    return password_hash, salt

def verify_password(password: str, stored_hash: str, salt: str) -> bool:
    """Verify password against stored hash and salt."""
    salted = salt + password
    computed_hash = hashlib.sha256(salted.encode()).hexdigest()
    return secrets.compare_digest(computed_hash, stored_hash)

# Registration
password_hash, salt = hash_password("user_password_123")
# Store both hash and salt in database

# Login verification
is_valid = verify_password("user_password_123", password_hash, salt)

Important caveat: The example above demonstrates the concept, but for production password storage, use bcrypt, scrypt, or Argon2. These algorithms are intentionally slow and include built-in salting, making brute-force attacks impractical.
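
As one way to follow that advice without third-party packages, the sketch below uses scrypt from Python's standard library (hashlib.scrypt, which requires a Python build linked against OpenSSL 1.1 or newer). The function names and the cost parameters are illustrative assumptions; tune the parameters for your own hardware and latency budget.

```python
import hashlib
import hmac
import secrets

# Illustrative cost parameters (assumptions): about 16 MiB of memory per hash
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}

def scrypt_hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a key from the password with a fresh random salt. Returns (key, salt)."""
    salt = secrets.token_bytes(16)
    derived = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return derived, salt

def scrypt_verify_password(password: str, stored: bytes, salt: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    derived = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(derived, stored)

stored_key, stored_salt = scrypt_hash_password("user_password_123")
print(scrypt_verify_password("user_password_123", stored_key, stored_salt))
print(scrypt_verify_password("wrong_password", stored_key, stored_salt))
```

Unlike the SHA-256 version above, each guess now costs an attacker noticeable time and memory, which is the point of a purpose-built password hash.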

Data Integrity Verification

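Hashes verify that data hasn’t been modified during transmission or storage. A bare hash only detects accidental corruption, because anyone who alters the data can recompute the hash. To detect deliberate tampering, key the hash with a secret using HMAC:

import hashlib
import hmac
import json

def create_signed_payload(data: dict, secret_key: str) -> dict:
    """Create payload with an HMAC-SHA256 integrity tag."""
    # sort_keys gives a stable serialization so both sides hash the same bytes
    payload_json = json.dumps(data, sort_keys=True)
    signature = hmac.new(
        secret_key.encode(),
        payload_json.encode(),
        hashlib.sha256
    ).hexdigest()

    return {
        "data": data,
        "signature": signature
    }

def verify_payload(payload: dict, secret_key: str) -> bool:
    """Verify payload integrity."""
    data_json = json.dumps(payload["data"], sort_keys=True)
    expected_sig = hmac.new(
        secret_key.encode(),
        data_json.encode(),
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected_sig, payload["signature"])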

Digital Signatures and Certificates

TLS certificates, code signing, and document signatures all rely on hashing. The document is hashed, then the hash is signed with a private key. Recipients hash the document themselves and use the matching public key to check the signature against that hash, which confirms both authenticity and integrity.

Blockchain

Bitcoin and several other cryptocurrencies use SHA-256 extensively. Each block contains the hash of the previous block, creating a tamper-evident chain: changing any block changes its hash and breaks every later link. Proof-of-work mining involves finding inputs that produce hashes meeting specific criteria, such as falling below a target value.
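
Both ideas can be sketched in a few lines. This is a toy model, not Bitcoin's actual block format: the "prev|data|nonce" header layout, the function names, and the leading-zeros difficulty rule are simplifying assumptions. It does use double SHA-256, as Bitcoin does for block headers.

```python
import hashlib

def block_hash(prev_hash: str, data: str, nonce: int) -> str:
    """Hash a toy block header with double SHA-256."""
    header = f"{prev_hash}|{data}|{nonce}".encode("utf-8")
    return hashlib.sha256(hashlib.sha256(header).digest()).hexdigest()

def mine(prev_hash: str, data: str, difficulty: int = 3) -> tuple[int, str]:
    """Try nonces until the hash starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(prev_hash, data, nonce)
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

genesis_nonce, genesis_hash = mine("0" * 64, "genesis")
next_nonce, next_hash = mine(genesis_hash, "alice pays bob 5")
print(genesis_hash)
print(next_hash)

# Editing the first block changes its hash, so the link stored in the
# second block no longer matches and the tampering is detectable.
tampered_hash = block_hash("0" * 64, "genesis (edited)", genesis_nonce)
print(tampered_hash == genesis_hash)  # False
```

Each extra zero digit of difficulty multiplies the expected work by 16, which is how a network can tune how hard mining is.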

Choosing the Right Algorithm

Use this decision framework:

Use MD5 when:

  • Generating cache keys or ETags
  • Checking file integrity in trusted environments
  • Deduplicating data where security isn’t a concern
  • Maintaining compatibility with legacy systems

Use SHA-256 when:

  • Any security-sensitive application
  • Digital signatures or certificates
  • Data integrity in adversarial environments
  • Compliance requirements mandate it
  • You’re unsure which to choose

Use specialized algorithms when:

  • Storing passwords → bcrypt, scrypt, or Argon2
  • Need maximum speed → BLAKE3
  • Future-proofing → SHA-3
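
In Python, switching between these algorithms is mostly a matter of changing a name. The sketch below uses hashlib.new with algorithms from the standard library; BLAKE3 is not in the standard library, so BLAKE2b stands in for it here as an assumption, and hex_digest is a hypothetical helper name.

```python
import hashlib

def hex_digest(algorithm: str, data: bytes) -> str:
    """Hash data with any algorithm name that hashlib supports."""
    return hashlib.new(algorithm, data).hexdigest()

data = b"Hello, World!"
for name in ("sha256", "sha3_256", "blake2b"):
    digest = hex_digest(name, data)
    # Each hex character encodes 4 bits of the digest
    print(f"{name:9} {len(digest) * 4:3}-bit  {digest[:16]}...")
```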

Implementation Best Practices

Always use established cryptographic libraries. Never implement hash algorithms yourself—subtle bugs create exploitable vulnerabilities.

For password hashing, add a “pepper”—a secret value stored separately from the database (often in application configuration). Even if attackers dump your database, they can’t crack passwords without the pepper.

import hashlib
import hmac
import os
import secrets

# Application secret - store in environment variable, not code
PEPPER = os.environ.get("PASSWORD_PEPPER", "")

def secure_hash_password(password: str) -> tuple[str, str]:
    """Illustrative salt-and-pepper hashing; prefer bcrypt or Argon2 in production."""
    salt = secrets.token_hex(32)
    
    # Use HMAC for combining pepper - more secure than concatenation
    peppered = hmac.new(
        PEPPER.encode(),
        password.encode(),
        hashlib.sha256
    ).hexdigest()
    
    # Combine with salt
    final_hash = hashlib.sha256(
        (salt + peppered).encode()
    ).hexdigest()
    
    return final_hash, salt

def secure_verify_password(
    password: str, 
    stored_hash: str, 
    salt: str
) -> bool:
    """Constant-time password verification."""
    peppered = hmac.new(
        PEPPER.encode(),
        password.encode(),
        hashlib.sha256
    ).hexdigest()
    
    computed_hash = hashlib.sha256(
        (salt + peppered).encode()
    ).hexdigest()
    
    # Constant-time comparison prevents timing attacks
    return secrets.compare_digest(computed_hash, stored_hash)

The secrets.compare_digest() function performs constant-time comparison, preventing timing attacks where attackers measure response times to guess hash values character by character.

Quick Reference

Algorithm   Output Size            Speed                Security Status   Use Case
MD5         128-bit                Fast                 Broken            Checksums, caching
SHA-256     256-bit                Moderate             Secure            General cryptographic use
SHA-512     512-bit                Moderate             Secure            Higher security margins
SHA-3       224 to 512-bit         Slower               Secure            Future-proofing
BLAKE3      256-bit (extendable)   Very Fast            Secure            High-performance needs
bcrypt      184-bit                Intentionally Slow   Secure            Password storage

TL;DR Recommendations:

  • Default choice: SHA-256
  • Passwords: bcrypt or Argon2 (never raw SHA-256)
  • Non-security checksums: MD5 is fine
  • Maximum performance: BLAKE3
  • Compliance/government: Check specific requirements, often SHA-256 minimum

Hash algorithms are foundational to modern security. Choose appropriately based on your threat model, and always prefer established libraries over custom implementations.
