Fast Exponentiation: Modular Power Algorithm
Key Insights
- Binary exponentiation reduces time complexity from O(n) to O(log n) by exploiting the binary representation of the exponent, making previously impossible calculations trivial.
- Modular arithmetic isn’t just about getting remainders—it’s the critical technique that prevents integer overflow when computing powers for cryptographic applications.
- The iterative implementation should be your default choice in production code; it’s stack-safe, slightly faster, and handles arbitrarily large exponents without recursion depth issues.
Why Naive Exponentiation Fails
Computing 3^13 by multiplying 3 thirteen times works fine. Computing 2^1000000007 the same way? Your program will run until the heat death of the universe.
The naive approach has two fatal flaws. First, time complexity is O(n) where n is the exponent. When n is a billion, you’re doing a billion multiplications. Second, the result overflows almost immediately. Even with 64-bit integers, 2^64 already exceeds the maximum value. Without modular arithmetic, you’d need arbitrary-precision libraries that slow everything down further.
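For reference, the naive approach is a minimal sketch like this (function name is illustrative):

```python
def naive_power(base: int, exp: int) -> int:
    """O(exp) multiplications -- impractical for large exponents."""
    result = 1
    for _ in range(exp):
        result *= base
    return result

# Fine for small inputs:
print(naive_power(3, 13))  # 1594323
# But naive_power(2, 10**9) would do a billion multiplications
# on ever-growing integers, effectively never finishing.
```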
This matters in real systems. RSA encryption operates on exponents with hundreds of digits. Competitive programming problems routinely ask for answers “modulo 10^9 + 7” specifically because the actual answers would have millions of digits. Diffie-Hellman key exchange, digital signatures, and primality testing all depend on computing modular powers efficiently.
You need an algorithm that handles exponents in the billions while keeping intermediate results bounded. That’s exactly what modular exponentiation provides.
The Core Insight: Binary Exponentiation
The key observation is deceptively simple: squaring a number doubles its exponent.
If you know x^8, you can get x^16 with a single multiplication (x^8 × x^8). This means you don’t need 16 multiplications—you need 4 squarings: x → x^2 → x^4 → x^8 → x^16.
More formally:
- If n is even: x^n = (x^(n/2))^2
- If n is odd: x^n = x × x^(n-1)
This recursive structure reduces the problem size by half at each step, giving us O(log n) time complexity. For n = 1,000,000,000, that’s roughly 30 operations instead of a billion.
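The operation count can be read directly off the exponent's bit length:

```python
n = 1_000_000_000
# Each bit of the exponent costs at most two multiplications:
# one squaring, plus one extra multiply when the bit is set.
print(n.bit_length())  # 30 -- so at most ~60 multiplications, not a billion
```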
Let’s trace through 3^13 to see how this works. The binary representation of 13 is 1101, which means:
13 = 8 + 4 + 1 = 2^3 + 2^2 + 2^0
Therefore: 3^13 = 3^8 × 3^4 × 3^1
We compute powers of 3 by successive squaring:
- 3^1 = 3
- 3^2 = 9
- 3^4 = 81
- 3^8 = 6561
Then multiply the ones corresponding to set bits: 6561 × 81 × 3 = 1,594,323.
Here’s the basic recursive implementation without modular arithmetic:
```python
def power(base: int, exp: int) -> int:
    if exp == 0:
        return 1
    if exp % 2 == 1:
        return base * power(base, exp - 1)
    half = power(base, exp // 2)
    return half * half
```

```cpp
long long power(long long base, long long exp) {
    if (exp == 0) return 1;
    if (exp % 2 == 1) return base * power(base, exp - 1);
    long long half = power(base, exp / 2);
    return half * half;
}
```
Note the critical detail: we compute half once and multiply it by itself. Writing power(base, exp // 2) * power(base, exp // 2) would make two recursive calls, destroying our logarithmic complexity.
Adding Modular Arithmetic
For cryptographic applications and competitive programming, we need results modulo some number m. Without this, computing 2^1000000007 would require storing a number with over 300 million digits.
The mathematical property that saves us:
(a × b) mod m = ((a mod m) × (b mod m)) mod m
This means we can take the modulus after every multiplication: each operand stays below m, so each product stays below m^2. For a 64-bit modulus, those intermediate products fit comfortably in 128 bits (or we can use techniques to avoid even that).
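A quick sanity check of that identity, using values larger than the modulus:

```python
# Reducing before or after multiplying gives the same residue
a, b = 10**12 + 5, 10**11 + 3
m = 10**9 + 7
assert (a * b) % m == ((a % m) * (b % m)) % m
```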
```python
def mod_pow(base: int, exp: int, mod: int) -> int:
    if mod == 1:
        return 0
    result = 1
    base = base % mod
    while exp > 0:
        if exp % 2 == 1:
            result = (result * base) % mod
        exp = exp // 2
        base = (base * base) % mod
    return result
```

```cpp
long long mod_pow(long long base, long long exp, long long mod) {
    if (mod == 1) return 0;
    long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * base) % mod;
        }
        exp >>= 1;
        base = (base * base) % mod;
    }
    return result;
}
```
The algorithm processes the exponent bit by bit from least significant to most significant. When a bit is set, we multiply the result by the current power of the base. After each iteration, we square the base (moving to the next power of 2) and shift the exponent right.
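To make this concrete, here's a sketch that traces each iteration for 3^13 mod 1000 (13 is 1101 in binary, so the bits processed from the right are 1, 0, 1, 1):

```python
def traced_mod_pow(base: int, exp: int, mod: int) -> int:
    """Same algorithm as mod_pow, printing the state at each bit."""
    result = 1
    base %= mod
    step = 0
    while exp > 0:
        if exp & 1:
            result = (result * base) % mod
        print(f"step {step}: bit={exp & 1}, result={result}, base={base}")
        exp >>= 1
        base = (base * base) % mod
        step += 1
    return result

print(traced_mod_pow(3, 13, 1000))  # 323, since 3^13 = 1594323
```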
Iterative vs. Recursive Implementation
Both approaches have the same time complexity, but they differ in practical characteristics.
```python
# Recursive implementation
def mod_pow_recursive(base: int, exp: int, mod: int) -> int:
    if mod == 1:
        return 0
    if exp == 0:
        return 1
    base = base % mod
    if exp % 2 == 1:
        return (base * mod_pow_recursive(base, exp - 1, mod)) % mod
    half = mod_pow_recursive(base, exp // 2, mod)
    return (half * half) % mod
```

```python
# Iterative implementation
def mod_pow_iterative(base: int, exp: int, mod: int) -> int:
    if mod == 1:
        return 0
    result = 1
    base = base % mod
    while exp > 0:
        if exp & 1:
            result = (result * base) % mod
        exp >>= 1
        base = (base * base) % mod
    return result
```
The recursive version mirrors the mathematical definition more closely. It’s easier to prove correct and often clearer to developers unfamiliar with bit manipulation. However, it consumes O(log n) stack space and incurs function call overhead.
The iterative version is my default recommendation. It uses O(1) extra space, avoids potential stack overflow (Python's default recursion limit is 1000; since each odd exponent step adds an extra frame, recursion depth can approach 2 log2 n, so exponents near 2^500 can already exhaust the stack), and runs slightly faster due to eliminated function call overhead.
Use the recursive version when teaching the algorithm or when you need to extend it in ways that benefit from recursion. Use the iterative version in production code.
Practical Applications
Modular Multiplicative Inverse
Fermat’s Little Theorem states that for prime p and integer a not divisible by p:
a^(p-1) ≡ 1 (mod p)
Therefore: a^(p-2) ≡ a^(-1) (mod p)
This gives us division in modular arithmetic:
```python
def mod_inverse(a: int, prime: int) -> int:
    """Compute modular multiplicative inverse using Fermat's little theorem."""
    return mod_pow(a, prime - 2, prime)

def mod_divide(a: int, b: int, prime: int) -> int:
    """Compute (a / b) mod prime."""
    return (a * mod_inverse(b, prime)) % prime

# Example: (10 / 2) mod 7 = 5
# Verification: (5 * 2) mod 7 = 10 mod 7 = 3, and 10 mod 7 = 3 ✓
print(mod_divide(10, 2, 7))  # Output: 5
```
RSA-Style Encryption Demo
RSA’s security relies on the difficulty of factoring large numbers, but its mechanics are straightforward modular exponentiation:
```python
def simple_rsa_demo():
    # Small primes for demonstration (real RSA uses 1024+ bit primes)
    p, q = 61, 53
    n = p * q                # 3233
    phi = (p - 1) * (q - 1)  # 3120
    e = 17                   # Public exponent (must be coprime with phi)
    # phi is composite, so the Fermat-based mod_inverse doesn't apply here;
    # Python 3.8+ computes the inverse via the extended Euclidean algorithm.
    d = pow(e, -1, phi)      # Private exponent: 2753

    message = 42
    # Encryption: ciphertext = message^e mod n
    ciphertext = mod_pow(message, e, n)
    print(f"Encrypted: {ciphertext}")  # 2557
    # Decryption: plaintext = ciphertext^d mod n
    plaintext = mod_pow(ciphertext, d, n)
    print(f"Decrypted: {plaintext}")   # 42

simple_rsa_demo()
```
Edge Cases and Pitfalls
Handle these cases explicitly or your implementation will fail in production:
Exponent = 0: Any non-zero base raised to power 0 equals 1. The code handles this naturally since the while loop never executes.
Modulus = 1: Everything is congruent to 0 modulo 1. Return 0 immediately.
Negative exponents: The basic algorithm doesn’t support these. For modular arithmetic, compute the modular inverse first: a^(-n) mod m = (a^(-1))^n mod m.
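A hypothetical wrapper for negative exponents, valid when mod is prime and base is not a multiple of it, might look like this (using Python's built-in pow for brevity):

```python
def mod_pow_signed(base: int, exp: int, mod: int) -> int:
    """Handle negative exponents via Fermat's little theorem (mod must be prime)."""
    if exp < 0:
        # a^(-n) mod p = (a^(-1))^n mod p, with a^(-1) = a^(p-2) mod p
        base = pow(base, mod - 2, mod)
        exp = -exp
    return pow(base, exp, mod)

print(mod_pow_signed(2, -3, 7))  # 1, since 2^3 = 8 ≡ 1 (mod 7)
```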
Overflow in intermediate multiplication: When mod is close to 2^63, the product base * base can overflow 64-bit integers. Solutions include:
```cpp
// C++: Use __int128 for intermediate products
long long safe_mod_mul(long long a, long long b, long long mod) {
    return ((__int128)a * b) % mod;
}
```

```cpp
// Alternative: Use compiler built-ins or binary multiplication.
// Caveat: the doubling below can itself overflow signed 64-bit
// arithmetic when mod >= 2^62, so this variant needs mod < 2^62.
long long mod_mul(long long a, long long b, long long mod) {
    long long result = 0;
    a %= mod;
    while (b > 0) {
        if (b & 1) result = (result + a) % mod;
        a = (a * 2) % mod;
        b >>= 1;
    }
    return result;
}
```
Python handles arbitrary precision automatically, so overflow isn’t a concern there.
Performance Benchmarks and Conclusion
Quick benchmarks on computing 2^10000000 mod (10^9 + 7):
| Implementation | Time |
|---|---|
| Naive (Python) | >10 minutes (estimated) |
| Binary exp (Python) | 0.00002 seconds |
| Binary exp (C++) | 0.000001 seconds |
The difference isn’t marginal—it’s the difference between “runs instantly” and “never finishes.”
Reach for modular exponentiation whenever you’re computing large powers, especially with prime moduli common in competitive programming (10^9 + 7, 998244353). Use the iterative implementation by default. Remember that Python’s built-in pow(base, exp, mod) already implements this algorithm efficiently—use it instead of rolling your own in production Python code.
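For example, with the built-in:

```python
MOD = 10**9 + 7

# Three-argument pow performs binary exponentiation in C
print(pow(2, 10**9, MOD))

# Since Python 3.8, pow also computes modular inverses directly
inv = pow(3, -1, MOD)  # 333333336
assert (3 * inv) % MOD == 1
```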
The algorithm is one of those fundamental tools that, once internalized, you’ll reach for constantly. It’s simple enough to implement from memory in any language, yet powerful enough to underpin modern cryptography.