Livelock: Active But Non-Progressing Threads
Livelock is one of the more insidious concurrency bugs you'll encounter. While deadlock freezes your application in an obvious way, livelock keeps everything running—just not productively.
Key Insights
- Livelock occurs when threads are actively executing but making no forward progress—unlike deadlock where threads are blocked, livelocked threads consume CPU while accomplishing nothing
- The most common cause is overly “polite” resource sharing where threads repeatedly yield to each other, often triggered by retry logic without proper backoff strategies
- Breaking symmetry through randomized delays, priority-based ordering, or retry limits is the primary technique for preventing and resolving livelock conditions
What Is Livelock?
Picture two people meeting in a narrow hallway. Person A steps left to let Person B pass. Person B, being equally polite, steps right (their left). They’re now still blocking each other. Both step the other direction. And again. And again. Neither is stuck—they’re both actively moving—but neither makes progress toward their destination.
This is livelock in a nutshell: continuous activity without forward progress.
In software, livelocked threads keep executing instructions, responding to events, and consuming CPU cycles. Your monitoring shows healthy thread counts. Your health checks pass. But your actual work queue grows indefinitely because nothing is completing.
Livelock vs. Deadlock: Understanding the Difference
The distinction matters for both detection and resolution.
Deadlock occurs when threads are blocked, each waiting for a resource held by another. The threads consume no CPU—they’re suspended by the scheduler. Detection is relatively straightforward: look for threads in BLOCKED or WAITING states that aren’t progressing.
Livelock occurs when threads are runnable and actively executing, but their actions cancel each other out. CPU usage spikes while throughput drops to zero. Detection is harder because the threads look healthy by conventional metrics.
Here’s a side-by-side comparison:
```java
// DEADLOCK: Both threads block forever
public class DeadlockExample {
    private final Object lockA = new Object();
    private final Object lockB = new Object();

    public void thread1() {
        synchronized (lockA) {
            sleep(100); // Ensure thread2 grabs lockB
            synchronized (lockB) {
                System.out.println("Thread 1: acquired both locks");
            }
        }
    }

    public void thread2() {
        synchronized (lockB) {
            sleep(100); // Ensure thread1 grabs lockA
            synchronized (lockA) {
                System.out.println("Thread 2: acquired both locks");
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```
```java
import java.util.concurrent.atomic.AtomicBoolean;

// LIVELOCK: Both threads run forever, accomplishing nothing
public class LivelockExample {
    private final AtomicBoolean resourceA = new AtomicBoolean(false);
    private final AtomicBoolean resourceB = new AtomicBoolean(false);

    public void thread1() {
        while (true) {
            if (resourceA.compareAndSet(false, true)) {
                if (resourceB.compareAndSet(false, true)) {
                    System.out.println("Thread 1: acquired both resources");
                    resourceB.set(false);
                    resourceA.set(false);
                    return;
                }
                // "Politely" release A since we couldn't get B
                resourceA.set(false);
            }
            // Immediately retry - no backoff
        }
    }

    public void thread2() {
        while (true) {
            if (resourceB.compareAndSet(false, true)) {
                if (resourceA.compareAndSet(false, true)) {
                    System.out.println("Thread 2: acquired both resources");
                    resourceA.set(false);
                    resourceB.set(false);
                    return;
                }
                // "Politely" release B since we couldn't get A
                resourceB.set(false);
            }
            // Immediately retry - no backoff
        }
    }
}
```
In the deadlock example, a thread dump shows both threads blocked. In the livelock example, both threads show as RUNNABLE, burning CPU in their retry loops.
Common Causes and Patterns
Livelock typically emerges from well-intentioned code. The most frequent causes:
Retry logic without backoff. When an operation fails, immediately retrying at full speed can create contention storms. If multiple threads hit the same conflict and all retry instantly, they’ll likely conflict again.
Overly polite resource sharing. Code that releases resources when it detects contention—without any delay or priority mechanism—often creates symmetric behavior where all parties keep deferring to each other.
Competing conflict resolution strategies. When two systems or threads use the same “back off and retry” logic, they can synchronize their retry patterns.
Here’s a realistic example—a money transfer operation that creates livelock:
```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BankAccount {
    private final Lock lock = new ReentrantLock();
    private final int id;
    private int balance;

    public BankAccount(int id, int initialBalance) {
        this.id = id;
        this.balance = initialBalance;
    }

    // PROBLEMATIC: Creates livelock under contention
    public static void transferMoney(BankAccount from, BankAccount to, int amount) {
        while (true) {
            if (from.lock.tryLock()) {
                try {
                    if (to.lock.tryLock()) {
                        try {
                            if (from.balance >= amount) {
                                from.balance -= amount;
                                to.balance += amount;
                            }
                            return; // Success
                        } finally {
                            to.lock.unlock();
                        }
                    }
                } finally {
                    from.lock.unlock();
                }
            }
            // Both locks not acquired - release and retry immediately
            // Thread.yield(); // Even yielding doesn't help much
        }
    }
}
```
When Thread 1 calls transferMoney(accountA, accountB, 100) while Thread 2 calls transferMoney(accountB, accountA, 50), they can enter a livelock: each acquires their first lock, fails to acquire the second, releases, and retries—perfectly synchronized in their failure.
Real-World Scenarios
Livelock appears in several common contexts:
Distributed systems with conflict resolution. Two nodes detect a data conflict and both decide to back off and let the other proceed. With symmetric retry logic, they can ping-pong indefinitely.
Network protocol collisions. Ethernet’s CSMA/CD (Carrier Sense Multiple Access with Collision Detection) originally suffered from this. Two stations detect a collision, both back off, both retry at the same moment, collide again. The solution was randomized exponential backoff.
Database transaction retries. Optimistic concurrency control detects conflicts at commit time. If two transactions repeatedly conflict and both immediately retry, they can livelock. Most databases implement backoff, but application-level retry logic often doesn’t.
Message queue consumers. Multiple consumers grab the same message type, fail validation, return it to the queue, and immediately grab it again—while the valid messages pile up behind.
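The consumer scenario suggests its own fix: cap redeliveries per message and park repeat offenders instead of requeueing them forever. Below is a minimal in-memory sketch (class and method names are ours; real brokers such as RabbitMQ or SQS expose redelivery counts and dead-letter queues natively):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: after MAX_ATTEMPTS failed validations, a message moves to a
// dead-letter queue so the valid messages behind it can flow, breaking
// the grab/fail/requeue ping-pong.
public class RetryCappedConsumer {
    private static final int MAX_ATTEMPTS = 3;
    private final Deque<String> queue = new ArrayDeque<>();
    private final Deque<String> deadLetter = new ArrayDeque<>();
    private final Map<String, Integer> attempts = new HashMap<>();

    public void enqueue(String msg) { queue.addLast(msg); }
    public Deque<String> deadLetters() { return deadLetter; }

    public void consumeOnce(Predicate<String> isValid) {
        String msg = queue.pollFirst();
        if (msg == null) return;
        if (isValid.test(msg)) {
            attempts.remove(msg); // processed successfully
            return;
        }
        int n = attempts.merge(msg, 1, Integer::sum);
        if (n >= MAX_ATTEMPTS) {
            deadLetter.addLast(msg); // park it; stop the livelock
        } else {
            queue.addLast(msg); // requeue for another try
        }
    }
}
```

The essential change is that failure is no longer symmetric with retry: every redelivery moves the message closer to a terminal state.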
Detection Strategies
Detecting livelock requires measuring progress, not just activity. Here’s a practical approach:
```java
import java.util.concurrent.atomic.AtomicLong;

public class ProgressTracker {
    private final AtomicLong operationsStarted = new AtomicLong(0);
    private final AtomicLong operationsCompleted = new AtomicLong(0);
    private final AtomicLong lastCompletedSnapshot = new AtomicLong(0);
    private final AtomicLong stagnantIntervals = new AtomicLong(0);

    public void recordStart() {
        operationsStarted.incrementAndGet();
    }

    public void recordCompletion() {
        operationsCompleted.incrementAndGet();
    }

    // Call this periodically (e.g., every 5 seconds)
    public LivelockStatus checkProgress() {
        long started = operationsStarted.get();
        long completed = operationsCompleted.get();
        long previousCompleted = lastCompletedSnapshot.getAndSet(completed);
        long pending = started - completed;
        long recentCompletions = completed - previousCompleted;
        if (pending > 0 && recentCompletions == 0) {
            long stagnant = stagnantIntervals.incrementAndGet();
            if (stagnant >= 3) {
                return new LivelockStatus(true, pending, stagnant);
            }
        } else {
            stagnantIntervals.set(0);
        }
        return new LivelockStatus(false, pending, 0);
    }

    public record LivelockStatus(boolean suspected, long pendingOps, long stagnantIntervals) {}
}
```
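To drive a tracker like this on a schedule, wire it to a ScheduledExecutorService via scheduleAtFixedRate. The stripped-down watchdog below (names are ours) condenses the same stagnation logic around any completion counter, shown without the scheduler so the mechanics are easy to test:

```java
import java.util.function.LongSupplier;

// Sketch: sample a completion counter once per interval and flag a
// suspected livelock after three consecutive stagnant intervals while
// work is still pending.
public class ProgressWatchdog {
    private final LongSupplier completedOps; // e.g. a tracker's completed count
    private long lastSeen;
    private int stagnantIntervals;

    public ProgressWatchdog(LongSupplier completedOps) {
        this.completedOps = completedOps;
        this.lastSeen = completedOps.getAsLong();
    }

    /** Call once per interval (e.g. from a ScheduledExecutorService). */
    public boolean checkOnce(boolean workPending) {
        long now = completedOps.getAsLong();
        if (workPending && now == lastSeen) {
            stagnantIntervals++;
        } else {
            stagnantIntervals = 0; // any completion resets the alarm
        }
        lastSeen = now;
        return stagnantIntervals >= 3; // livelock suspected
    }
}
```

In production you would schedule `checkOnce` every few seconds and route a `true` result to your alerting pipeline.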
Key detection signals:
- High CPU usage with low throughput
- Growing queue depths with active consumers
- Thread dumps showing RUNNABLE threads in retry loops
- Operations that start but never complete
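Several of these signals can be gathered in-process. As a sketch, the JDK's ThreadMXBean can enumerate RUNNABLE threads (class name is ours); sample it periodically and cross-reference with throughput metrics, since a thread that stays RUNNABLE while completions stay flat is a livelock candidate:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: collect the IDs of all currently RUNNABLE threads. Comparing
// two samples taken seconds apart shows which threads are persistently
// busy; pair that with progress counters to spot non-progressing work.
public class RunnableThreadSampler {
    public static List<Long> runnableThreadIds() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Long> ids = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.RUNNABLE) {
                ids.add(info.getThreadId());
            }
        }
        return ids;
    }
}
```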
Prevention and Resolution Techniques
The fundamental fix for livelock is breaking symmetry. When threads behave identically, they can synchronize in their failure. Introduce asymmetry, and one will eventually “win.”
Here’s the money transfer example, fixed:
```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BankAccount {
    private final Lock lock = new ReentrantLock();
    private final int id;
    private int balance;

    private static final int MAX_RETRIES = 10;

    public static boolean transferMoney(BankAccount from, BankAccount to, int amount)
            throws InterruptedException {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            if (from.lock.tryLock(10, TimeUnit.MILLISECONDS)) {
                try {
                    if (to.lock.tryLock(10, TimeUnit.MILLISECONDS)) {
                        try {
                            if (from.balance >= amount) {
                                from.balance -= amount;
                                to.balance += amount;
                                return true;
                            }
                            return false; // Insufficient funds
                        } finally {
                            to.lock.unlock();
                        }
                    }
                } finally {
                    from.lock.unlock();
                }
            }
            // Randomized exponential backoff. ThreadLocalRandom.current() must be
            // called on the retrying thread, never cached in a static field.
            int baseDelay = 1 << attempt; // 1, 2, 4, 8, ... ms
            int jitter = ThreadLocalRandom.current().nextInt(baseDelay + 1);
            Thread.sleep(baseDelay + jitter);
        }
        throw new RuntimeException("Transfer failed after " + MAX_RETRIES + " attempts");
    }
}
```
The key improvements:
- Randomized backoff: Each thread waits a random duration, making synchronized retries unlikely
- Exponential increase: Backoff grows with each attempt, reducing contention over time
- Retry limits: Fail definitively rather than loop forever
- Timed lock acquisition: tryLock with a timeout prevents indefinite waiting
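Since every retry loop in a codebase should share the same backoff arithmetic, it is worth isolating in a helper. A sketch (the class and method names are ours):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: delay for a given attempt. The base grows as 2^attempt and a
// random jitter in [0, base] is added, so two competing threads are
// unlikely to wake at the same instant even after identical failures.
public final class Backoff {
    private Backoff() {}

    public static long delayMillis(int attempt) {
        long base = 1L << Math.min(attempt, 20); // cap the shift to avoid overflow
        long jitter = ThreadLocalRandom.current().nextLong(base + 1);
        return base + jitter;
    }
}
```

A retry loop would then call `Thread.sleep(Backoff.delayMillis(attempt))` between attempts.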
Another approach is consistent ordering. Always acquire locks in a defined order (e.g., by account ID):
```java
// FIXED ALTERNATIVE: always acquire locks in a globally consistent order
public static boolean transferMoneyOrdered(BankAccount from, BankAccount to, int amount) {
    BankAccount first = from.id < to.id ? from : to;
    BankAccount second = from.id < to.id ? to : from;
    first.lock.lock();
    try {
        second.lock.lock();
        try {
            if (from.balance >= amount) {
                from.balance -= amount;
                to.balance += amount;
                return true;
            }
            return false;
        } finally {
            second.lock.unlock();
        }
    } finally {
        first.lock.unlock();
    }
}
```
This eliminates the livelock entirely: with a global acquisition order there is no circular wait, so each thread can simply block until its turn instead of politely releasing and retrying.
Key Takeaways
When to suspect livelock:
- CPU is high but work isn’t completing
- Threads are RUNNABLE but queues are growing
- Retry metrics are elevated without corresponding successes
Prevention checklist:
- Always use backoff in retry loops (exponential + jitter)
- Set maximum retry limits
- Consider consistent resource ordering to eliminate conflicts
- Monitor progress metrics, not just activity metrics
Resolution approaches:
- Add randomized delays to break synchronization
- Implement priority-based tie-breaking
- Reduce concurrency temporarily to let one thread complete
- Review and fix symmetric “polite” resource handling
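Priority-based tie-breaking can be as simple as letting a fixed ordering decide who retries immediately. A sketch of one such policy (names and the policy itself are ours):

```java
// Sketch: on a detected conflict, the party with the smaller priority
// value retries at once while the other backs off with a growing delay.
// The asymmetry guarantees that one side always makes progress.
public final class PriorityTieBreaker {
    private PriorityTieBreaker() {}

    public static long backoffMillis(int myPriority, int otherPriority, int attempt) {
        if (myPriority < otherPriority) {
            return 0; // winner: retry immediately
        }
        return 1L << Math.min(attempt, 10); // loser: wait, doubling each attempt
    }
}
```

Any stable, agreed-upon ordering works (thread ID, node ID, lexicographic resource name); the point is that the two parties never follow identical retry schedules.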
Livelock is subtle because it masquerades as a healthy system. Your threads are running, your health checks pass, but nothing gets done. Build progress tracking into your systems, and you’ll catch it before your users do.