Deadlock Recovery

When a deadlock is detected (e.g., by a detection algorithm), the system must recover so that normal operation can resume.

Two main strategies are covered here: terminating processes, and rolling a process back to a checkpoint.

Process Termination

Choosing which deadlocked process to terminate is called victim selection. We aim to minimize the total recovery cost, weighing the following factors:

Factor	Description
Priority	Is the process critical? (e.g., system vs user)
Work Done	How much has it already computed? Killing it wastes this work
Time to Finish	Is it almost done? Better to let it finish than restart
Resources Held	More resources held → more likely to help resolve deadlock
Resources Needed	If a process still needs many more resources, letting it continue is riskier: it is more likely to block again
Interactive or Batch	Killing interactive users harms UX more
# of processes affected	Is this process part of multiple deadlocks?

Goal: Choose the process whose termination frees the needed resources at the lowest cost.
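The factors in the table can be folded into a single cost score. The following is a minimal sketch, not a definitive formula: the `ProcInfo` fields, the weights, and the function names are all hypothetical and chosen only for illustration. A real system would tune the weights and measure the inputs differently.

```python
from dataclasses import dataclass


@dataclass
class ProcInfo:
    name: str
    priority: int          # higher means more critical (assumed scale)
    work_done: float       # seconds of computation already completed
    time_to_finish: float  # estimated seconds remaining
    resources_held: int    # number of resource units currently held
    interactive: bool      # True for interactive, False for batch


def termination_cost(p: ProcInfo) -> float:
    """Estimate how expensive it is to kill p. All weights are illustrative."""
    cost = 10.0 * p.priority                # critical processes are costly to kill
    cost += 1.0 * p.work_done               # killing wastes completed work
    cost += 5.0 / (1.0 + p.time_to_finish)  # nearly finished: better to let it run
    cost -= 2.0 * p.resources_held          # killing frees more resources
    if p.interactive:
        cost += 20.0                        # killing interactive work harms UX more
    return cost


def select_victim(procs):
    """Return the process with the lowest termination cost."""
    return min(procs, key=termination_cost)


deadlocked = [
    ProcInfo("P1", priority=3, work_done=40, time_to_finish=5,
             resources_held=1, interactive=True),
    ProcInfo("P2", priority=1, work_done=10, time_to_finish=120,
             resources_held=4, interactive=False),
    ProcInfo("P3", priority=2, work_done=25, time_to_finish=30,
             resources_held=2, interactive=False),
]
victim = select_victim(deadlocked)
print(victim.name)  # P2
```

With these made-up numbers, P2 is the cheapest to kill: it has low priority, little completed work, a long remaining time, and holds the most resources.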

Rollback

Instead of terminating the process, we roll it back to an earlier safe state (i.e., a checkpoint) and restart it from there.

This requires checkpoints to be taken periodically (e.g., using snapshots or transaction logs).
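As a rough illustration of checkpoint-based rollback, the sketch below models a process as an object whose state is a small dictionary and whose checkpoints are in-memory deep copies. The class `CheckpointedTask` and its methods are hypothetical. A real system would persist snapshots or replay a transaction log instead of copying a dictionary.

```python
import copy


class CheckpointedTask:
    """Toy process: state is a dict, checkpoints are deep copies of it."""

    def __init__(self):
        self.state = {"step": 0, "total": 0}
        self._checkpoints = []
        self.rollbacks = 0  # how many times this task has been rolled back

    def do_step(self, value):
        """Perform one unit of work."""
        self.state["step"] += 1
        self.state["total"] += value

    def checkpoint(self):
        """Save a snapshot of the current state."""
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        """Restore the most recent snapshot, discarding work done since."""
        if not self._checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        self.state = copy.deepcopy(self._checkpoints[-1])
        self.rollbacks += 1


task = CheckpointedTask()
task.do_step(5)
task.checkpoint()   # safe state: step 1, total 5
task.do_step(7)     # this work is lost on rollback
task.rollback()
print(task.state)   # {'step': 1, 'total': 5}
```

Only the work done after the last checkpoint is lost, which is the advantage over killing and restarting from scratch. The price is the time and storage spent taking snapshots.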

If the same process is always selected as the victim, it may never finish. This is starvation.

To prevent this, include the number of rollbacks in the cost metric. Processes that have already been rolled back many times should be less likely to be chosen again.
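One simple way to do this is to add a penalty per previous rollback to whatever base cost the system already computes. The sketch below assumes a linear penalty. The function names and the penalty value of 25 are hypothetical, not prescribed by the text.

```python
def adjusted_cost(base_cost, rollbacks, penalty=25.0):
    """Base cost plus a fixed penalty for each earlier rollback (assumed linear)."""
    return base_cost + penalty * rollbacks


def pick_victim(candidates):
    """candidates maps process name -> (base_cost, rollbacks so far)."""
    return min(candidates, key=lambda name: adjusted_cost(*candidates[name]))


# P2 has the lowest base cost but has already been rolled back three times.
candidates = {"P1": (40.0, 0), "P2": (12.0, 3), "P3": (41.0, 0)}
print(pick_victim(candidates))  # P1
```

Here P2's adjusted cost is 12 + 3 × 25 = 87, so P1 (cost 40) is chosen instead, even though P2 would win on base cost alone.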

A deadlock involving P1, P2, and P3 is detected
Choose a victim, say P2, which has:
- low priority
- a long remaining time
- many resources held
Kill P2
Release its resources
Re-run detection
If still deadlocked → choose another victim
Repeat until no cycles exist
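The loop above can be sketched over a wait-for graph, where an edge P → Q means that P is waiting for a resource held by Q. This is a simplified illustration: the graph representation, the function names, and the cost table are assumptions. Killing a victim is modelled by deleting its node, which assumes that releasing its resources removes every wait on it.

```python
def find_cycle(wait_for):
    """Return a list of processes forming a cycle, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS path, finished
    color = {p: WHITE for p in wait_for}
    path = []

    def dfs(p):
        color[p] = GRAY
        path.append(p)
        for q in wait_for.get(p, ()):
            state = color.get(q, BLACK)  # unknown processes are not waiting
            if state == GRAY:            # back edge: q is on the current path
                return path[path.index(q):]
            if state == WHITE:
                found = dfs(q)
                if found:
                    return found
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color[p] == WHITE:
            cycle = dfs(p)
            if cycle:
                return cycle
    return None


def recover(wait_for, cost):
    """Kill the cheapest process in each detected cycle until none remain."""
    graph = {p: set(qs) for p, qs in wait_for.items()}  # work on a copy
    killed = []
    while (cycle := find_cycle(graph)) is not None:
        victim = min(cycle, key=lambda p: cost[p])
        killed.append(victim)
        del graph[victim]            # kill the victim
        for waiters in graph.values():
            waiters.discard(victim)  # its resources are released
    return killed


wait_for = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
cost = {"P1": 88, "P2": 12, "P3": 41}
print(recover(wait_for, cost))  # ['P2']
```

Re-running `find_cycle` after each kill mirrors the "re-run detection" step: a single termination may or may not break every cycle, so the loop continues until the graph is acyclic.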

Recovery Strategy	Description	Trade-off
Abort all	Kill all deadlocked processes	Simple but expensive
Abort one by one	Iteratively kill until deadlock breaks	Less loss, requires victim selection
Rollback	Restart process from checkpoint	Requires checkpointing overhead
Starvation handling	Factor rollback count into victim cost	Fairer long-term system behavior

Last updated on May 24, 2025