12 Recovery
Week 12: April 25 - May 8, 2022
- COWS Section 16.7 and Chapter 18
- Recovery on Gradescope
Why do we need recovery? To ensure database consistency, transaction atomicity, and durability even in the face of failures, we need recovery algorithms.
There are many reasons for failures:
- Transaction Failures: These occur due to logical errors for example when a transaction cannot complete due to an internal error such as an integrity constraint violation or for example when a DBMS terminates an active transaction due to deadlocks.
- System Failures: These can either be due to software (a poor DBMS implementation) or hardware such as a server crashing (e.g. an operators turns off the wrong server for maintenance).
- Storage Failures: Disk failure does occur albeit very rarely. Note that recovery doesn’t protect against these failures. However, replication and distributed database systems can protect against if we can restore the database from an archived version or active replica.
12.1 The Basics of a Recovery Algorithm
Every recovery algorithm has two parts:
- Actions during normal transaction processing to ensure that the DBMS can recover from a failure.
- Actions after a failure to to recover the database to a state that ensures atomicity, consistency, and durability.
The key primitives of most recovery algorithms are UNDO
and REDO
Not all algorithms use both of these:
-
UNDO
The process of removing the effects of an incomplete or aborted transaction. -
REDO
The process of re-instating the effects of a committed transaction for durability.
Our choice of primitives to use depends largely on the buffer management policies we use and we need logging protocols to provide these functions.
STEAL policy: STEAL means a transaction can write uncommitted changes to disk and NO-STEAL means it cannot.
FORCE policy: All updates made by a transaction are reflected on disk when the transaction commits.
The easiest system to implement is NO-STEAL + FORCE. Because uncommitted changes are never written to disk (NO-STEAL), the DBMS doesn’t need to undo changes of aborted/failed transactions. No redos are required because all changes are guaranteed to be on disk at commit (FORCE).
It does limit the scalability of the system as transaction cannot STEAL, i.e. write uncommitted changes to disk and they all have to be in memory. So the size of a transaction is limited by available memory!
12.2 Write-Ahead Logging
The DBMS records all the changes made to the database in a log file (on stable storage) before the change is made to a disk page. The log contains sufficient information to perform the necessary undo and redo actions to restore the database after a crash. This is an example of a STEAL + NO-FORCE system.
Almost every DBMS uses write-ahead logging (WAL) because it has the fastest runtime performance compared to other recovery algorithms.
For example, another recovery technique is shadow paging keeps two separate copies of the database: master and shadow. Updates are made on the shadow copy. On commit, the DBMS atomically switches the shadow to become the new master. This is an example of a NO-STEAL + FORCE system. It has a much faster recovery time than WAL but it has a much slower runtime performance.
12.3 ARIES Protocol
Algorithms for Recovery and Isolation Exploiting Semantics (ARIES) was developed at IBM research, by C.Mohan, in the early 90s. C. Mohan is a frequent NYUAD visitor, watch out for his next visit/talk. ARIES has three central ideas that we have touched on already:
- WAL (STEAL + NO-FORCE)
- Repeat history during redo: on restart, replay actions to restore database to exact state before crash
- Log changes during undo: record undo actions to log so that in the case of repeated failures, you do no repeat actions already taken.
During Normal Execution, ARIES requires the following on transaction commit:
- write
COMMIT
record to log buffer in memory. - flush all log records up to and including the transaction’s
COMMIT
record to disk. - return an ack to the application, once the
COMMIT
record is safely stored on disk - At some later point, write a special
END
record to log. This indicates that the transaction is completely finished in the system and there will not be anymore log records for it. TheseEND
records are used for internal bookkeeping and do not need to be flushed immediately.
We will now explore why ARIES arrives at its design choices (checkpointing, replaying the history as is from the log, and using Compensation Log Records (CLRs) to avoid repeated undos) by trying to work around the flaws of naive recovery algorithm that plays the log from beginning to end and works backwards to undo failed transactions post crash. In this process, you will appreciate why ARIES keep track of different information such as the transaction table and dirty page table in memory and other information in the log.
The best way to learn is by example. Here is a link to an ARIES simulator where you can edit the operations to understand what happens during the different phases of the ARIES protocol.
To summarize the ARIES protocol:
First, aborting a transaction is a special case of the ARIES undo operation applied to only one transaction.
Two new additions to the log are required to support Aborts:
Previous log sequence number (prevLSN): The DBMS uses these prevLSNs to maintain a linked-list for each transaction that makes it easier to walk through the log to find its records.
Compensation log record (CLR): A CLR describes the actions taken to undo the actions of a previous update record. It has all the fields of an update log record plus the undoNext pointer (i.e., the next-to-be-undone LSN).
To abort a transaction, the DBMS:
- first appends a
ABORT
record to the log buffer in memory - undoes the transaction’s updates in reverse order to remove their effects from the database.
- For each undone update, the DBMS creates a CLR entry in the log and restores the old value.
- After all of the aborted transaction’s updates are reversed, writes a
END
log record.
Second, the ARIES recovery protocol proceeds in three phases on starting up from a crash:
- Analysis: Read the WAL to identify dirty pages in the buffer pool and active transactions at the time of the crash.
- Redo: Repeat all actions starting from an appropriate point in the log.
- Undo: Reverse the actions of transactions that did not commit before the crash.
Finally, to prevent the log file from growing forever, which makes recovery painfully slow on crashes as the DBMS has to replay the entire log, checkpoints or periodically flushing all buffers, appending a CHECKPOINT
record to the log entry and flushing log records to disk trims the log such that replays can occur from the checkpoint.