DB-008: Phantom Read

June 7, 2025 · One min read

Curiosity brings awareness.

Transaction Timeline: Phantom Read Anomaly

T1 scans for 'Engineering' employees. T2 inserts and commits a new 'Engineering' employee. T1's subsequent identical scan finds this new 'phantom' row.

Step: 0 / 7

Time: 0

Committed Database State

employees

id	name	department	salary
1	Alice	Engineering	90000
2	Bob	Sales	75000
4	David	Engineering	85000

Uncommitted Inserts

None

Scan Log

No scans yet

Current Operation

None

Transaction Timelines

T1

T2

Legend

Begin

Scan (Range Query)

Insert

Commit

Abort

Understanding Phantom Reads

A Phantom Read occurs when a transaction executes a range query (e.g., "find all employees in Engineering") twice, and the set of rows returned by the second query is different from the first. This happens because another transaction has inserted or deleted rows that match the query's criteria and committed those changes in between the two queries of the first transaction. The newly appeared (or disappeared) rows are called "phantoms."

Key Observation Points:

Step 2: T1's first scan of 'Engineering' employees finds Alice and David.
Step 4 & 5: T2 inserts 'Charlie' into 'Engineering' and commits this change.
Step 6: T1's second scan of 'Engineering' employees now finds Alice, David, AND 'Charlie'. 'Charlie' is the phantom row.
This anomaly can lead to inconsistencies if T1 was performing calculations or making decisions based on the assumption that the set of rows matching its criteria would not change during its execution.
Phantom reads are typically prevented by higher isolation levels like Serializable, or by specific locking strategies (e.g., predicate locks, though not widely implemented, or careful range locking).