Skip to main content

DB-008: Phantom Read

· One min read
Abhishek Tripathi
Curiosity brings awareness.

Transaction Timeline: Phantom Read Anomaly

T1 scans for 'Engineering' employees. T2 inserts and commits a new 'Engineering' employee. T1's subsequent identical scan finds this new 'phantom' row.

Step: 0 / 7
Time: 0

Committed Database State

employees

idnamedepartmentsalary
1AliceEngineering90000
2BobSales75000
4DavidEngineering85000

Uncommitted Inserts

None

Scan Log

No scans yet

Current Operation

None

Transaction Timelines

0
10
20
30
40
50
60

T1

B
S
S
C

T2

B
I
C

Legend

B
Begin
S
Scan (Range Query)
I
Insert
C
Commit
A
Abort

Understanding Phantom Reads

A Phantom Read occurs when a transaction executes a range query (e.g., "find all employees in Engineering") twice, and the set of rows returned by the second query is different from the first. This happens because another transaction has inserted or deleted rows that match the query's criteria and committed those changes in between the two queries of the first transaction. The newly appeared (or disappeared) rows are called "phantoms."

Key Observation Points:

  • Step 2: T1's first scan of 'Engineering' employees finds Alice and David.
  • Step 4 & 5: T2 inserts 'Charlie' into 'Engineering' and commits this change.
  • Step 6: T1's second scan of 'Engineering' employees now finds Alice, David, AND 'Charlie'. 'Charlie' is the phantom row.
  • This anomaly can lead to inconsistencies if T1 was performing calculations or making decisions based on the assumption that the set of rows matching its criteria would not change during its execution.
  • Phantom reads are typically prevented by higher isolation levels like Serializable, or by specific locking strategies (e.g., predicate locks, though not widely implemented, or careful range locking).