Skip to main content

6 posts tagged with "Databases"

Database systems

View All Tags

1 Million Rows, 10 Columns, One Table, No Indexes

· 6 min read
Abhishek Tripathi
Curiosity brings awareness.

Original tweet that inspired this post.

GreptimeDB solves the high-cardinality problem by treating millions of time-series as rows in a flat columnar table, using Arrow for fast writes and Parquet for efficient storage.

WAL on S3 — Why Object Storage Simplifies Database Design

· 3 min read
Abhishek Tripathi
Curiosity brings awareness.

WAL on S3: Simplicity over Complexity

WAL on S3: What You Get for Free

When you build a database on local disk, the WAL (Write-Ahead Log) carries a heavy burden. It must guarantee that committed transactions survive crashes — every write hits the WAL before it touches data files. If the machine loses power mid-write, the WAL is your only path back to a consistent state.

Two hard problems come with building a WAL on local disk:

DB-010: Dual Anomalies

· One min read
Abhishek Tripathi
Curiosity brings awareness.

Combined Anomaly: Dirty Read Leads to Lost Update

A more complex scenario with 3 transactions on a non-MVCC system. It shows how T2 performing a 'Dirty Read' on T1's uncommitted data directly enables the 'Lost Update' anomaly when T1 later aborts.

Step: 0/10 | Time: 0
Next TxID: 1

Database State (No Versions)

Current Operation

None

Transaction States & Undo Logs

No active transactions.

Transaction Timelines

0
10
20
30
40
50
60
70
80
90
100

T1

B
W
A

T2

B
R
W
C

T3

B
R
C

DB-008: Phantom Read

· One min read
Abhishek Tripathi
Curiosity brings awareness.

Transaction Timeline: Phantom Read Anomaly

T1 scans for 'Engineering' employees. T2 inserts and commits a new 'Engineering' employee. T1's subsequent identical scan finds this new 'phantom' row.

Step: 0 / 7
Time: 0

Committed Database State

employees

idnamedepartmentsalary
1AliceEngineering90000
2BobSales75000
4DavidEngineering85000

Uncommitted Inserts

None

Scan Log

No scans yet

Current Operation

None

Transaction Timelines

0
10
20
30
40
50
60

T1

B
S
S
C

T2

B
I
C

Legend

B
Begin
S
Scan (Range Query)
I
Insert
C
Commit
A
Abort

Understanding Phantom Reads

A Phantom Read occurs when a transaction executes a range query (e.g., "find all employees in Engineering") twice, and the set of rows returned by the second query is different from the first. This happens because another transaction has inserted or deleted rows that match the query's criteria and committed those changes in between the two queries of the first transaction. The newly appeared (or disappeared) rows are called "phantoms."

Key Observation Points:

  • Step 2: T1's first scan of 'Engineering' employees finds Alice and David.
  • Step 4 & 5: T2 inserts 'Charlie' into 'Engineering' and commits this change.
  • Step 6: T1's second scan of 'Engineering' employees now finds Alice, David, AND 'Charlie'. 'Charlie' is the phantom row.
  • This anomaly can lead to inconsistencies if T1 was performing calculations or making decisions based on the assumption that the set of rows matching its criteria would not change during its execution.
  • Phantom reads are typically prevented by higher isolation levels like Serializable, or by specific locking strategies (e.g., predicate locks, though not widely implemented, or careful range locking).

DB-007: MVCC Visibility Rules

· One min read
Abhishek Tripathi
Curiosity brings awareness.

MVCC Visibility Rules Demonstration

Illustrates how transactions see data based on snapshot isolation. Focus on T3's reads and how its snapshot (taken at t=40) affects what it sees from T1 and T2, even after they commit.

Step: 0/14 | Time: 0
Next TxID: 1 | Committed: [0]

Data Item Versions (Global State)

Current Operation

None

Transaction States & Snapshots

No active transactions.

Transaction Timelines

0
10
20
30
40
50
60
70

T1

B
R
W
R
C

T2

B
R
W
R
C

T3

B
R
R
C

Legend

B
Begin
R
Read
W
Write
C
Commit
A
Abort

DB-001: Database Isolation (dirty reads)

· 2 min read
Abhishek Tripathi
Curiosity brings awareness.

Understanding Dirty Reads: A Database Isolation Problem

What is a Dirty Read?

A dirty read occurs when a transaction reads data that has been written by another transaction but not yet committed. If the writing transaction rolls back, the data read by the first transaction becomes invalid—hence, "dirty."

This can lead to inconsistent or incorrect results in applications, especially in systems where multiple users or processes are accessing and modifying the data concurrently.