Skip to main content
Abhishek Tripathi
Curiosity brings awareness.
View all authors

1 Million Rows, 10 Columns, One Table, No Indexes

· 6 min read
Abhishek Tripathi
Curiosity brings awareness.

Original tweet that inspired this post.

GreptimeDB solves the high-cardinality problem by treating millions of time-series as rows in a flat columnar table, using Arrow for fast writes and Parquet for efficient storage.

WAL on S3 — Why Object Storage Simplifies Database Design

· 3 min read
Abhishek Tripathi
Curiosity brings awareness.

WAL on S3: Simplicity over Complexity

WAL on S3: What You Get for Free

When you build a database on local disk, the WAL (Write-Ahead Log) carries a heavy burden. It must guarantee that committed transactions survive crashes — every write hits the WAL before it touches data files. If the machine loses power mid-write, the WAL is your only path back to a consistent state.

Two hard problems come with building a WAL on local disk:

The Day I Taught Claude Code to Work Without Me (And How You Can Too)

· 7 min read
Abhishek Tripathi
Curiosity brings awareness.
🧃 TL;DR
  • Problem: You're trapped micromanaging Claude Code instead of building
  • Solution: Train Claude Code once, then let it work autonomously
  • How it Works: Strategic context engineering: CLAUDE.md files, systematic workflows, and learning capture
  • Benefit: Claude Code becomes your autonomous teammate

Here is CLAUDE.md

DB-010: Dual Anomalies

· One min read
Abhishek Tripathi
Curiosity brings awareness.

Combined Anomaly: Dirty Read Leads to Lost Update

A more complex scenario with 3 transactions on a non-MVCC system. It shows how T2 performing a 'Dirty Read' on T1's uncommitted data directly enables the 'Lost Update' anomaly when T1 later aborts.

Step: 0/10 | Time: 0
Next TxID: 1

Database State (No Versions)

Current Operation

None

Transaction States & Undo Logs

No active transactions.

Transaction Timelines

0
10
20
30
40
50
60
70
80
90
100

T1

B
W
A

T2

B
R
W
C

T3

B
R
C

DB-008: Phantom Read

· One min read
Abhishek Tripathi
Curiosity brings awareness.

Transaction Timeline: Phantom Read Anomaly

T1 scans for 'Engineering' employees. T2 inserts and commits a new 'Engineering' employee. T1's subsequent identical scan finds this new 'phantom' row.

Step: 0 / 7
Time: 0

Committed Database State

employees

idnamedepartmentsalary
1AliceEngineering90000
2BobSales75000
4DavidEngineering85000

Uncommitted Inserts

None

Scan Log

No scans yet

Current Operation

None

Transaction Timelines

0
10
20
30
40
50
60

T1

B
S
S
C

T2

B
I
C

Legend

B
Begin
S
Scan (Range Query)
I
Insert
C
Commit
A
Abort

Understanding Phantom Reads

A Phantom Read occurs when a transaction executes a range query (e.g., "find all employees in Engineering") twice, and the set of rows returned by the second query is different from the first. This happens because another transaction has inserted or deleted rows that match the query's criteria and committed those changes in between the two queries of the first transaction. The newly appeared (or disappeared) rows are called "phantoms."

Key Observation Points:

  • Step 2: T1's first scan of 'Engineering' employees finds Alice and David.
  • Step 4 & 5: T2 inserts 'Charlie' into 'Engineering' and commits this change.
  • Step 6: T1's second scan of 'Engineering' employees now finds Alice, David, AND 'Charlie'. 'Charlie' is the phantom row.
  • This anomaly can lead to inconsistencies if T1 was performing calculations or making decisions based on the assumption that the set of rows matching its criteria would not change during its execution.
  • Phantom reads are typically prevented by higher isolation levels like Serializable, or by specific locking strategies (e.g., predicate locks, though not widely implemented, or careful range locking).

DB-007: MVCC Visibility Rules

· One min read
Abhishek Tripathi
Curiosity brings awareness.

MVCC Visibility Rules Demonstration

Illustrates how transactions see data based on snapshot isolation. Focus on T3's reads and how its snapshot (taken at t=40) affects what it sees from T1 and T2, even after they commit.

Step: 0/14 | Time: 0
Next TxID: 1 | Committed: [0]

Data Item Versions (Global State)

Current Operation

None

Transaction States & Snapshots

No active transactions.

Transaction Timelines

0
10
20
30
40
50
60
70

T1

B
R
W
R
C

T2

B
R
W
R
C

T3

B
R
R
C

Legend

B
Begin
R
Read
W
Write
C
Commit
A
Abort

DB-001: Database Isolation (dirty reads)

· 2 min read
Abhishek Tripathi
Curiosity brings awareness.

Understanding Dirty Reads: A Database Isolation Problem

What is a Dirty Read?

A dirty read occurs when a transaction reads data that has been written by another transaction but not yet committed. If the writing transaction rolls back, the data read by the first transaction becomes invalid—hence, "dirty."

This can lead to inconsistent or incorrect results in applications, especially in systems where multiple users or processes are accessing and modifying the data concurrently.

Understanding Eventloops (Tokio Internals)

· 3 min read
Abhishek Tripathi
Curiosity brings awareness.

Prelude

This is the first post in a four part series that will provide an understanding of the mechanics behind the Tokio runtime in Rust. This post focuses on the challenges in a multi-threaded event loop that force us to think of async runtimes like Tokio.

Index of the four part series:

  1. Visualizing Tokio Internals: Part I - Multi-Threaded Event Loop / Server
  2. Visualizing Tokio Internals: Part II - Reactor
  3. Visualizing Tokio Internals: Part III - Wakers
  4. Visualizing Tokio Internals: Part IV - Executors

Multi-Threaded Event Loop / Server

What challenges in a multi-threaded event loop force us to think of async runtimes like Tokio?

Phase 0: The Problem

Learning Objective

Why do we need async runtimes like Tokio?

  • Resource Efficiency: Traditional thread-per-connection models waste system resources
  • Scalability: Async enables handling thousands of connections with minimal overhead
  • Performance: Event-driven architecture reduces context switching and memory usage
  • Cost-Effective: Better resource utilization means lower infrastructure costs

Connection Pooling - in Depth

· 2 min read
Abhishek Tripathi
Curiosity brings awareness.

Here’s a Markdown table that maps real-life reverse proxy scenarios to recommended TCP tuning parameters for optimal performance and security:

Legend (Quick Reference)

ParameterPurpose
tcp_fin_timeoutHow long to keep closing connection in FIN state.
tcp_keepalive_timeIdle time before sending first keep-alive probe.
tcp_keepalive_intvlInterval between successive keep-alive probes.
tcp_keepalive_probesNumber of probes before dropping connection.
tcp_retries2Max TCP retransmissions before giving up.