Abhishek Tripathi

Curiosity brings awareness.


MVCC Visibility Rules

May 23, 2025 · One min read

Abhishek Tripathi

Curiosity brings awareness.

MVCC Visibility Rules Demonstration

This demonstration illustrates how transactions see data under snapshot isolation. It focuses on T3's reads and how its snapshot (taken at t=40) determines what it sees from T1 and T2, even after they commit.
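To make the visibility rule concrete, here is a minimal sketch in Rust. It assumes a simplified model that is not taken from the demo itself: every version records the transaction that wrote it, and a snapshot remembers which transactions had committed when it was taken. The names `Version`, `Snapshot`, `can_see`, and `read` are hypothetical.

```rust
use std::collections::HashSet;

/// One version of a data item, tagged with the transaction that wrote it.
#[derive(Debug, Clone, PartialEq)]
struct Version {
    value: i64,
    created_by: u64,
}

/// What a transaction captured when it began: its own id plus the ids
/// of the transactions that had already committed at that moment.
struct Snapshot {
    txid: u64,
    committed_at_start: HashSet<u64>,
}

impl Snapshot {
    /// A version is visible if this transaction wrote it, or if its writer
    /// had committed before the snapshot was taken.
    fn can_see(&self, version: &Version) -> bool {
        version.created_by == self.txid
            || self.committed_at_start.contains(&version.created_by)
    }

    /// Returns the newest visible version.
    /// Assumption: `versions` is ordered oldest to newest.
    fn read<'a>(&self, versions: &'a [Version]) -> Option<&'a Version> {
        versions.iter().rev().find(|v| self.can_see(v))
    }
}
```

Under this model, a transaction whose snapshot contains only transaction 0 keeps reading the value written by transaction 0, no matter how many later writers commit afterwards. A transaction that starts later, with a larger committed set, sees the newer versions.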

[Interactive demo: panels for data item versions (global state), the current operation, transaction states and snapshots, and timelines for T1, T2, and T3. Legend: Begin, Read, Write, Commit, Abort.]

Database Isolation (dirty reads)

April 21, 2025 · 3 min read

Abhishek Tripathi

Curiosity brings awareness.

Understanding Dirty Reads: A Database Isolation Problem

What is a Dirty Read?

A dirty read occurs when a transaction reads data that has been written by another transaction but not yet committed. If the writing transaction rolls back, the data read by the first transaction becomes invalid—hence, "dirty."

This can lead to inconsistent or incorrect results in applications, especially in systems where multiple users or processes are accessing and modifying the data concurrently.
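The problem can be sketched with a toy single-value store in Rust. This is an illustration under simplifying assumptions (one data item, at most one pending write), and the names `Account`, `read_uncommitted`, and `read_committed` are hypothetical, not a real database API.

```rust
/// Toy store: one committed value plus at most one uncommitted write.
struct Account {
    committed: i64,
    pending: Option<i64>,
}

impl Account {
    /// A writer stages a new value without committing it.
    fn write(&mut self, value: i64) {
        self.pending = Some(value);
    }

    /// The writer gives up: the staged value disappears.
    fn rollback(&mut self) {
        self.pending = None;
    }

    /// The writer commits: the staged value becomes the committed one.
    fn commit(&mut self) {
        if let Some(value) = self.pending.take() {
            self.committed = value;
        }
    }

    /// Dirty read: returns the staged value if there is one.
    fn read_uncommitted(&self) -> i64 {
        self.pending.unwrap_or(self.committed)
    }

    /// Clean read: only ever returns committed data.
    fn read_committed(&self) -> i64 {
        self.committed
    }
}
```

If a writer stages 50 over a committed 100 and then rolls back, a reader that used `read_uncommitted` in between is left holding 50, a value that never existed in the committed state. A reader that used `read_committed` sees 100 throughout.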

Understanding Eventloops (Tokio Internals)

April 19, 2025 · 3 min read

Abhishek Tripathi

Curiosity brings awareness.

Prelude

This is the first post in a four-part series on the mechanics behind the Tokio runtime in Rust. It focuses on the challenges in a multi-threaded event loop that force us to think of async runtimes like Tokio.

Index of the four-part series:

Visualizing Tokio Internals: Part I - Multi-Threaded Event Loop / Server
Visualizing Tokio Internals: Part II - Reactor
Visualizing Tokio Internals: Part III - Wakers
Visualizing Tokio Internals: Part IV - Executors

Multi-Threaded Event Loop / Server

What challenges in a multi-threaded event loop force us to think of async runtimes like Tokio?

Phase 0: The Problem

Learning Objective

After reading this you will be able to answer:

Why do we need async runtimes like Tokio?

Resource Efficiency: Traditional thread-per-connection models waste system resources
Scalability: Async enables handling thousands of connections with minimal overhead
Performance: Event-driven architecture reduces context switching and memory usage
Cost-Effective: Better resource utilization means lower infrastructure costs

Connection Pooling - in Depth

March 13, 2025 · 3 min read

Abhishek Tripathi

Curiosity brings awareness.

This post maps real-life reverse proxy scenarios to recommended TCP tuning parameters for performance and security.

✅ Legend (Quick Reference)

| Parameter | Purpose |
| --- | --- |
| `tcp_fin_timeout` | How long an orphaned connection stays in the FIN-WAIT-2 state before it is dropped. |
| `tcp_keepalive_time` | Idle time before the first keep-alive probe is sent. |
| `tcp_keepalive_intvl` | Interval between successive keep-alive probes. |
| `tcp_keepalive_probes` | Number of unacknowledged probes before the connection is dropped. |
| `tcp_retries2` | Maximum number of retransmissions on an established connection before giving up. |

Reliable Structured Outputs with LLMs

March 1, 2025 · 2 min read

Abhishek Tripathi

Curiosity brings awareness.

Ensuring Deterministic Outputs from LLMs

There are several strategies to obtain structured outputs from LLMs.

In Python, libraries such as Pydantic and Instructor facilitate structured output via JSON schema-based tool invocation. If you have the capability to host your own model, sglang is a viable option.

Pydantic validators are highly effective, provided that the input is in the form of a valid JSON string.

Let's see by example. For starters, here is the schema we want to parse.

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str
    active: bool = True  # default value

# JSON representation of the data
json_data = '''
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Directly validate and parse the JSON string
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())

This works. Pydantic has a solid JSON-to-model converter, but the input has to be a valid JSON string. Let's explore further.

# Raw reply from the model (not pure JSON)
# Typical of a small LLM that does not adhere well to an 'output_json' instruction
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Directly validate and parse the raw reply, exactly as before
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())

The error is:

Validation Error: [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: expected value at line 2 column 1","input":"\nHere is your json\n{\n    \"id\": 123,\n    \"name\": \"Alice\",\n    \"email\": \"alice@example.com\"\n}\n","ctx":{"error":"expected value at line 2 column 1"},"url":"https://errors.pydantic.dev/2.10/v/json_invalid"}]

Now, let's add one more step to the mix: use the json_partial_py library to parse the JSON string first, and then pass the result to Pydantic.

from json_partial_py import to_json_string # <---- this is a new import

# Typical reply from a small LLM that does not adhere well to an 'output_json' instruction
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Let json_partial_py process the raw LLM reply first
    stringified_json = to_json_string(json_data)
    # Then validate and parse the resulting JSON string
    user = User.model_validate_json(stringified_json)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())

And voilà! The same reply now validates, and you get clean JSON parsed from the LLM output.

P.S. I am the author of the json_partial_py library. It was extracted from the BAML project.

String interning in Rust

March 1, 2025 · 3 min read

Abhishek Tripathi

Curiosity brings awareness.

What is String Interning?

String interning is a technique that ensures each unique string is stored only once in memory, reducing redundancy and improving performance.

Why String Interning Matters in Real-World Scenarios

String interning is critical in performance-sensitive applications where redundant string storage leads to memory overhead and slower lookups.

Real-World Use Cases

The following table highlights some real-world use cases where string interning is beneficial:

| Use Case | Example |
| --- | --- |
| Compiler and Interpreter Optimization | Rust’s compiler, Python’s CPython, and Java’s JVM use string interning to optimize symbol tables. |
| Embedded Systems & IoT | Logging frameworks like `defmt` use interning to minimize flash storage usage on microcontrollers. |
| Web Servers & API Performance | High-traffic APIs handling millions of requests often receive the same strings (e.g., headers, JSON keys). |
| Databases & Search Engines | Search engines like Elasticsearch and databases like PostgreSQL intern frequently queried strings. |

How It Works

InternedString::new() returns an empty interned string.
InternedString::from(s) interns a string, ensuring uniqueness.
as_str(), len(), and is_empty() mimic String methods.
A global HashMap stores interned strings, avoiding duplicates.
Thread safety is ensured using Lazy<Mutex<HashMap>>.

Implementation of String Interning in Rust

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use once_cell::sync::Lazy; // Ensures thread-safe global interner

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InternedString {
    inner: Arc<str>,
}

// Global interner (singleton)
static INTERNER: Lazy<Mutex<HashMap<Arc<str>, ()>>> = Lazy::new(|| Mutex::new(HashMap::new()));

impl InternedString {
    /// Creates an empty interned string (similar to `String::new()`)
    pub fn new() -> Self {
        Self::intern("")
    }

    /// Interns a given string and returns an InternedString
    pub fn from<S: AsRef<str>>(s: S) -> Self {
        Self::intern(s.as_ref())
    }

    /// Returns a reference to the interned string
    pub fn as_str(&self) -> &str {
        &self.inner
    }

    /// Returns the length of the interned string
    pub fn len(&self) -> usize {
        self.inner.len()
    }

    /// Checks if the interned string is empty
    pub fn is_empty(&self) -> bool {
        self.inner.is_empty()
    }

    /// Interns a string, ensuring no duplicates exist
    fn intern(s: &str) -> Self {
        let mut interner = INTERNER.lock().unwrap();

        // If the string is already interned, return the existing reference.
        // `Arc<str>: Borrow<str>` allows a hash lookup by `&str` instead of
        // a linear scan over every key.
        if let Some((existing, _)) = interner.get_key_value(s) {
            return InternedString {
                inner: Arc::clone(existing),
            };
        }

        // Otherwise, intern the new string
        let arc_str = Arc::from(s);
        interner.insert(Arc::clone(&arc_str), ());
        
        InternedString { inner: arc_str }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty_string() {
        let empty1 = InternedString::new();
        let empty2 = InternedString::new();
        assert_eq!(empty1, empty2);
        assert!(empty1.is_empty());
    }

    #[test]
    fn test_interning() {
        let s1 = InternedString::from("hello");
        let s2 = InternedString::from("hello");
        let s3 = InternedString::from("world");

        assert_eq!(s1, s2);
        assert_ne!(s1, s3);
    }

    #[test]
    fn test_string_length() {
        let s = InternedString::from("test");
        assert_eq!(s.len(), 4);
    }

    #[test]
    fn test_string_content() {
        let s = InternedString::from("RustLang");
        assert_eq!(s.as_str(), "RustLang");
    }
}
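The tests above compare interned strings by value. The sketch below checks the stronger property that two interned copies share a single allocation. It is a std-only variant under my own assumptions, not the post's code: it swaps `once_cell::sync::Lazy` for `std::sync::OnceLock` and the `HashMap` for a `HashSet`, and the names `POOL` and `intern` are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical global pool, initialised on first use.
static POOL: OnceLock<Mutex<HashSet<Arc<str>>>> = OnceLock::new();

/// Returns the pooled `Arc<str>` for `s`, inserting it on first sight.
fn intern(s: &str) -> Arc<str> {
    let mut pool = POOL
        .get_or_init(|| Mutex::new(HashSet::new()))
        .lock()
        .unwrap();

    // `Arc<str>: Borrow<str>`, so the set can be queried with a plain `&str`.
    if let Some(existing) = pool.get(s) {
        return Arc::clone(existing);
    }

    let fresh: Arc<str> = Arc::from(s);
    pool.insert(Arc::clone(&fresh));
    fresh
}
```

With this in place, `Arc::ptr_eq(&intern("hello"), &intern("hello"))` holds, because both calls return clones of the same `Arc`. Two different strings get two different allocations.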

Rust tricks for the average developer (me)

January 18, 2025 · One min read

Abhishek Tripathi

Curiosity brings awareness.

001 : `&str` and `AsRef<OsStr>`

The change from:

pub fn load_extension(&self, path: &str) -> Result<()>

to:

pub fn load_extension<P: AsRef<std::ffi::OsStr>>(&self, path: P) -> Result<()>

improves flexibility and usability. The original function only accepted &str, requiring explicit conversion for types like String, PathBuf, or Path. The updated version uses a generic parameter P with the AsRef<std::ffi::OsStr> trait, allowing it to accept any type that can be referenced as an OsStr, such as &str, String, Path, or PathBuf.

Usage with the updated signature:

use std::path::Path;

// `instance` is whatever value exposes `load_extension`.
let path_str = String::from("/some/path");
let path_ref = Path::new("/another/path");

// Example 1: Using String
instance.load_extension(path_str);

// Example 2: Using &Path
instance.load_extension(path_ref);

// Example 3: Using a &Path created inline
instance.load_extension(Path::new("/yet/another/path"));

This reduces boilerplate and improves compatibility with different path types.
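For a version that compiles on its own, here is a small sketch. `Loader` is a hypothetical stand-in for the real type, and its `load_extension` simply returns the path it received so the conversion can be observed. The real method returns `Result<()>` and does actual work.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

// Hypothetical stand-in for the type that owns `load_extension`.
struct Loader;

impl Loader {
    // Accepts anything that can be viewed as an `OsStr` and echoes it back.
    fn load_extension<P: AsRef<OsStr>>(&self, path: P) -> OsString {
        path.as_ref().to_os_string()
    }
}

// Calls the same method with four different argument types.
fn example() -> Vec<OsString> {
    let loader = Loader;
    vec![
        loader.load_extension("/a.so"),                // &str
        loader.load_extension(String::from("/b.so")),  // String
        loader.load_extension(Path::new("/c.so")),     // &Path
        loader.load_extension(PathBuf::from("/d.so")), // PathBuf
    ]
}
```

All four calls go through one generic function, with no `.to_str()` or `.as_ref()` needed at the call site.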

Neat bash script

January 12, 2025 · 5 min read

Abhishek Tripathi

Curiosity brings awareness.

A neat bash script example. In this script we'll learn about >&2, pushd, and popd.

Streaming HTTP to Disk

January 10, 2025 · 4 min read

Abhishek Tripathi

Curiosity brings awareness.

HTTP responses can be quite large and memory consumption can be a concern. In some cases, it is important to be able to handle large responses without loading the entire response into memory.

One such scenario is when you want to download a large file from a server. If you were to load the entire file into memory, it would require a large amount of memory and would be inefficient. Instead, you can use a streaming approach to download the file directly to disk.

This example will show you how to do just that using the reqwest and tokio crates (Rust). Here is the rough flow.
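As a rough illustration of the core idea, the sketch below uses only the standard library rather than reqwest and tokio. That is my substitution so the snippet stays self-contained. It copies from any `Read` source to a file through a fixed 8 KiB buffer, so memory use stays constant regardless of the payload size. In the real flow the source would be the HTTP response body. The function name `stream_to_disk` is hypothetical.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::path::Path;

/// Copies `source` into the file at `dest` chunk by chunk.
/// Returns the number of bytes written.
fn stream_to_disk<R: Read>(mut source: R, dest: &Path) -> io::Result<u64> {
    let mut out = BufWriter::new(File::create(dest)?);
    let mut buf = [0u8; 8 * 1024]; // the only payload memory we hold
    let mut total = 0u64;

    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        out.write_all(&buf[..n])?;
        total += n as u64;
    }

    out.flush()?;
    Ok(total)
}
```

The async version follows the same loop shape: read a chunk, write it, repeat until the stream ends.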

Deep Flattening in Rust - Using Recursive Types

December 31, 2024 · 3 min read

Joel Medicala

Abhishek Tripathi

Curiosity brings awareness.

Deep Flattening in Rust: A Recursive Adventure

Flattening nested data structures is a common problem in programming. However, flattening structures with an arbitrary depth—like nested Vecs within Vecs—can be tricky. Rust, with its strong type system and trait-based polymorphism, allows us to implement elegant solutions to such problems. In this post, we'll explore a recursive approach to deep flattening in Rust using traits, type inference, and iterators.

The Goal

Given a deeply nested structure, such as:

let nested_vec = vec![
    vec![vec![1, 2, 3], vec![4, 5]],
    vec![vec![6], vec![7, 8, 9]],
];

Our goal is to flatten it into:

let flattened = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
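One way to reach that goal on stable Rust is a trait with a base case for the leaf type and a recursive case for `Vec<T>`. The sketch below is a minimal illustration, not necessarily the approach the post develops. The trait name `DeepFlatten` is hypothetical, and the leaf implementation is written for `i32` only.

```rust
/// Flattens arbitrarily nested `Vec`s down to their leaf values.
trait DeepFlatten {
    type Item;
    fn deep_flatten(self) -> Vec<Self::Item>;
}

// Base case: a leaf flattens to a one-element vector.
impl DeepFlatten for i32 {
    type Item = i32;
    fn deep_flatten(self) -> Vec<i32> {
        vec![self]
    }
}

// Recursive case: flatten every element, then concatenate the results.
// The compiler picks the right depth from the type alone.
impl<T: DeepFlatten> DeepFlatten for Vec<T> {
    type Item = T::Item;
    fn deep_flatten(self) -> Vec<T::Item> {
        self.into_iter().flat_map(|x| x.deep_flatten()).collect()
    }
}
```

Calling `nested_vec.deep_flatten()` on a `Vec<Vec<Vec<i32>>>` yields a `Vec<i32>`. The leaf case allocates a small vector per element, which keeps the sketch short at the cost of efficiency.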
