Abhishek Tripathi
Curiosity brings awareness.

Database Isolation (dirty reads)

· 3 min read

Understanding Dirty Reads: A Database Isolation Problem

What is a Dirty Read?

A dirty read occurs when a transaction reads data that another transaction has written but not yet committed. If the writing transaction rolls back, the value the reading transaction saw never officially existed, hence "dirty."

This can lead to inconsistent or incorrect results in applications, especially in systems where multiple users or processes are accessing and modifying the data concurrently.
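
To make the definition concrete, here is a minimal sketch of a single-row "table" that tracks one committed value and at most one uncommitted write. The `Row` type and its method names are hypothetical and exist only for this illustration; a real database tracks this per transaction and per row version.

```rust
/// Toy single-row store (hypothetical, for illustration only).
struct Row {
    committed: i64,
    pending: Option<i64>, // uncommitted write from an in-flight transaction
}

impl Row {
    fn new(value: i64) -> Self {
        Row { committed: value, pending: None }
    }

    /// A writer stages a new value without committing it.
    fn write(&mut self, value: i64) {
        self.pending = Some(value);
    }

    /// The writer commits: the staged value becomes the committed value.
    fn commit(&mut self) {
        if let Some(value) = self.pending.take() {
            self.committed = value;
        }
    }

    /// The writer rolls back: the staged value is discarded.
    fn rollback(&mut self) {
        self.pending = None;
    }

    /// READ UNCOMMITTED behaviour: the reader sees the staged value (a dirty read).
    fn read_uncommitted(&self) -> i64 {
        self.pending.unwrap_or(self.committed)
    }

    /// READ COMMITTED behaviour: the reader only sees committed data.
    fn read_committed(&self) -> i64 {
        self.committed
    }
}
```

If a writer stages 50 over a committed 100 and then rolls back, a reader using `read_uncommitted` in between has acted on a value that was never committed, while `read_committed` returns 100 throughout.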

Understanding Eventloops (Tokio Internals)

· 3 min read

Prelude

This is the first post in a four-part series on the mechanics behind the Tokio runtime in Rust. It focuses on the challenges of a multi-threaded event loop that push us towards async runtimes like Tokio.

Index of the four-part series:

  1. Visualizing Tokio Internals: Part I - Multi-Threaded Event Loop / Server
  2. Visualizing Tokio Internals: Part II - Reactor
  3. Visualizing Tokio Internals: Part III - Wakers
  4. Visualizing Tokio Internals: Part IV - Executors

Multi-Threaded Event Loop / Server

What challenges in a multi-threaded event loop force us to think of async runtimes like Tokio?

Phase 0: The Problem

Learning Objective
After reading this you will be able to answer:

Why do we need async runtimes like Tokio?

  • Resource Efficiency: Traditional thread-per-connection models waste system resources
  • Scalability: Async enables handling thousands of connections with minimal overhead
  • Performance: Event-driven architecture reduces context switching and memory usage
  • Cost-Effective: Better resource utilization means lower infrastructure costs
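
As a rough illustration of the scalability point, the sketch below drives many simulated connections from a single thread instead of dedicating one thread to each. `Conn` and `run_event_loop` are hypothetical names, and the loop simply visits every connection each round; a real runtime such as Tokio instead waits for OS readiness notifications and only touches connections that can make progress.

```rust
/// A simulated connection that needs `remaining` more non-blocking steps to finish.
struct Conn {
    id: usize,
    remaining: u32,
}

/// Advances every connection one step per round on the current thread.
/// Returns connection ids in the order they completed.
fn run_event_loop(mut conns: Vec<Conn>) -> Vec<usize> {
    let mut finished = Vec::new();
    while !conns.is_empty() {
        // Keep only the connections that still have work left.
        conns.retain_mut(|conn| {
            conn.remaining = conn.remaining.saturating_sub(1);
            if conn.remaining == 0 {
                finished.push(conn.id);
                false
            } else {
                true
            }
        });
    }
    finished
}
```

No connection ever blocks the thread here, so slow connections do not hold up fast ones, and the per-connection cost is a small struct rather than a thread stack.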

Connection Pooling - in Depth

· 3 min read

This post maps real-life reverse proxy scenarios to recommended TCP tuning parameters for performance and security.

Legend (Quick Reference)

| Parameter | Purpose |
| --- | --- |
| tcp_fin_timeout | How long an orphaned connection is kept in the FIN-WAIT-2 state. |
| tcp_keepalive_time | Idle time before the first keep-alive probe is sent. |
| tcp_keepalive_intvl | Interval between successive keep-alive probes. |
| tcp_keepalive_probes | Number of unanswered probes before the connection is dropped. |
| tcp_retries2 | Maximum TCP retransmissions on an established connection before giving up. |
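
The three keep-alive parameters combine into one useful number: how long an idle connection to a dead peer lingers before the kernel drops it. The sketch below works that out; the `KeepAlive` struct is a hypothetical helper, the first set of values is the Linux default (7200 s, 75 s, 9 probes), and the "tuned" values in the usage note are made up for illustration rather than a recommendation.

```rust
use std::time::Duration;

/// Mirrors tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes.
struct KeepAlive {
    time: Duration,
    intvl: Duration,
    probes: u32,
}

impl KeepAlive {
    /// Linux default values for the three sysctls.
    fn linux_defaults() -> Self {
        KeepAlive {
            time: Duration::from_secs(7200),
            intvl: Duration::from_secs(75),
            probes: 9,
        }
    }

    /// Time until an idle, dead peer is detected:
    /// idle time before the first probe, plus one interval per unanswered probe.
    fn detection_time(&self) -> Duration {
        self.time + self.intvl * self.probes
    }
}
```

With the defaults this is 7200 + 75 × 9 = 7875 seconds, a little over two hours. A hypothetical proxy setting of 60 s idle, 10 s interval and 3 probes brings it down to 90 seconds.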

Reliable Structured Outputs with LLMs

· 2 min read

Ensuring Deterministic Outputs from LLMs

There are several strategies to obtain structured outputs from LLMs.

In Python, libraries such as Pydantic and Instructor facilitate structured output via JSON schema-based tool invocation. If you have the capability to host your own model, sglang is a viable option.

Pydantic validators are highly effective, provided that the input is in the form of a valid JSON string.

Let's see by example. For starters, here is the schema we want to parse.

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str
    active: bool = True  # default value

# JSON representation of the data
json_data = '''
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Directly validate and parse the JSON string
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())

This works. Pydantic has a solid JSON-to-data-model converter, but the input has to be a valid JSON string. Let's explore further.


# JSON representation of the data
# typical reply of a small LLM which does not adhere well to the 'output_json' command
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Same call as before, but the input now has a non-JSON preamble
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())


The error is:

Validation Error: [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: expected value at line 2 column 1","input":"\nHere is your json\n{\n    \"id\": 123,\n    \"name\": \"Alice\",\n    \"email\": \"alice@example.com\"\n}\n","ctx":{"error":"expected value at line 2 column 1"},"url":"https://errors.pydantic.dev/2.10/v/json_invalid"}]

Now, let's add one more step to the mix: use the json_partial_py library to extract the JSON string first, and then pass the result to Pydantic.


from json_partial_py import to_json_string  # <---- this is a new import

# typical reply of a small LLM which does not adhere well to the 'output_json' command
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Extract a clean JSON string from the noisy reply
    stringified_json = to_json_string(json_data)
    # Validate and parse the cleaned JSON string
    user = User.model_validate_json(stringified_json)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())


And voilà! The chatty preamble is stripped, and Pydantic can validate the JSON extracted from the LLM output.
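
To give a feel for what the extraction step involves, here is a deliberately naive sketch in Rust that pulls the first balanced `{ ... }` span out of a noisy reply. This is not how json_partial_py works internally; `extract_json_object` is a hypothetical helper, and it does not repair malformed JSON, it only locates an object and skips braces that appear inside string literals.

```rust
/// Returns the first balanced `{ ... }` span in `text`, if any.
/// Braces inside JSON string literals are ignored.
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (offset, ch) in text[start..].char_indices() {
        if in_string {
            // Inside a string: only track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // '}' is one byte wide, so the span ends at offset + 1.
                    return Some(&text[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None // no object, or the object was never closed
}
```

The returned slice can then be handed to a strict JSON parser, which is the same two-step shape as the Python example above: clean first, validate second.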

P.S. I am the author of the json_partial_py library. It was extracted from the BAML project.

String interning in Rust

· 3 min read

What is String Interning?

String interning is a technique that ensures each unique string is stored only once in memory, reducing redundancy and improving performance.

Why String Interning Matters in Real-World Scenarios

String interning is critical in performance-sensitive applications where redundant string storage leads to memory overhead and slower lookups. Here’s why it matters in real-world scenarios:

Real-World Use Cases

The following table highlights some real-world use cases where string interning is beneficial:

| Use Case | Example |
| --- | --- |
| Compiler and Interpreter Optimization | Rust’s compiler, Python’s CPython, and Java’s JVM use string interning to optimize symbol tables. |
| Embedded Systems & IoT | Logging frameworks like defmt use interning to minimize flash storage usage on microcontrollers. |
| Web Servers & API Performance | High-traffic APIs handling millions of requests often receive the same strings (e.g., headers, JSON keys). |
| Databases & Search Engines | Search engines like Elasticsearch and databases like PostgreSQL intern frequently queried strings. |

How It Works

  • InternedString::new() returns an empty interned string.
  • InternedString::from(s) interns a string, ensuring uniqueness.
  • as_str(), len(), and is_empty() mimic String methods.
  • A global HashMap stores interned strings, avoiding duplicates.
  • Thread safety is ensured using Lazy<Mutex<HashMap>>.

Implementation of String Interning in Rust

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use once_cell::sync::Lazy; // Lazily initializes the global interner in a thread-safe way

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InternedString {
    inner: Arc<str>,
}

// Global interner (singleton)
static INTERNER: Lazy<Mutex<HashMap<Arc<str>, ()>>> = Lazy::new(|| Mutex::new(HashMap::new()));

impl InternedString {
    /// Creates an empty interned string (similar to `String::new()`)
    pub fn new() -> Self {
        Self::intern("")
    }

    /// Interns a given string and returns an InternedString
    pub fn from<S: AsRef<str>>(s: S) -> Self {
        Self::intern(s.as_ref())
    }

    /// Returns a reference to the interned string
    pub fn as_str(&self) -> &str {
        &self.inner
    }

    /// Returns the length of the interned string in bytes
    pub fn len(&self) -> usize {
        self.inner.len()
    }

    /// Checks if the interned string is empty
    pub fn is_empty(&self) -> bool {
        self.inner.is_empty()
    }

    /// Interns a string, ensuring no duplicates exist
    fn intern(s: &str) -> Self {
        let mut interner = INTERNER.lock().unwrap();

        // If the string is already interned, reuse the existing allocation.
        // `get_key_value` is a hash lookup, so it avoids scanning every key.
        if let Some((existing, _)) = interner.get_key_value(s) {
            return InternedString {
                inner: Arc::clone(existing),
            };
        }

        // Otherwise, intern the new string
        let arc_str: Arc<str> = Arc::from(s);
        interner.insert(Arc::clone(&arc_str), ());

        InternedString { inner: arc_str }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty_string() {
        let empty1 = InternedString::new();
        let empty2 = InternedString::new();
        assert_eq!(empty1, empty2);
        assert!(empty1.is_empty());
    }

    #[test]
    fn test_interning() {
        let s1 = InternedString::from("hello");
        let s2 = InternedString::from("hello");
        let s3 = InternedString::from("world");

        assert_eq!(s1, s2);
        assert_ne!(s1, s3);
    }

    #[test]
    fn test_string_length() {
        let s = InternedString::from("test");
        assert_eq!(s.len(), 4);
    }

    #[test]
    fn test_string_content() {
        let s = InternedString::from("RustLang");
        assert_eq!(s.as_str(), "RustLang");
    }
}
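
The tests above compare interned strings by value, which would also pass for two separate allocations holding the same text. A sketch of how sharing itself can be checked is shown below, using pointer equality on the `Arc`. `LocalInterner` is a hypothetical, non-global variant written only so the example is self-contained and needs no external crate; it uses a `HashSet` rather than the `HashMap<_, ()>` from the implementation above.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// A small, non-global interner used only for this illustration.
#[derive(Default)]
struct LocalInterner {
    strings: HashSet<Arc<str>>,
}

impl LocalInterner {
    /// Returns the shared `Arc<str>` for `s`, allocating only on first sight.
    fn intern(&mut self, s: &str) -> Arc<str> {
        // `Arc<str>` implements `Borrow<str>`, so we can look up by `&str`.
        if let Some(existing) = self.strings.get(s) {
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(s);
        self.strings.insert(Arc::clone(&fresh));
        fresh
    }
}

/// Interns "hello" twice and reports whether both handles share one allocation.
fn shares_allocation() -> bool {
    let mut interner = LocalInterner::default();
    let a = interner.intern("hello");
    let b = interner.intern("hello");
    Arc::ptr_eq(&a, &b)
}
```

`Arc::ptr_eq` returning true confirms that the second call handed back the existing allocation instead of creating a new one.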

Rust tricks for the average developer (me)

· One min read

001 : &str and AsRef<OsStr>

The change from:

pub fn load_extension(&self, path: &str) -> Result<()>

to:

pub fn load_extension<P: AsRef<std::ffi::OsStr>>(&self, path: P) -> Result<()>

improves flexibility and usability. The original function only accepted &str, requiring explicit conversion for types like String, PathBuf, or Path. The updated version uses a generic parameter P with the AsRef<std::ffi::OsStr> trait, allowing it to accept any type that can be referenced as an OsStr, such as &str, String, Path, or PathBuf.

Usage with the updated signature:

use std::path::Path;

let path_str = String::from("/some/path");
let path_ref = Path::new("/another/path");

// `?` assumes the enclosing function returns a compatible Result.

// Example 1: Using String
instance.load_extension(path_str)?;

// Example 2: Using &Path
instance.load_extension(path_ref)?;

// Example 3: Using a &Path created inline
instance.load_extension(Path::new("/yet/another/path"))?;

This reduces boilerplate and improves compatibility with different path types.
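
For a version that compiles on its own, here is a minimal sketch around a stand-in type. `Connection`, its `loaded` field, and the `String` error type are hypothetical and only exist to show the generic bound at work; the sketch also takes `&mut self` so it can record what was passed in, unlike the `&self` signature above.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the type that owns `load_extension`.
struct Connection {
    loaded: Vec<OsString>,
}

impl Connection {
    /// Accepts anything that can be viewed as an `OsStr`.
    fn load_extension<P: AsRef<OsStr>>(&mut self, path: P) -> Result<(), String> {
        let path = path.as_ref();
        if path.is_empty() {
            return Err("empty extension path".to_string());
        }
        self.loaded.push(path.to_os_string());
        Ok(())
    }
}

/// Calls the same method with four different argument types.
fn demo() -> Result<(), String> {
    let mut conn = Connection { loaded: Vec::new() };
    conn.load_extension("/ext/a.so")?; // &str
    conn.load_extension(String::from("/ext/b.so"))?; // String
    conn.load_extension(Path::new("/ext/c.so"))?; // &Path
    conn.load_extension(PathBuf::from("/ext/d.so"))?; // PathBuf
    Ok(())
}
```

The body converts once with `as_ref()` and works with `&OsStr` from there, so the generic parameter costs callers nothing and keeps conversions out of every call site.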

Streaming HTTP to Disk

· 4 min read

HTTP responses can be quite large and memory consumption can be a concern. In some cases, it is important to be able to handle large responses without loading the entire response into memory.

One such scenario is when you want to download a large file from a server. If you were to load the entire file into memory, it would require a large amount of memory and would be inefficient. Instead, you can use a streaming approach to download the file directly to disk.

This example will show you how to do just that using the reqwest and tokio crates (Rust). Here is the rough flow.
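
The core of the streaming idea can be sketched with the standard library alone: read a bounded chunk, write it out, repeat, so memory use stays at one buffer regardless of body size. The sketch below is synchronous and uses a hypothetical `stream_to_sink` helper over generic `Read` and `Write` types; with reqwest and tokio the source would be the response body and the sink an async file, but the loop has the same shape.

```rust
use std::io::{self, Read, Write};

/// Copies `source` into `sink` through a fixed-size buffer.
/// Returns the total number of bytes written.
fn stream_to_sink<R: Read, W: Write>(
    mut source: R,
    mut sink: W,
    chunk_size: usize,
) -> io::Result<u64> {
    // Only this buffer is held in memory, never the whole body.
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut total = 0u64;
    loop {
        let n = source.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        sink.write_all(&buf[..n])?;
        total += n as u64;
    }
    sink.flush()?;
    Ok(total)
}
```

Passing a `std::fs::File` as the sink writes straight to disk. In production code `std::io::copy` does the same job and also retries interrupted reads, which this sketch leaves out for brevity.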

Deep Flattening in Rust - Using Recursive Types

· 3 min read

Deep Flattening in Rust: A Recursive Adventure

Flattening nested data structures is a common problem in programming. However, flattening structures with an arbitrary depth—like nested Vecs within Vecs—can be tricky. Rust, with its strong type system and trait-based polymorphism, allows us to implement elegant solutions to such problems. In this post, we'll explore a recursive approach to deep flattening in Rust using traits, type inference, and iterators.

The Goal

Given a deeply nested structure, such as:

let nested_vec = vec![
vec![vec![1, 2, 3], vec![4, 5]],
vec![vec![6], vec![7, 8, 9]],
];

Our goal is to flatten it into:

let flattened = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
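
One way to reach that goal is a trait with two implementations: a base case for the leaf type and a recursive case for `Vec<T>`. The sketch below is a minimal version under that assumption and is not necessarily the approach the full post takes; the `DeepFlatten` trait, its `Leaf` associated type, and the `deep_flatten` function are names chosen for this illustration, and only `i32` leaves are supported.

```rust
/// Anything that can be flattened down to a sequence of leaf values.
trait DeepFlatten {
    type Leaf;
    /// Appends all leaves of `self`, in order, to `out`.
    fn flatten_into(self, out: &mut Vec<Self::Leaf>);
}

/// Base case: an `i32` is a single leaf.
impl DeepFlatten for i32 {
    type Leaf = i32;
    fn flatten_into(self, out: &mut Vec<i32>) {
        out.push(self);
    }
}

/// Recursive case: a `Vec<T>` flattens each element in turn.
/// The leaf type is whatever `T` eventually bottoms out at.
impl<T: DeepFlatten> DeepFlatten for Vec<T> {
    type Leaf = T::Leaf;
    fn flatten_into(self, out: &mut Vec<T::Leaf>) {
        for item in self {
            item.flatten_into(out);
        }
    }
}

/// Flattens a value of any nesting depth into a single `Vec` of leaves.
fn deep_flatten<T: DeepFlatten>(value: T) -> Vec<T::Leaf> {
    let mut out = Vec::new();
    value.flatten_into(&mut out);
    out
}
```

The compiler resolves the recursion at compile time: `Vec<Vec<Vec<i32>>>` uses the `Vec<T>` impl three times and then the `i32` impl. Writing into one shared output buffer avoids allocating an intermediate `Vec` at every level.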

1brc - same tricks across languages

· 3 min read

The 1 Billion Row Challenge (1BRC) is a programming challenge focused on processing a large dataset of temperature measurements. If you're unfamiliar with it, you can learn more from these resources: 1 and 2.

This is a cheatsheet of the optimisations used in 1BRC solutions. It summarises them and puts them in perspective.