
Reliable Structured Outputs with LLMs

· 3 min read
Abhishek Tripathi
Curiosity brings awareness.

Ensuring Deterministic Outputs from LLMs

There are several strategies to obtain structured outputs from LLMs.

In Python, libraries such as Pydantic and Instructor facilitate structured output via JSON schema-based tool invocation. If you have the capability to host your own model, sglang is a viable option.

Pydantic validators are highly effective, provided that the input is in the form of a valid JSON string.

Let's look at an example. To start, here is the schema we want to parse, along with a valid JSON payload.

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str
    active: bool = True  # default value

# JSON representation of the data
json_data = '''
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Directly validate and parse the JSON string
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())

This works. Pydantic has a solid JSON-to-data-model converter, but the input has to be a valid JSON string. Let's see what happens when it is not.


# JSON representation of the data
# typical reply from a small LLM that does not adhere well to an 'output_json' instruction
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Directly validate and parse the JSON string using the new method
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())


The error is:

Validation Error: [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: expected value at line 2 column 1","input":"\nHere is your json\n{\n    \"id\": 123,\n    \"name\": \"Alice\",\n    \"email\": \"alice@example.com\"\n}\n","ctx":{"error":"expected value at line 2 column 1"},"url":"https://errors.pydantic.dev/2.10/v/json_invalid"}]

Now, let's add one more step to the mix: use the json_partial_py library to extract the JSON string first, and then pass the result to Pydantic.


from json_partial_py import to_json_string  # <---- this is a new import

# typical reply from a small LLM that does not adhere well to an 'output_json' instruction
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    stringified_json = to_json_string(json_data)
    # Directly validate and parse the JSON string using the new method
    user = User.model_validate_json(stringified_json)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())


And voilà! The surrounding text is stripped first, so Pydantic receives a clean JSON string to validate.

P.S. I am the author of the json_partial_py library. It was extracted from the BAML project.

String interning in Rust

· 3 min read
Abhishek Tripathi
Curiosity brings awareness.

What is String Interning?

String interning is a technique that ensures each unique string is stored only once in memory, reducing redundancy and improving performance.

Why String Interning Matters in Real-World Scenarios

String interning is critical in performance-sensitive applications where redundant string storage leads to memory overhead and slower lookups. Here’s why it matters in real-world scenarios:

Real-World Use Cases

The following table highlights some real-world use cases where string interning is beneficial:

| Use Case | Example |
| --- | --- |
| Compiler and Interpreter Optimization | Rust’s compiler, Python’s CPython, and Java’s JVM use string interning to optimize symbol tables. |
| Embedded Systems & IoT | Logging frameworks like defmt use interning to minimize flash storage usage on microcontrollers. |
| Web Servers & API Performance | High-traffic APIs handling millions of requests often receive the same strings (e.g., headers, JSON keys). |
| Databases & Search Engines | Search engines like Elasticsearch and databases like PostgreSQL intern frequently queried strings. |

How It Works

  • InternedString::new() returns an empty interned string.
  • InternedString::from(s) interns a string, ensuring uniqueness.
  • as_str(), len(), and is_empty() mimic String methods.
  • A global HashMap stores interned strings, avoiding duplicates.
  • Thread safety is ensured using Lazy<Mutex<HashMap>>.

Implementation of String Interning in Rust

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use once_cell::sync::Lazy; // Ensures thread-safe global interner

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InternedString {
    inner: Arc<str>,
}

// Global interner (singleton)
static INTERNER: Lazy<Mutex<HashMap<Arc<str>, ()>>> = Lazy::new(|| Mutex::new(HashMap::new()));

impl InternedString {
    /// Creates an empty interned string (similar to `String::new()`)
    pub fn new() -> Self {
        Self::intern("")
    }

    /// Interns a given string and returns an InternedString
    pub fn from<S: AsRef<str>>(s: S) -> Self {
        Self::intern(s.as_ref())
    }

    /// Returns a reference to the interned string
    pub fn as_str(&self) -> &str {
        &self.inner
    }

    /// Returns the length of the interned string
    pub fn len(&self) -> usize {
        self.inner.len()
    }

    /// Checks if the interned string is empty
    pub fn is_empty(&self) -> bool {
        self.inner.is_empty()
    }

    /// Interns a string, ensuring no duplicates exist
    fn intern(s: &str) -> Self {
        let mut interner = INTERNER.lock().unwrap();

        // If the string is already interned, return the existing reference.
        // `Arc<str>` borrows as `str`, so this is a hashed lookup
        // instead of a linear scan over every key.
        if let Some((existing, _)) = interner.get_key_value(s) {
            return InternedString {
                inner: Arc::clone(existing),
            };
        }

        // Otherwise, intern the new string
        let arc_str: Arc<str> = Arc::from(s);
        interner.insert(Arc::clone(&arc_str), ());

        InternedString { inner: arc_str }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty_string() {
        let empty1 = InternedString::new();
        let empty2 = InternedString::new();
        assert_eq!(empty1, empty2);
        assert!(empty1.is_empty());
    }

    #[test]
    fn test_interning() {
        let s1 = InternedString::from("hello");
        let s2 = InternedString::from("hello");
        let s3 = InternedString::from("world");

        assert_eq!(s1, s2);
        assert_ne!(s1, s3);
    }

    #[test]
    fn test_string_length() {
        let s = InternedString::from("test");
        assert_eq!(s.len(), 4);
    }

    #[test]
    fn test_string_content() {
        let s = InternedString::from("RustLang");
        assert_eq!(s.as_str(), "RustLang");
    }
}
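
The tests above compare interned strings by value, which would also pass for two separate allocations with the same contents. To check that two handles really share one allocation, pointer equality is the more direct test. The following is a minimal sketch of that idea, not the post's implementation: it assumes a standard-library-only interner built on `OnceLock` and `HashSet` (instead of `once_cell` and `HashMap`), and the names `interner`, `intern`, and `shares_storage` are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, OnceLock};

/// Returns the process-wide interner (assumed design: one global set).
fn interner() -> &'static Mutex<HashSet<Arc<str>>> {
    static INTERNER: OnceLock<Mutex<HashSet<Arc<str>>>> = OnceLock::new();
    INTERNER.get_or_init(|| Mutex::new(HashSet::new()))
}

/// Interns `s` and returns a shared handle to the single stored copy.
fn intern(s: &str) -> Arc<str> {
    let mut set = interner().lock().unwrap();

    // `Arc<str>` borrows as `str`, so the set can be queried with `&str`.
    if let Some(existing) = set.get(s) {
        return Arc::clone(existing);
    }

    let handle: Arc<str> = Arc::from(s);
    set.insert(Arc::clone(&handle));
    handle
}

/// True when both handles point at the same allocation.
fn shares_storage(a: &Arc<str>, b: &Arc<str>) -> bool {
    Arc::ptr_eq(a, b)
}
```

With this sketch, `shares_storage(&intern("hello"), &intern("hello"))` is true, while comparing against `intern("world")` is false, because a second distinct string gets its own allocation.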

Rust tricks for the average developer (me)

· One min read
Abhishek Tripathi
Curiosity brings awareness.

001 : &str and AsRef<OsStr>

The change from:

pub fn load_extension(&self, path: &str) -> Result<()>

to:

pub fn load_extension<P: AsRef<std::ffi::OsStr>>(&self, path: P) -> Result<()>

improves flexibility and usability. The original function only accepted &str, requiring explicit conversion for types like String, PathBuf, or Path. The updated version uses a generic parameter P with the AsRef<std::ffi::OsStr> trait, allowing it to accept any type that can be referenced as an OsStr, such as &str, String, Path, or PathBuf.

Usage with the updated signature:

use std::path::Path;

let path_str = String::from("/some/path");
let path_ref = Path::new("/another/path");

// Example 1: Using String
instance.load_extension(path_str);

// Example 2: Using &Path
instance.load_extension(&path_ref);

// Example 3: Using Path directly
instance.load_extension(Path::new("/yet/another/path"));

This reduces boilerplate and improves compatibility with different path types.
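
To see the definition side of this pattern in isolation, here is a small self-contained sketch. `Loader` is a hypothetical stand-in for whatever type owns `load_extension`, and the body (rejecting an empty path and echoing the path back) is invented purely so the calls have an observable result; only the generic bound mirrors the signature discussed above.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the type that owns `load_extension`.
struct Loader;

impl Loader {
    /// Accepts anything that can be viewed as an `OsStr`.
    /// Returns the path it received so callers can observe the conversion.
    fn load_extension<P: AsRef<OsStr>>(&self, path: P) -> Result<OsString, String> {
        // One conversion at the top; the rest of the body works with `&OsStr`.
        let path: &OsStr = path.as_ref();
        if path.is_empty() {
            return Err("empty path".to_string());
        }
        Ok(path.to_os_string())
    }
}

/// Calls the same method with four different argument types.
fn demo() -> Vec<OsString> {
    let loader = Loader;
    vec![
        loader.load_extension("/some/path").unwrap(),
        loader.load_extension(String::from("/some/path")).unwrap(),
        loader.load_extension(Path::new("/some/path")).unwrap(),
        loader.load_extension(PathBuf::from("/some/path")).unwrap(),
    ]
}
```

The conversion happens once inside the function, so call sites do not need `to_str()` or `as_os_str()` themselves.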

Streaming HTTP to Disk

· 4 min read
Abhishek Tripathi
Curiosity brings awareness.

HTTP responses can be quite large and memory consumption can be a concern. In some cases, it is important to be able to handle large responses without loading the entire response into memory.

One such scenario is when you want to download a large file from a server. If you were to load the entire file into memory, it would require a large amount of memory and would be inefficient. Instead, you can use a streaming approach to download the file directly to disk.

This example will show you how to do just that using the reqwest and tokio crates (Rust). Here is the rough flow.
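
As a rough illustration of that flow using only the standard library (so synchronous, and not the reqwest/tokio code itself): read the body in fixed-size chunks and write each chunk to the file as it arrives, so memory use stays bounded by the buffer size. The function name `stream_to_disk`, the 8 KiB buffer, and the use of a generic `Read` in place of an HTTP response body are all assumptions made for this sketch.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::path::Path;

/// Copies `body` to `dest` chunk by chunk and returns the bytes written.
/// `body` stands in for a response body; any `Read` source works.
fn stream_to_disk<R: Read>(mut body: R, dest: &Path) -> io::Result<u64> {
    let mut out = BufWriter::new(File::create(dest)?);
    let mut buf = [0u8; 8192]; // only this much of the body is held in memory
    let mut total = 0u64;

    loop {
        let n = match body.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        out.write_all(&buf[..n])?;
        total += n as u64;
    }

    out.flush()?;
    Ok(total)
}
```

An async version has the same shape: pull the next chunk from the response, write it to the file, and repeat until the stream ends.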

Deep Flattening in Rust - Using Recursive Types

· 4 min read
Abhishek Tripathi
Curiosity brings awareness.

Deep Flattening in Rust: A Recursive Adventure

Flattening nested data structures is a common problem in programming. However, flattening structures with an arbitrary depth—like nested Vecs within Vecs—can be tricky. Rust, with its strong type system and trait-based polymorphism, allows us to implement elegant solutions to such problems. In this post, we'll explore a recursive approach to deep flattening in Rust using traits, type inference, and iterators.

The Goal

Given a deeply nested structure, such as:

let nested_vec = vec![
vec![vec![1, 2, 3], vec![4, 5]],
vec![vec![6], vec![7, 8, 9]],
];

Our goal is to flatten it into:

let flattened = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
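
One way to express this recursion with traits is sketched below. It is a simplified illustration under stated assumptions rather than the post's own solution: the trait `Flatten`, its method `flatten_into`, and the helper `deep_flatten` are hypothetical names, the leaf case is implemented only for `i32`, and the result is collected eagerly into a `Vec` instead of being exposed as a lazy iterator.

```rust
/// A value that can push all of its leaf elements into a flat vector.
trait Flatten {
    /// The innermost element type once all nesting is removed.
    type Leaf;
    fn flatten_into(self, out: &mut Vec<Self::Leaf>);
}

/// Base case: a single number is already flat.
impl Flatten for i32 {
    type Leaf = i32;
    fn flatten_into(self, out: &mut Vec<i32>) {
        out.push(self);
    }
}

/// Recursive case: a Vec flattens by flattening each of its elements.
/// The compiler resolves the nesting depth from the type, e.g. Vec<Vec<Vec<i32>>>.
impl<T: Flatten> Flatten for Vec<T> {
    type Leaf = T::Leaf;
    fn flatten_into(self, out: &mut Vec<T::Leaf>) {
        for item in self {
            item.flatten_into(out);
        }
    }
}

/// Convenience wrapper that allocates the output vector.
fn deep_flatten<T: Flatten>(value: T) -> Vec<T::Leaf> {
    let mut out = Vec::new();
    value.flatten_into(&mut out);
    out
}
```

Because the recursion is driven by trait resolution, the same `deep_flatten` call works for any nesting depth; supporting another leaf type means adding one more base-case `impl`.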

From TCP Sockets to Unix Sockets: A Caddy Performance Case Study

· 5 min read
Abhishek Tripathi
Curiosity brings awareness.

A recent GitHub issue #6751 in the Caddy server repository revealed a counterintuitive performance bottleneck: despite maintaining low CPU usage (1-5%), a multi-layer reverse proxy setup experienced severe throughput degradation. This investigation uncovered a critical lesson—low CPU usage doesn't guarantee performance. The culprit? Network stack overhead hiding beneath the surface. Here's what was discovered and how it was resolved.

1brc - same tricks across languages

· 3 min read
Abhishek Tripathi
Curiosity brings awareness.

The 1 Billion Row Challenge (1BRC) is a programming challenge focused on processing a large dataset of temperature measurements. If you're unfamiliar with it, you can learn more from these resources: 1 and 2.

This is a cheatsheet of the optimisations used in 1BRC solutions. It tries to summarise them and put them in perspective.