MVCC Visibility Rules Demonstration
Illustrates how transactions see data based on snapshot isolation. Focus on T3's reads and how its snapshot (taken at t=40) affects what it sees from T1 and T2, even after they commit.
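The visibility rule at the heart of this can be sketched in a few lines of Rust. This is an illustrative model only (the struct names and timestamps are invented for this post, not any real engine's internals): a version of a row is visible to a reader if and only if its writer committed strictly before the reader's snapshot timestamp.

```rust
// Illustrative single-row version store for snapshot-isolation visibility.
#[derive(Debug, Clone)]
struct Version {
    value: i64,
    commit_ts: Option<u64>, // None = the writing transaction has not committed yet
}

/// Returns the newest value whose writer committed strictly before `snapshot_ts`.
fn visible_value(versions: &[Version], snapshot_ts: u64) -> Option<i64> {
    versions
        .iter()
        .filter(|v| matches!(v.commit_ts, Some(ts) if ts < snapshot_ts))
        .max_by_key(|v| v.commit_ts)
        .map(|v| v.value)
}
```

With T1 committing its write at t=30 and T2 at t=50, a T3 snapshot taken at t=40 sees T1's version but never T2's, even after T2 commits: T2's commit timestamp is simply past T3's snapshot.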
A dirty read occurs when a transaction reads data that has been written by another transaction but not yet committed. If the writing transaction rolls back, the data read by the first transaction becomes invalid—hence, "dirty."
This can lead to inconsistent or incorrect results in applications, especially in systems where multiple users or processes are accessing and modifying the data concurrently.
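A toy model makes the hazard concrete. Below is a hypothetical single-value "cell" (invented for this post, not a real engine): a dirty read returns the pending uncommitted value if one exists, so a later rollback makes the value the reader saw simply vanish.

```rust
// Toy single-value store: `uncommitted` holds a pending write from an
// in-flight transaction. Illustrative only, not a real database engine.
#[derive(Default)]
pub struct Cell {
    pub committed: i64,
    pub uncommitted: Option<i64>,
}

impl Cell {
    /// READ UNCOMMITTED: may observe data that is later rolled back (a dirty read).
    pub fn dirty_read(&self) -> i64 {
        self.uncommitted.unwrap_or(self.committed)
    }

    /// READ COMMITTED: only ever sees committed data.
    pub fn clean_read(&self) -> i64 {
        self.committed
    }

    /// The writer aborts: its pending value disappears.
    pub fn rollback(&mut self) {
        self.uncommitted = None;
    }
}
```

A reader using `dirty_read` on a cell with a pending write observes the uncommitted value; after the writer rolls back, that value is gone, and any decision the reader made based on it was built on data that never existed.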
This is the first post in a four-part series on the mechanics behind the Tokio runtime in Rust. This post focuses on the challenges in a multi-threaded event loop that force us to reach for async runtimes like Tokio.
Index of the four-part series:
1. What challenges in a multi-threaded event loop force us to think of async runtimes like Tokio?
2. Why do we need async runtimes like Tokio?
Here’s a table of TCP tuning parameters that matter for reverse proxy performance and reliability:
Parameter | Purpose |
---|---|
tcp_fin_timeout | Seconds an orphaned connection stays in FIN-WAIT-2 before the kernel reclaims it. |
tcp_keepalive_time | Idle seconds before the first keep-alive probe is sent. |
tcp_keepalive_intvl | Seconds between successive keep-alive probes. |
tcp_keepalive_probes | Number of unanswered probes before the connection is dropped. |
tcp_retries2 | Maximum retransmissions of unacknowledged data before TCP gives up on the connection. |
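On Linux these live under `net.ipv4` and can be set via `sysctl`. The values below are an illustrative starting point for a busy reverse proxy, not a recommendation for every workload; measure before and after changing them.

```
# /etc/sysctl.d/99-reverse-proxy.conf -- illustrative values, tune per workload
net.ipv4.tcp_fin_timeout = 15       # reclaim FIN-WAIT-2 sockets faster (default 60)
net.ipv4.tcp_keepalive_time = 300   # first probe after 5 min idle (default 7200)
net.ipv4.tcp_keepalive_intvl = 30   # probe every 30 s after that (default 75)
net.ipv4.tcp_keepalive_probes = 5   # drop after 5 unanswered probes (default 9)
net.ipv4.tcp_retries2 = 8           # give up retransmitting sooner (default 15)
```

Apply with `sudo sysctl --system` (or `sysctl -p` on the specific file).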
There are several strategies to obtain structured outputs from LLMs.
In Python, libraries such as Pydantic and Instructor facilitate structured output via JSON schema-based tool invocation. If you have the capability to host your own model, sglang is a viable option.
Pydantic validators are highly effective, provided that the input is in the form of a valid JSON string.
Let's work through an example. For starters, here is the schema we want to parse into:
```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str
    active: bool = True  # default value

# JSON representation of the data
json_data = '''
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Directly validate and parse the JSON string
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())
```
This works. Pydantic has a pretty solid JSON-to-data-model converter. But the input has to be a valid JSON string. Let's explore further.
```python
# A typical reply from a small LLM that does not adhere well to an 'output_json' instruction
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Directly validate and parse the JSON string
    user = User.model_validate_json(json_data)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())
```
The error is:

```
Validation Error: [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: expected value at line 2 column 1","input":"\nHere is your json\n{\n \"id\": 123,\n \"name\": \"Alice\",\n \"email\": \"alice@example.com\"\n}\n","ctx":{"error":"expected value at line 2 column 1"},"url":"https://errors.pydantic.dev/2.10/v/json_invalid"}]
```
Now, let's add one more step to the mix: use the json_partial_py library to extract the JSON from the reply, then pass the result to Pydantic.
```python
from json_partial_py import to_json_string  # <---- this is the new import

# A typical reply from a small LLM that does not adhere well to an 'output_json' instruction
json_data = '''
Here is your json
{
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}
'''

try:
    # Extract the JSON fragment, then validate and parse it as before
    stringified_json = to_json_string(json_data)
    user = User.model_validate_json(stringified_json)
    print("Validated Data:", user)
except ValidationError as e:
    print("Validation Error:", e.json())
```
And voilà! Now you can rest assured that you will get clean, validated JSON parsed from the LLM output.
P.S. I am the author of the json_partial_py library. It was extracted from the BAML project.
String interning is a technique that ensures each unique string is stored only once in memory, reducing redundancy and improving performance. It matters most in performance-sensitive applications, where redundant string storage leads to memory overhead and slower lookups and comparisons.
The following table highlights some real-world use cases where string interning is beneficial:
Use Case | Example |
---|---|
Compiler and Interpreter Optimization | Rust’s compiler, Python’s CPython, and Java’s JVM use string interning to optimize symbol tables. |
Embedded Systems & IoT | Logging frameworks like defmt use interning to minimize flash storage usage on microcontrollers. |
Web Servers & API Performance | High-traffic APIs handling millions of requests often receive the same strings (e.g., headers, JSON keys). |
Databases & Search Engines | Search engines like Elasticsearch and databases like PostgreSQL intern frequently queried strings. |
The implementation below provides:

- `InternedString::new()` returns an empty interned string.
- `InternedString::from(s)` interns a string, ensuring uniqueness.
- `as_str()`, `len()`, and `is_empty()` mimic `String` methods.
- The global interner is a thread-safe `Lazy<Mutex<HashMap>>`.
```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use once_cell::sync::Lazy; // Ensures a thread-safe, lazily initialized global interner

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InternedString {
    inner: Arc<str>,
}

// Global interner (singleton)
static INTERNER: Lazy<Mutex<HashMap<Arc<str>, ()>>> =
    Lazy::new(|| Mutex::new(HashMap::new()));

impl InternedString {
    /// Creates an empty interned string (similar to `String::new()`)
    pub fn new() -> Self {
        Self::intern("")
    }

    /// Interns a given string and returns an `InternedString`
    pub fn from<S: AsRef<str>>(s: S) -> Self {
        Self::intern(s.as_ref())
    }

    /// Returns a reference to the interned string
    pub fn as_str(&self) -> &str {
        &self.inner
    }

    /// Returns the length of the interned string
    pub fn len(&self) -> usize {
        self.inner.len()
    }

    /// Checks if the interned string is empty
    pub fn is_empty(&self) -> bool {
        self.inner.is_empty()
    }

    /// Interns a string, ensuring no duplicates exist
    fn intern(s: &str) -> Self {
        let mut interner = INTERNER.lock().unwrap();

        // If the string is already interned, return the existing Arc.
        // `Arc<str>: Borrow<str>`, so the map can be queried by `&str`
        // directly; this is an O(1) hash lookup, not a linear scan.
        if let Some((existing, _)) = interner.get_key_value(s) {
            return InternedString {
                inner: Arc::clone(existing),
            };
        }

        // Otherwise, intern the new string
        let arc_str: Arc<str> = Arc::from(s);
        interner.insert(Arc::clone(&arc_str), ());
        InternedString { inner: arc_str }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty_string() {
        let empty1 = InternedString::new();
        let empty2 = InternedString::new();
        assert_eq!(empty1, empty2);
        assert!(empty1.is_empty());
    }

    #[test]
    fn test_interning() {
        let s1 = InternedString::from("hello");
        let s2 = InternedString::from("hello");
        let s3 = InternedString::from("world");
        assert_eq!(s1, s2);
        assert_ne!(s1, s3);
    }

    #[test]
    fn test_string_length() {
        let s = InternedString::from("test");
        assert_eq!(s.len(), 4);
    }

    #[test]
    fn test_string_content() {
        let s = InternedString::from("RustLang");
        assert_eq!(s.as_str(), "RustLang");
    }
}
```
`&str` and `AsRef<OsStr>`

The change from:

```rust
pub fn load_extension(&self, path: &str) -> Result<()>
```

to:

```rust
pub fn load_extension<P: AsRef<std::ffi::OsStr>>(&self, path: P) -> Result<()>
```

improves flexibility and usability. The original function only accepted `&str`, requiring explicit conversion for types like `String`, `PathBuf`, or `Path`. The updated version uses a generic parameter `P` with the `AsRef<std::ffi::OsStr>` trait bound, allowing it to accept any type that can be referenced as an `OsStr`, such as `&str`, `String`, `Path`, or `PathBuf`.
Example usage with the updated signature:

```rust
use std::path::Path;

let path_str = String::from("/some/path");
let path_ref = Path::new("/another/path");

// Example 1: Using String
instance.load_extension(path_str);

// Example 2: Using &Path
instance.load_extension(path_ref);

// Example 3: Using Path directly
instance.load_extension(Path::new("/yet/another/path"));
```
This reduces boilerplate and improves compatibility with different path types.
Neat bash script example. We'll learn about `>&2`, `pushd`, and `popd` in this script.
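As a stand-in, here is a minimal hypothetical script (the function names and paths are invented) that exercises all three: `>&2` keeps log messages on stderr so stdout stays clean for data, while `pushd`/`popd` enter and leave a directory using bash's directory stack.

```shell
#!/usr/bin/env bash
set -euo pipefail

log() {
    # >&2 redirects the message to stderr, keeping stdout clean for data.
    echo "[$(date +%H:%M:%S)] $*" >&2
}

build_in() {
    local dir="$1"
    # pushd enters the directory and saves the previous one on a stack.
    pushd "$dir" > /dev/null
    log "building in $dir"
    # ... build steps would go here ...
    # popd returns to wherever we were before the matching pushd.
    popd > /dev/null
}

build_in /tmp
log "done"
```

Because the logs go to stderr, `./build.sh > artifacts.txt` captures only real output while progress messages still reach the terminal.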
HTTP responses can be quite large and memory consumption can be a concern. In some cases, it is important to be able to handle large responses without loading the entire response into memory.
One such scenario is when you want to download a large file from a server. If you were to load the entire file into memory, it would require a large amount of memory and would be inefficient. Instead, you can use a streaming approach to download the file directly to disk.
This example will show you how to do just that using the `reqwest` and `tokio` crates (Rust). Here is the rough flow.
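A sketch of that flow, assuming `reqwest` (with its `stream` feature enabled), `tokio`, and `futures-util` as dependencies; the URL and filename are placeholders, and it needs network access, so treat it as illustrative rather than a drop-in program:

```rust
use futures_util::StreamExt; // for `.next()` on the byte stream
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder URL; substitute the real file location.
    let url = "https://example.com/large-file.bin";

    let response = reqwest::get(url).await?;
    let mut file = File::create("large-file.bin").await?;

    // Stream the body chunk by chunk instead of buffering it all in memory.
    let mut stream = response.bytes_stream();
    while let Some(chunk) = stream.next().await {
        file.write_all(&chunk?).await?;
    }
    file.flush().await?;
    Ok(())
}
```

Peak memory stays roughly one chunk in size regardless of how large the download is, which is the whole point of streaming to disk.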
Flattening nested data structures is a common problem in programming. However, flattening structures with an arbitrary depth, like nested `Vec`s within `Vec`s, can be tricky. Rust, with its strong type system and trait-based polymorphism, allows us to implement elegant solutions to such problems. In this post, we'll explore a recursive approach to deep flattening in Rust using traits, type inference, and iterators.
Given a deeply nested structure, such as:
```rust
let nested_vec = vec![
    vec![vec![1, 2, 3], vec![4, 5]],
    vec![vec![6], vec![7, 8, 9]],
];
```
Our goal is to flatten it into:
```rust
let flattened = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
```
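One way to sketch the trait-based recursion (the trait and function names here are my own, and the final implementation may differ): a base-case impl for the leaf type pushes the value itself, and a recursive impl for `Vec<T>` delegates to its elements.

```rust
/// Anything that can be recursively flattened into a Vec of leaf values.
pub trait DeepFlatten {
    type Leaf;
    fn deep_flatten_into(self, out: &mut Vec<Self::Leaf>);
}

// Base case: a plain number is its own leaf.
impl DeepFlatten for i32 {
    type Leaf = i32;
    fn deep_flatten_into(self, out: &mut Vec<i32>) {
        out.push(self);
    }
}

// Recursive case: a Vec of flattenables flattens each element in turn.
impl<T: DeepFlatten> DeepFlatten for Vec<T> {
    type Leaf = T::Leaf;
    fn deep_flatten_into(self, out: &mut Vec<T::Leaf>) {
        for item in self {
            item.deep_flatten_into(out);
        }
    }
}

/// Convenience entry point: collect all leaves into a single Vec.
pub fn deep_flatten<T: DeepFlatten>(nested: T) -> Vec<T::Leaf> {
    let mut out = Vec::new();
    nested.deep_flatten_into(&mut out);
    out
}
```

Because `Leaf` is an associated type, the compiler infers the element type of the result from however deeply the input is nested, with no depth parameter needed.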