String interning in Rust
What is String Interning?
String interning is a technique that ensures each unique string is stored only once in memory, reducing redundancy and improving performance.
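To make "stored only once" concrete, here is a minimal sketch using only the standard library. The helper names `owned_copies_share_buffer` and `shared_handles_share_buffer` are hypothetical and exist only for this illustration. It contrasts two independently built `String`s, which each own a heap buffer, with two `Arc<str>` handles that point at one shared allocation.

```rust
use std::sync::Arc;

/// Builds two `String`s from the same text and reports whether they share a
/// heap buffer. For non-empty text they never do: each `String` owns its own copy.
pub fn owned_copies_share_buffer(text: &str) -> bool {
    let a = String::from(text);
    let b = String::from(text);
    a.as_ptr() == b.as_ptr()
}

/// Builds one `Arc<str>` and clones the handle. Cloning copies a pointer and
/// bumps a reference count, so both handles refer to the same allocation.
pub fn shared_handles_share_buffer(text: &str) -> bool {
    let first: Arc<str> = Arc::from(text);
    let second = Arc::clone(&first);
    Arc::ptr_eq(&first, &second)
}
```

An interner's job is to hand out the second kind of handle automatically whenever it sees text it has stored before.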
Why String Interning Matters in Real-World Scenarios
String interning matters most in performance-sensitive applications where the same strings appear over and over: every redundant copy costs memory, and comparing or hashing long duplicate strings slows down lookups.
Real-World Use Cases
The following table highlights some real-world use cases where string interning is beneficial:
Use Case | Example |
---|---|
Compiler and Interpreter Optimization | Rust’s compiler, Python’s CPython, and Java’s JVM use string interning to optimize symbol tables. |
Embedded Systems & IoT | Logging frameworks like defmt use interning to minimize flash storage usage on microcontrollers. |
Web Servers & API Performance | High-traffic APIs handling millions of requests often receive the same strings (e.g., headers, JSON keys). |
Databases & Search Engines | Systems such as Elasticsearch and PostgreSQL handle the same field names, terms, and identifiers repeatedly, which makes those strings natural candidates for interning. |
How It Works
- `InternedString::new()` returns an empty interned string.
- `InternedString::from(s)` interns a string, ensuring uniqueness.
- `as_str()`, `len()`, and `is_empty()` mimic `String` methods.
- A global `HashMap` stores interned strings, avoiding duplicates.
- Thread safety is ensured using `Lazy<Mutex<HashMap>>`.
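The last two bullets (a global table behind a lock) can be sketched in a few lines with the standard library alone. This is an illustrative reduction, not the implementation this post builds. The names `POOL` and `intern` are hypothetical. `std::sync::OnceLock` (Rust 1.70+) stands in for `once_cell::sync::Lazy`, and a `HashSet` stands in for the `HashMap`.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical global pool. `OnceLock` initializes it on first use;
// the `Mutex` serializes access from multiple threads.
static POOL: OnceLock<Mutex<HashSet<Arc<str>>>> = OnceLock::new();

/// Returns the shared handle for `s`, storing it the first time it is seen.
pub fn intern(s: &str) -> Arc<str> {
    let mut pool = POOL
        .get_or_init(|| Mutex::new(HashSet::new()))
        .lock()
        .unwrap();

    // `Arc<str>` implements `Borrow<str>`, so the set can be probed
    // with a plain `&str` without allocating.
    if let Some(existing) = pool.get(s) {
        return Arc::clone(existing);
    }

    // First occurrence: allocate once and keep one handle in the pool.
    let handle: Arc<str> = Arc::from(s);
    pool.insert(Arc::clone(&handle));
    handle
}
```

Calling `intern("hello")` twice returns two handles for which `Arc::ptr_eq` is true, while `intern("world")` yields a different allocation.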
Implementation of String Interning in Rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use once_cell::sync::Lazy; // Ensures thread-safe global interner

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InternedString {
    inner: Arc<str>,
}

// Global interner (singleton)
static INTERNER: Lazy<Mutex<HashMap<Arc<str>, ()>>> = Lazy::new(|| Mutex::new(HashMap::new()));

impl InternedString {
    /// Creates an empty interned string (similar to `String::new()`)
    pub fn new() -> Self {
        Self::intern("")
    }

    /// Interns a given string and returns an InternedString
    pub fn from<S: AsRef<str>>(s: S) -> Self {
        Self::intern(s.as_ref())
    }

    /// Returns a reference to the interned string
    pub fn as_str(&self) -> &str {
        &self.inner
    }

    /// Returns the length of the interned string
    pub fn len(&self) -> usize {
        self.inner.len()
    }

    /// Checks if the interned string is empty
    pub fn is_empty(&self) -> bool {
        self.inner.is_empty()
    }

    /// Interns a string, ensuring no duplicates exist
    fn intern(s: &str) -> Self {
        let mut interner = INTERNER.lock().unwrap();

        // If the string is already interned, return the existing reference.
        // `Arc<str>` implements `Borrow<str>`, so the map can be queried by
        // `&str` with a hashed lookup instead of scanning every key.
        if let Some((existing, _)) = interner.get_key_value(s) {
            return InternedString {
                inner: Arc::clone(existing),
            };
        }

        // Otherwise, intern the new string
        let arc_str: Arc<str> = Arc::from(s);
        interner.insert(Arc::clone(&arc_str), ());
        InternedString { inner: arc_str }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty_string() {
        let empty1 = InternedString::new();
        let empty2 = InternedString::new();
        assert_eq!(empty1, empty2);
        assert!(empty1.is_empty());
    }

    #[test]
    fn test_interning() {
        let s1 = InternedString::from("hello");
        let s2 = InternedString::from("hello");
        let s3 = InternedString::from("world");
        assert_eq!(s1, s2);
        assert_ne!(s1, s3);
    }

    #[test]
    fn test_string_length() {
        let s = InternedString::from("test");
        assert_eq!(s.len(), 4);
    }

    #[test]
    fn test_string_content() {
        let s = InternedString::from("RustLang");
        assert_eq!(s.as_str(), "RustLang");
    }
}
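
The tests above compare interned strings by value, and a value comparison would also pass for two unrelated copies of the same text. A stricter check is pointer identity under concurrent use. The sketch below is one way to exercise that, assuming a hypothetical non-global `Interner` type and helper `intern_from_threads` that are not part of the code above; they are introduced only so the example is self-contained and needs no external crates.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical instance-based interner (an assumption for this sketch).
#[derive(Default)]
pub struct Interner {
    pool: Mutex<HashSet<Arc<str>>>,
}

impl Interner {
    /// Returns the shared handle for `s`, storing it on first sight.
    pub fn intern(&self, s: &str) -> Arc<str> {
        let mut pool = self.pool.lock().unwrap();
        if let Some(existing) = pool.get(s) {
            return Arc::clone(existing);
        }
        let handle: Arc<str> = Arc::from(s);
        pool.insert(Arc::clone(&handle));
        handle
    }

    /// Number of distinct strings currently stored.
    pub fn distinct_count(&self) -> usize {
        self.pool.lock().unwrap().len()
    }
}

/// Interns `text` once on each of `threads` scoped threads and returns
/// every handle that came back.
pub fn intern_from_threads(interner: &Interner, text: &str, threads: usize) -> Vec<Arc<str>> {
    thread::scope(|scope| {
        let mut workers = Vec::with_capacity(threads);
        for _ in 0..threads {
            workers.push(scope.spawn(|| interner.intern(text)));
        }
        workers
            .into_iter()
            .map(|worker| worker.join().unwrap())
            .collect::<Vec<_>>()
    })
}
```

If interning works, every handle returned by `intern_from_threads` satisfies `Arc::ptr_eq` against the first one, and `distinct_count()` reports a single entry no matter how many threads raced to insert the string.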