Understanding Event Loops (Tokio Internals)
Prelude
This is the first post in a four-part series that builds up an understanding of the mechanics behind the Tokio runtime in Rust. This post focuses on the challenges of a multi-threaded event loop / server that push us toward async runtimes like Tokio.
Index of the four part series:
- Visualizing Tokio Internals: Part I - Multi-Threaded Event Loop / Server
- Visualizing Tokio Internals: Part II - Reactor
- Visualizing Tokio Internals: Part III - Wakers
- Visualizing Tokio Internals: Part IV - Executors
Multi-Threaded Event Loop / Server
What challenges in a multi-threaded event loop / server push us toward async runtimes like Tokio?
Phase 0: The Problem
Why do we need async runtimes like Tokio?
- Resource Efficiency: Traditional thread-per-connection models spend memory and kernel resources on threads that are mostly sitting idle
- Scalability: Async lets a small pool of threads handle thousands of connections with minimal overhead
- Performance: Event-driven architecture reduces context switching and per-connection memory usage
- Cost-Effective: Better resource utilization means lower infrastructure costs
Modern applications, especially network services, need to handle many things concurrently. Imagine a web server handling thousands of client connections simultaneously.
A naive approach is to dedicate one Operating System (OS) thread to each connection. Let's see why this doesn't scale well.
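To make the naive model concrete, here is a minimal sketch of a thread-per-connection server in plain std Rust. The echo handler, address, and buffer size are placeholders chosen for illustration, not part of any particular implementation:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Handle one client for the lifetime of its connection. The thread sits
// blocked inside `read` whenever the client has nothing to send.
fn handle_connection(mut stream: TcpStream) {
    let mut buf = [0u8; 1024];
    while let Ok(n) = stream.read(&mut buf) {
        if n == 0 {
            break; // client closed the connection
        }
        // Echo the bytes back; `write_all` also blocks until the kernel
        // accepts the data.
        if stream.write_all(&buf[..n]).is_err() {
            break;
        }
    }
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    // One dedicated OS thread per accepted connection.
    for stream in listener.incoming() {
        let stream = stream?;
        thread::spawn(move || handle_connection(stream));
    }
    Ok(())
}
```

Every connection gets its own stack and its own kernel scheduling entity, whether or not it has anything to do.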
The Thread-Per-Connection Resource Drain
The visualization below shows resource consumption (CPU/Memory) and throughput limits of a blocking thread-per-connection model.
Description:
Imagine a dashboard resembling htop or Task Manager:
- CPU Usage: Bars representing individual CPU cores.
- Memory Usage: A single bar showing total RAM consumption.
- Active Threads: A counter or list showing running OS threads.
- Requests/Second: A throughput meter.
- Incoming Requests Queue: A visual queue of pending connections.
Simulation:
- Start: The server starts. CPU/Memory usage is low. Throughput is 0. Few base threads exist.
- Low Load: Simulate a few incoming connections (~10). For each, a new OS thread is created.
- Visual: Active Threads count increases slightly. Memory usage ticks up slightly. CPU usage might blip as threads start but stays relatively low if connections are mostly idle. Throughput matches the request rate.
- High Load: Simulate hundreds or thousands of incoming connections. Many connections involve waiting for network I/O (reading request body, waiting for database, sending response).
- Visual:
- Active Threads: The count explodes. Each thread requires kernel resources and its own stack (~MBs).
- Memory Usage: The Memory bar shoots up dramatically, potentially hitting system limits.
- CPU Usage: CPU bars likely thrash. Even if threads are mostly waiting (blocked on I/O), the OS spends significant time context switching between them. This is overhead, not useful work.
- Requests Queue: The incoming requests queue grows rapidly: threads are still being created, but most of them quickly block on I/O, so the server struggles to accept new connections.
- Requests/Second: The throughput meter hits a plateau far below the incoming request rate, possibly even decreasing as context-switching overhead dominates.
- Visual: an interactive "Thread-Per-Connection Resource Monitor" panel with a "Performance Impact" / "System Status" readout.
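You can reproduce the resource drain locally without any network traffic. The sketch below simply spawns a few thousand OS threads that block, with a sleep standing in for waiting on I/O; the thread count and sleep duration are arbitrary numbers picked to make the effect visible in htop:

```rust
use std::thread;
use std::time::Duration;

fn main() {
    // Each spawned thread immediately "waits on I/O" (simulated with a sleep).
    // Watch memory and thread count in htop while this runs: memory grows with
    // every stack, and the scheduler has thousands of blocked threads to track.
    let handles: Vec<_> = (0..5_000)
        .map(|i| {
            thread::Builder::new()
                .name(format!("conn-{i}"))
                .spawn(|| thread::sleep(Duration::from_secs(60)))
                .expect("thread creation failed; the OS thread limit may already be hit")
        })
        .collect();

    for handle in handles {
        let _ = handle.join();
    }
}
```

None of these threads does useful work, yet each one still costs a stack and a slot in the kernel scheduler, which is exactly the pattern the dashboard above shows.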
We need a way to handle many waiting tasks concurrently without dedicating an OS thread to each one while it waits. This leads us to asynchronous programming.
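As a preview of where the series is headed, here is a rough sketch of the same echo server written against Tokio (this assumes the tokio crate with its full feature set enabled). The details of the API are not the point yet; what matters is that a waiting connection is a parked task, not a parked OS thread:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    loop {
        let (mut socket, _) = listener.accept().await?;
        // A lightweight task, not an OS thread: while this connection waits
        // on I/O, the runtime's small pool of worker threads serves others.
        tokio::spawn(async move {
            let mut buf = [0u8; 1024];
            loop {
                match socket.read(&mut buf).await {
                    Ok(0) | Err(_) => break, // connection closed or errored
                    Ok(n) => {
                        if socket.write_all(&buf[..n]).await.is_err() {
                            break;
                        }
                    }
                }
            }
        });
    }
}
```

How the runtime knows which parked task to wake, and which thread runs it once it wakes, is the subject of the next three parts: the reactor, wakers, and executors.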