Understanding Event Loops (Tokio Internals)
Prelude
This is the first post in a four-part series that builds up an understanding of the mechanics behind the Tokio runtime in Rust. This post focuses on the challenges of a multi-threaded event loop / server that push us toward async runtimes like Tokio.
Index of the four part series:
- Visualizing Tokio Internals: Part I - Multi-Threaded Event Loop / Server
- Visualizing Tokio Internals: Part II - Reactor
- Visualizing Tokio Internals: Part III - Wakers
- Visualizing Tokio Internals: Part IV - Executors
Multi-Threaded Event Loop / Server
What challenges in a multi-threaded event loop / server push us toward async runtimes like Tokio?
Phase 0: The Problem
Why do we need async runtimes like Tokio?
- Resource Efficiency: Traditional thread-per-connection models spend memory and kernel resources on threads that are mostly sitting idle
- Scalability: Async lets a small pool of threads handle thousands of connections with minimal overhead
- Performance: Event-driven architecture reduces context switching and per-connection memory usage
- Cost-Effective: Better resource utilization means lower infrastructure costs
Modern applications, especially network services, need to handle many things concurrently. Imagine a web server handling thousands of client connections simultaneously.
A naive approach is to dedicate one Operating System (OS) thread to each connection. Let's see why this doesn't scale well.
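To make the naive model concrete, here is a minimal sketch of a thread-per-connection server in plain std Rust. The echo handler, address, and buffer size are placeholders chosen for illustration, not part of any particular implementation:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Handle one client for the lifetime of its connection. The thread sits
// blocked inside `read` whenever the client has nothing to send.
fn handle_connection(mut stream: TcpStream) {
    let mut buf = [0u8; 1024];
    while let Ok(n) = stream.read(&mut buf) {
        if n == 0 {
            break; // client closed the connection
        }
        // Echo the bytes back; `write_all` also blocks until the kernel
        // accepts the data.
        if stream.write_all(&buf[..n]).is_err() {
            break;
        }
    }
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    // One dedicated OS thread per accepted connection.
    for stream in listener.incoming() {
        let stream = stream?;
        thread::spawn(move || handle_connection(stream));
    }
    Ok(())
}
```

Every connection gets its own stack and its own kernel scheduling entity, whether or not it has anything to do.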
The Thread-Per-Connection Resource Drain
The visualization below shows resource consumption (CPU/Memory) and throughput limits of a blocking thread-per-connection model.
Description:
Imagine a dashboard resembling htop or Task Manager:
- CPU Usage: Bars representing individual CPU cores.
- Memory Usage: A single bar showing total RAM consumption.
- Active Threads: A counter or list showing running OS threads.
- Requests/Second: A throughput meter.
- Incoming Requests Queue: A visual queue of pending connections.
Simulation:
- Start: The server starts. CPU/Memory usage is low. Throughput is 0. Few base threads exist.
- Low Load: Simulate a few incoming connections (~10). For each, a new OS thread is created.
- Visual: Active Threads count increases slightly. Memory usage ticks up slightly. CPU usage might blip as threads start but stays relatively low if connections are mostly idle. Throughput matches the request rate.
- High Load: Simulate hundreds or thousands of incoming connections. Many connections involve waiting for network I/O (reading request body, waiting for database, sending response).
- Visual:
- Active Threads: The count explodes. Each thread requires kernel resources and its own stack (~MBs).
- Memory Usage: The Memory bar shoots up dramatically, potentially hitting system limits.
- CPU Usage: CPU bars likely thrash. Even if threads are mostly waiting (blocked on I/O), the OS spends significant time context switching between them. This is overhead, not useful work.
- Requests Queue: The incoming requests queue grows rapidly: threads are still being created, but most of them quickly block on I/O, so the server struggles to accept new connections.
- Requests/Second: The throughput meter hits a plateau far below the incoming request rate, possibly even decreasing as context-switching overhead dominates.
- Visual: an interactive "Thread-Per-Connection Resource Monitor" panel with a "Performance Impact" / "System Status" readout.
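You can reproduce the resource drain locally without any network traffic. The sketch below simply spawns a few thousand OS threads that block, with a sleep standing in for waiting on I/O; the thread count and sleep duration are arbitrary numbers picked to make the effect visible in htop:

```rust
use std::thread;
use std::time::Duration;

fn main() {
    // Each spawned thread immediately "waits on I/O" (simulated with a sleep).
    // Watch memory and thread count in htop while this runs: memory grows with
    // every stack, and the scheduler has thousands of blocked threads to track.
    let handles: Vec<_> = (0..5_000)
        .map(|i| {
            thread::Builder::new()
                .name(format!("conn-{i}"))
                .spawn(|| thread::sleep(Duration::from_secs(60)))
                .expect("thread creation failed; the OS thread limit may already be hit")
        })
        .collect();

    for handle in handles {
        let _ = handle.join();
    }
}
```

None of these threads does useful work, yet each one still costs a stack and a slot in the kernel scheduler, which is exactly the pattern the dashboard above shows.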
We need a way to handle many waiting tasks concurrently without dedicating an OS thread to each one while it waits. This leads us to asynchronous programming.
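As a preview of where the series is headed, here is a rough sketch of the same echo server written against Tokio (this assumes the tokio crate with its full feature set enabled). The details of the API are not the point yet; what matters is that a waiting connection is a parked task, not a parked OS thread:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    loop {
        let (mut socket, _) = listener.accept().await?;
        // A lightweight task, not an OS thread: while this connection waits
        // on I/O, the runtime's small pool of worker threads serves others.
        tokio::spawn(async move {
            let mut buf = [0u8; 1024];
            loop {
                match socket.read(&mut buf).await {
                    Ok(0) | Err(_) => break, // connection closed or errored
                    Ok(n) => {
                        if socket.write_all(&buf[..n]).await.is_err() {
                            break;
                        }
                    }
                }
            }
        });
    }
}
```

How the runtime knows which parked task to wake, and which thread runs it once it wakes, is the subject of the next three parts: the reactor, wakers, and executors.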