1brc - same tricks across languages
· 3 min read
The 1 Billion Row Challenge (1BRC) is a programming challenge focused on processing a large dataset of temperature measurements. If you're unfamiliar with it, you can learn more from these resources: 1 and 2.
This is a cheatsheet of optimisations done for 1brc challenges. It tries to summarise and put the optimisations in perspective.
Data encoding / parsing
| Trick | Outcome | Note |
|---|---|---|
| converting contents of file as string: don't! | ~10% perf | Don't convert the contents of the file toString. Simply process raw bytes. |
| parsing float values | parsing after assuming only one place after decimal. | |
| parsing integer | Branchless Programming via Bit Manipulation this is not generalizable | |
parsing the city name (finding; separator) | If the data looks likeTokio;13.4 and we want to find ;using 8 operations, you can find ;.SWAR = SIMD as a register |
Reading files from disk
| Trick | Outcome | Note |
|---|---|---|
| Reading in chunksgolang | ~10% perf | |
| mmap | mmap the input file. Split input based on core count and run it with a thread per core. mmap is an unsafe operation in Rust. |
Float handling
| Trick | Outcome | Notes |
|---|---|---|
| Don't do floating point operations. Use int. | 20% speed gains | |
| Use fixed size integers instead of float | ~10% gain in golang |
The Hashmap - simpler hash function
| Trick | Outcome | Note |
|---|---|---|
| Fixed Size hash table 10k | ~40% gain - the custom hash table cuts down the time from 41.3 seconds to 22.1s. | How to resolve collisions? - Find first unoccupied slot - if hash collide, goto next empty slot algorithm |
| Hash key - Integer, not String | 4x gain in a different competition | the time is spent inHashmap.get.Not in 1brc, but in other context, Hashmap.get was the bottleneck. hence this optimisation makes a lot of difference there, not in 1brc. |
With OS threads and parallelism - go brr!
| Trick | Outcome | Notes |
|---|---|---|
| address the go routine + rust tokio task + erlang processes? | ~ 4-6x gains | biggest gains are obtained by utilizing OS threads fully! |
