1brc - same tricks across languages
· 3 min read
The 1 Billion Row Challenge (1BRC) is a programming challenge focused on processing a large dataset of temperature measurements. If you're unfamiliar with it, you can learn more from these resources: 1 and 2.
This is a cheatsheet of optimisations done for 1brc challenges. It tries to summarise and put the optimisations in perspective.
Data encoding / parsing
Trick | Outcome | Note |
---|---|---|
converting contents of file as string: don't! | ~10% perf | Don't convert the contents of the file toString . Simply process raw bytes. |
parsing float values | parsing after assuming only one place after decimal. | |
parsing integer | Branchless Programming via Bit Manipulation this is not generalizable | |
parsing the city name (finding; separator) | If the data looks likeTokio;13.4 and we want to find ; using 8 operations, you can find ; .SWAR = SIMD as a register |
Reading files from disk
Trick | Outcome | Note |
---|---|---|
Reading in chunksgolang | ~10% perf | |
mmap | mmap the input file. Split input based on core count and run it with a thread per core. mmap is an unsafe operation in Rust. |
Float handling
Trick | Outcome | Notes |
---|---|---|
Don't do floating point operations. Use int. | 20% speed gains | |
Use fixed size integers instead of float | ~10% gain in golang |
The Hashmap - simpler hash function
Trick | Outcome | Note |
---|---|---|
Fixed Size hash table 10k | ~40% gain - the custom hash table cuts down the time from 41.3 seconds to 22.1s. | How to resolve collisions? - Find first unoccupied slot - if hash collide, goto next empty slot algorithm |
Hash key - Integer, not String | 4x gain in a different competition | the time is spent inHashmap.get .Not in 1brc, but in other context, Hashmap.get was the bottleneck. hence this optimisation makes a lot of difference there, not in 1brc. |
With OS threads and parallelism - go brr!
Trick | Outcome | Notes |
---|---|---|
address the go routine + rust tokio task + erlang processes? | ~ 4-6x gains | biggest gains are obtained by utilizing OS threads fully! |