A year ago, Omer Tuchfeld started a coding contest on Stack Exchange's Code Golf site to see who could write the fastest version of FizzBuzz. Submissions are benchmarked on Omer's computer, which has a 16-core, 32-thread AMD 5950X CPU with a base clock of 3.4 GHz and a boost clock of up to 4.9 GHz, 8 MB of L2 cache and 32 GB of 3.6 GHz DDR4 RAM.

As of this writing, the third-fastest submission is written in Rust and produces output at a rate of 3 GB/s, the second-fastest is written in C and produces output at a rate of 41 GB/s, and the fastest is written in Assembler and produces output at a rate of 56 GB/s.

The developer behind the Assembler version is Alex Smith, a doctoral researcher in the School of Computer Science at the University of Birmingham in the UK. Alex is a reserve on the UK Olympic Maths Team and has a degree in electronic engineering.

In this post, I'll examine some of the optimisations found in the fastest FizzBuzz implementation to date.

    and OUTPUT_PTR, -(2 << 20) // rewind to the start of the buffer

There is a bytecode generator that produces batches of 600 lines at a time using SIMD instructions. Each 32 bytes of bytecode can be interpreted and have its output stored with just 4 CPU instructions. Some calculations have also been hard-coded into the bytecode, in a way not dissimilar to how JIT compilers operate.

During each 600-line generation, an approximation of the line number is also produced. The line number is corrected after every 512 bytes of output have been produced. The application produces 64 bytes of FizzBuzz for every 4 CPU clock cycles.

Alex did attempt to build a multi-threaded version but was unable to find any performance improvement over the single-threaded version. Alex states that the ultimate bottleneck is the throughput of the CPU's L2 cache.
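Two of the figures above can be checked with a little arithmetic. The Python sketch below is not taken from Alex's code: the function names are hypothetical, and it assumes the output buffer starts on a 2 MiB boundary and that "GB" means 10^9 bytes. It mimics what the `and OUTPUT_PTR, -(2 << 20)` line quoted above does to a pointer, and works out what clock rate would be implied by 64 bytes every 4 cycles at 56 GB/s.

```python
# Illustrative only. The mask -(2 << 20) and the figures 64 bytes / 4 cycles
# and 56 GB/s come from the post; every name below is hypothetical.

BUFFER_ALIGN = 2 << 20  # 2 MiB = 2,097,152 bytes


def rewind(output_ptr: int) -> int:
    """Mimic `and OUTPUT_PTR, -(2 << 20)`: clear the low 21 bits.

    This rounds the pointer down to the previous 2 MiB boundary, which is
    the start of the buffer only if the buffer is 2 MiB-aligned (assumed).
    """
    return output_ptr & -BUFFER_ALIGN


def bytes_per_cycle(bytes_out: int, cycles: int) -> float:
    """Average output per clock cycle."""
    return bytes_out / cycles


def implied_clock_ghz(throughput_gb_s: float, per_cycle: float) -> float:
    """Clock rate in GHz needed to sustain a throughput at a given bytes/cycle."""
    return throughput_gb_s / per_cycle


if __name__ == "__main__":
    base = 7 * BUFFER_ALIGN   # an assumed 2 MiB-aligned buffer address
    ptr = base + 123_456      # a position somewhere inside the buffer
    print(rewind(ptr) == base)             # True
    rate = bytes_per_cycle(64, 4)
    print(rate)                            # 16.0
    print(implied_clock_ghz(56.0, rate))   # 3.5
```

Under those assumptions, 16 bytes per cycle at 56 GB/s corresponds to roughly 3.5 GHz, which is close to the benchmark machine's 3.4 GHz base clock. This is a rough consistency check rather than a measurement.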