As part of my MSc Scientific Computing at UCL, I've been running Linpack benchmarks on an 8-node Raspberry Pi 4 Model B cluster. I have posted some preliminary results in the README.md at:
These results are not the usual "Problem Size vs Gflops". I already know that I want a problem size that uses 80% of memory, so they are "NB vs Gflops" instead: I'm trying to determine the optimum NB (block size) for a given problem size. I'm particularly interested in this because I want to make the best use of the limited networking resources whilst maintaining efficient load balancing.
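For anyone wanting to reproduce the sweep, here is a minimal sketch of how the problem size N can be derived from the 80% memory target for each candidate NB. It uses the common rule of thumb that the HPL matrix holds N x N doubles (8 bytes each) and rounds N down to a multiple of NB. The 4 GiB per node and the NB values are assumptions for illustration only, not my actual configuration, and `hpl_problem_size` is just a hypothetical helper name.

```python
import math


def hpl_problem_size(nodes, mem_per_node_bytes, nb, mem_fraction=0.8):
    """Largest N (a multiple of nb) whose N x N matrix of doubles
    fits in mem_fraction of the cluster's total memory."""
    total_bytes = nodes * mem_per_node_bytes
    # Each matrix element is a double-precision float (8 bytes).
    max_elements = int(mem_fraction * total_bytes / 8)
    n_max = math.isqrt(max_elements)
    # Round down so that N is an exact multiple of the block size.
    return (n_max // nb) * nb


if __name__ == "__main__":
    NODES = 8                       # cluster size from the post
    MEM_PER_NODE = 4 * 1024**3      # ASSUMPTION: 4 GiB per node
    for nb in (64, 128, 192, 256):  # ASSUMPTION: example NB values
        n = hpl_problem_size(NODES, MEM_PER_NODE, nb)
        print(f"NB={nb:4d}  N={n}")
```

Note that N shifts slightly with NB because of the rounding, so the memory footprint is not exactly constant across the sweep, though the difference is small.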
Are these results reasonable? Any suggestions for further investigation? All comments/feedback would be most welcome.
Please don't take too much notice of the multi-node results at the moment. I know I have some NET_RX softirq issues to resolve through NET_RX interrupt coalescing and larger NIC receive buffers. From some initial experimentation, I know I can gain an additional 10 Gflops across 8 nodes.