I am trying to understand how we can rewrite optimized multithreaded code for the ARM architecture. Any suggestions would be of great help.