I'm having trouble finding any informations on partial neon register dependencies.
Take for example the following code:
ld2 {v0.16b, v1.16b}[0], [x0] ld2 {v0.16b, v1.16b}[1], [x1] ld2 {v0.16b, v1.16b}[2], [x2] ...
Does the second load have to wait…