Did you have an Acorn Archimedes?

Vxxx.f32 s0, s1, s2 // VFP 1 float operation
Vxxx.f64 d0, d1, d2 // VFP 1 double operation
Vxxx.f32 d0, d1, d2 // NEON 2 float operation
Vxxx.f32 q0, q1, q2 // NEON 4 float operation

Do you mean that these instructions don't exist:
VDIV.f32 d0, d1, d2 // NEON 2 float operation
VDIV.f32 q0, q1, q2 // NEON 4 float operation

And these instructions exist:
VADD.f32 d0, d1, d2 // NEON 2 float operation
VADD.f32 q0, q1, q2 // NEON 4 float operation

Why is it not mentioned in the documentation? Does the divider use too much space on the chip?

So you have to trade speed for accuracy?

vrecpe.f32 d1, d5
vrecps.f32 d2, d1, d5
vmul.f32 d1, d1, d2
vrecps.f32 d2, d1, d5
vmul.f32 d5, d1, d2
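To make the speed/accuracy trade-off concrete, here is a rough scalar sketch in C of what the sequence above does per lane. It assumes only that VRECPS returns 2 - a*b, which is the Newton-Raphson correction factor for a reciprocal. The names recps_model, refine_reciprocal and approx_div are made up for illustration, and the initial estimate is passed in by hand because the VRECPE hardware estimate is not modelled here.

```c
/* Scalar model of one lane of VRECPS: returns 2 - a*b.
   (Illustrative helper, not an ARM intrinsic.) */
static float recps_model(float a, float b)
{
    return 2.0f - a * b;
}

/* Refine an estimate of 1/d with `steps` Newton-Raphson iterations.
   Each iteration mirrors one VRECPS + VMUL pair: x = x * (2 - x*d). */
static float refine_reciprocal(float d, float estimate, int steps)
{
    float x = estimate;
    for (int i = 0; i < steps; ++i) {
        x = x * recps_model(x, d);
    }
    return x;
}

/* Approximate a / d as a * (1/d), since there is no NEON divide here.
   The caller supplies the starting estimate of 1/d (an assumption of
   this sketch; on hardware it would come from VRECPE). */
static float approx_div(float a, float d, float estimate, int steps)
{
    return a * refine_reciprocal(d, estimate, steps);
}
```

With d = 4 and a starting estimate of 0.24, one step gives 0.2496 and a second step gives roughly 0.2499994, so the relative error is about squared on each step. Calling refine_reciprocal with steps = 2 corresponds to the two VRECPS/VMUL pairs in the sequence above; fewer steps are faster but less accurate.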