Cortex-A53: structure of the ALU


How can I find out the exact structure of the ALU (in particular the NEON/floating-point units) in the Cortex-A53?

For example:

How many NEON intrinsics such as vmlaq_f32, vsubq_f32 and vaddq_f32 can execute in parallel?
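One way to probe this empirically is to split the multiply-accumulate work into several independent dependency chains, so that an in-order, dual-issue core like the A53 does not have to wait for each vmlaq_f32 result before starting the next one. This is a minimal sketch, assuming n is a multiple of 16. The function name dot4acc is hypothetical, and the stand-in type and functions used when __ARM_NEON is not defined are also hypothetical; they exist only so the example compiles on a non-Arm host. The sketch does not claim any particular throughput for the A53.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Hypothetical stand-ins so the sketch compiles on non-Arm hosts;
   on Arm the real intrinsics from <arm_neon.h> are used instead. */
typedef struct { float v[4]; } float32x4_t;
static float32x4_t vdupq_n_f32(float x) { float32x4_t r = {{x, x, x, x}}; return r; }
static float32x4_t vld1q_f32(const float *p) {
    float32x4_t r;
    for (int i = 0; i < 4; i++) r.v[i] = p[i];
    return r;
}
static float32x4_t vaddq_f32(float32x4_t a, float32x4_t b) {
    for (int i = 0; i < 4; i++) a.v[i] += b.v[i];
    return a;
}
static float32x4_t vmlaq_f32(float32x4_t acc, float32x4_t a, float32x4_t b) {
    for (int i = 0; i < 4; i++) acc.v[i] += a.v[i] * b.v[i];
    return acc;
}
static float vgetq_lane_f32(float32x4_t a, int lane) { return a.v[lane]; }
#endif

/* Dot product of a and b using four independent accumulators.
   Assumption: n is a multiple of 16. */
float dot4acc(const float *a, const float *b, size_t n)
{
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    float32x4_t acc2 = vdupq_n_f32(0.0f);
    float32x4_t acc3 = vdupq_n_f32(0.0f);

    for (size_t i = 0; i < n; i += 16) {
        /* Each vmlaq_f32 depends only on its own accumulator from the
           previous iteration, so the four chains can overlap in the
           pipeline instead of waiting on one another's results. */
        acc0 = vmlaq_f32(acc0, vld1q_f32(a + i),      vld1q_f32(b + i));
        acc1 = vmlaq_f32(acc1, vld1q_f32(a + i + 4),  vld1q_f32(b + i + 4));
        acc2 = vmlaq_f32(acc2, vld1q_f32(a + i + 8),  vld1q_f32(b + i + 8));
        acc3 = vmlaq_f32(acc3, vld1q_f32(a + i + 12), vld1q_f32(b + i + 12));
    }

    /* Combine the accumulators, then add the four lanes together. */
    float32x4_t sum = vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3));
    return vgetq_lane_f32(sum, 0) + vgetq_lane_f32(sum, 1)
         + vgetq_lane_f32(sum, 2) + vgetq_lane_f32(sum, 3);
}
```

Timing this on the actual board against a version with a single accumulator gives an empirical estimate of how much the core overlaps consecutive vmlaq_f32 operations.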

Can loads and stores of float32x4_t run in parallel with vsubq_f32 and vaddq_f32?
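Whether the A53 can issue a load or store in the same cycle as a vaddq_f32/vsubq_f32 is a property of its pipeline that this sketch does not assert; what the code can do is remove the data dependence that would prevent such overlap. This is a minimal software-pipelining sketch, assuming n is a multiple of 4, n is at least 4, and out does not overlap the inputs: the loads for the next block are issued before the arithmetic and store of the current block, so the two do not depend on each other. The function name add_sub_pipelined and the computation out[i] = (a[i] + b[i]) - c[i] are hypothetical, and the non-NEON stand-ins only let the code compile off-Arm.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Hypothetical stand-ins so the sketch compiles on non-Arm hosts;
   on Arm the real intrinsics from <arm_neon.h> are used instead. */
typedef struct { float v[4]; } float32x4_t;
static float32x4_t vld1q_f32(const float *p) {
    float32x4_t r;
    for (int i = 0; i < 4; i++) r.v[i] = p[i];
    return r;
}
static void vst1q_f32(float *p, float32x4_t a) {
    for (int i = 0; i < 4; i++) p[i] = a.v[i];
}
static float32x4_t vaddq_f32(float32x4_t a, float32x4_t b) {
    for (int i = 0; i < 4; i++) a.v[i] += b.v[i];
    return a;
}
static float32x4_t vsubq_f32(float32x4_t a, float32x4_t b) {
    for (int i = 0; i < 4; i++) a.v[i] -= b.v[i];
    return a;
}
#endif

/* out[i] = (a[i] + b[i]) - c[i].
   Assumptions: n is a multiple of 4, n >= 4, and out does not
   overlap a, b or c. */
void add_sub_pipelined(const float *a, const float *b, const float *c,
                       float *out, size_t n)
{
    /* Prologue: load the first block. */
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    float32x4_t vc = vld1q_f32(c);

    for (size_t i = 0; i + 4 < n; i += 4) {
        /* Loads for the next block do not depend on the arithmetic below... */
        float32x4_t na = vld1q_f32(a + i + 4);
        float32x4_t nb = vld1q_f32(b + i + 4);
        float32x4_t nc = vld1q_f32(c + i + 4);
        /* ...so they are free to overlap with the add, subtract and
           store of the current block. */
        vst1q_f32(out + i, vsubq_f32(vaddq_f32(va, vb), vc));
        va = na;
        vb = nb;
        vc = nc;
    }

    /* Epilogue: finish the last block. */
    vst1q_f32(out + n - 4, vsubq_f32(vaddq_f32(va, vb), vc));
}
```

An optimizing compiler may reorder these operations on its own, so it is worth checking the generated assembly (for example with -S) before drawing conclusions from timings.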

How many 128-bit registers can I use?

