Hi,

I am facing a problem where the predicate in the intrinsic vdupq_x_n_f32 has no effect on the SSE-300-MPS3 (Cortex-M55). Sample code:
mve_pred16_t p_0;
float32x4_t TTT_f4;
TTT_f4 = vdupq_n_f32(5.17);
p_0 = vcmpltq_n_s32(TTT_f4, 0);
TTT_f4 = vdupq_x_n_f32(12.69, p_0);
TTT_f4 is overwritten with 12.69, although it should not be. Could you please help me understand why this happens? (I am using Arm Development Studio 2021.1.)

Regards,
Yevhenii
Did you check the resulting assembly code?
Yes. The result is always 12.69. I have also tried
p_0 = vcmpltq_n_f32(TTT_f4, 0);
and
p_0 = vcmpgtq_n_f32(TTT_f4, 0);
Hi Yevhenii,
The intrinsic you have used is vdupq_x_n_f32. The "_x" means this is the "Don't care" predication type, where false predicated lanes have an undefined value. In your code, all the predication flags are false (because 5.17 is not less than zero), so all the lanes in your result vector have "don't care" values (which just happens to be 12.69 in this case).
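To see why every lane ended up as 12.69, it can help to model the lane layout in plain C. The sketch below is a hypothetical, portable model: the names f32x4_model, pred16_model and the *_model functions are illustrative and not part of arm_mve.h. It assumes the MVE layout in which each 32-bit lane owns 4 of the 16 bits in mve_pred16_t. It also shows one permitted outcome of a "_x" dup, namely writing the value to every lane, which is consistent with what was observed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for float32x4_t and mve_pred16_t.
   They model the lane layout only; they are not the arm_mve.h types. */
typedef struct { float lane[4]; } f32x4_model;
typedef uint16_t pred16_model;

/* Predicate bits owned by one 32-bit lane: 4 bits, one per byte. */
static pred16_model lane_bits(int lane)
{
    assert(lane >= 0 && lane < 4);
    return (pred16_model)(0xFu << (4 * lane));
}

/* Model of vdupq_n_f32: broadcast one scalar to all 4 lanes. */
static f32x4_model dup_n_f32_model(float v)
{
    f32x4_model r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = v;
    }
    return r;
}

/* Model of vcmpltq_n_f32(a, b): a lane's 4 bits are set when a.lane[i] < b. */
static pred16_model cmplt_n_f32_model(f32x4_model a, float b)
{
    pred16_model p = 0;
    for (int i = 0; i < 4; i++) {
        if (a.lane[i] < b) {
            p |= lane_bits(i);
        }
    }
    return p;
}

/* One permitted outcome of vdupq_x_n_f32(value, p): inactive lanes are
   "don't care", so writing value to every lane is allowed and the
   predicate has no visible effect. */
static f32x4_model dup_x_n_f32_model(float value, pred16_model p)
{
    (void)p; /* inactive lanes may hold any value */
    return dup_n_f32_model(value);
}
```

With 5.17 compared against 0, the mask is 0x0000, so no lane is active and all four lanes are "don't care". Compared against 10, the mask is 0xFFFF.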
If you want to preserve the values of false predicated lanes, you should use the "_m" or "merging" predication type, vdupq_m_n_f32. This will use the values from an inactive vector for the false predicated lanes, and in your example we can just use the original vector TTT_f4 for the inactive vector.
So, this code keeps all lanes at 5.17 (because 5.17 is not less than zero), taking the false predicated lanes from the original vector TTT_f4:
mve_pred16_t p_0;
float32x4_t TTT_f4;
TTT_f4 = vdupq_n_f32(5.17);
p_0 = vcmpltq_n_f32(TTT_f4, 0.0);
TTT_f4 = vdupq_m_n_f32(TTT_f4, 12.69, p_0);
If we now change the test value to 10 so that all lanes are predicated true, this code overwrites all lanes with 12.69 (because 5.17 is less than 10):
mve_pred16_t p_0;
float32x4_t TTT_f4;
TTT_f4 = vdupq_n_f32(5.17);
p_0 = vcmpltq_n_f32(TTT_f4, 10.0);
TTT_f4 = vdupq_m_n_f32(TTT_f4, 12.69, p_0);
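The merging behaviour in the two snippets above can be checked without Helium hardware using a portable C model. This is a hypothetical sketch, not the arm_mve.h implementation: f32x4_model, pred16_model, lane_active and dup_m_n_f32_model are illustrative names. It simplifies per-byte predication by treating a 32-bit lane as active when its lowest predicate bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for float32x4_t and mve_pred16_t. */
typedef struct { float lane[4]; } f32x4_model;
typedef uint16_t pred16_model;

/* Simplification: a 32-bit lane is treated as active when its lowest
   predicate bit is set. Compares such as vcmpltq_n_f32 set all 4 bits
   of a lane together, so the lowest bit is representative. */
static int lane_active(pred16_model p, int lane)
{
    assert(lane >= 0 && lane < 4);
    return (int)((p >> (4 * lane)) & 1u);
}

/* Model of vdupq_m_n_f32(inactive, value, p): active lanes receive value,
   inactive lanes keep the matching lane of the inactive vector. */
static f32x4_model dup_m_n_f32_model(f32x4_model inactive, float value,
                                     pred16_model p)
{
    f32x4_model r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = lane_active(p, i) ? value : inactive.lane[i];
    }
    return r;
}
```

Passing a mask of 0x0000 (the 5.17 < 0 case) returns the original 5.17 lanes. Passing 0xFFFF (the 5.17 < 10 case) returns 12.69 in every lane. A mixed mask such as 0x00F0 overwrites only lane 1.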
There is more information about the different predication types in the Helium Programmer's Guide: https://developer.arm.com/documentation/102102/0102/Predication
Hope this helps,
Chris
Thanks