Hi,
When I compute the difference:
// Compute diff
int32x4_t diff1_4 = vabsq_s32(vsubq_s32(A, B));
I get 4 results, one for each pair: (A1,B1), (A2,B2), (A3,B3) and (A4,B4).
Then I have to do the comparison:
uint32x4_t mask1_4 = vcltq_s32(diff1_4, X);
So mask1_4 holds the comparison for each pair, i.e. 4 results.
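For reference, the per-pair comparison described above can be sketched as one small function. This is a minimal sketch, not the poster's code: abs_diff_lt_mask is a hypothetical name, the NEON path assumes a compiler that defines __ARM_NEON, and the scalar loop is an equivalent fallback included only so the sketch also builds on non-Arm machines (it assumes a[i] - b[i] does not overflow int32_t).

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] is all ones (0xFFFFFFFF) when |a[i] - b[i]| < x,
   and 0 otherwise, i.e. the lane-wise result of vcltq_s32(|a - b|, x). */
void abs_diff_lt_mask(const int32_t a[4], const int32_t b[4], int32_t x,
                      uint32_t out[4])
{
#if defined(__ARM_NEON)
    int32x4_t va = vld1q_s32(a);
    int32x4_t vb = vld1q_s32(b);
    int32x4_t diff = vabsq_s32(vsubq_s32(va, vb));      /* |A - B| per lane */
    uint32x4_t mask = vcltq_s32(diff, vdupq_n_s32(x));  /* all ones where diff < x */
    vst1q_u32(out, mask);
#else
    /* Scalar fallback with the same lane semantics (assumes no int32_t overflow) */
    for (int i = 0; i < 4; i++) {
        int32_t d = a[i] - b[i];
        if (d < 0)
            d = -d;
        out[i] = (d < x) ? UINT32_MAX : 0u;
    }
#endif
}
```

Each output lane is either 0 or 0xFFFFFFFF (all bits set), never 1, which matters for the tests discussed further down.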
The answer in the thread "SIMD help for exemple" was to use
if (vmaxvq_u32(mask1_4) > 0) { ... }
I think my sentence in the previous post was confusing. I wrote
"if (mask1_4[0] > 0 && mask1_4[1] > 0) and if (mask1_4[2] > 0 && mask1_4[3] > 0)"
but I did not mean if (mask1_4[0] > 0 && mask1_4[1] > 0 && mask1_4[2] > 0 && mask1_4[3] > 0).
I need to do 2 tests:
if (mask1_4[0] > 0 && mask1_4[1] > 0) {
    // process data1
}
if (mask1_4[2] > 0 && mask1_4[3] > 0) {
    // process data2
}
I think that vmaxvq_u32(mask1_4) checks all the comparisons together, like
if (mask1_4[0] > 0 && mask1_4[1] > 0 && mask1_4[2] > 0 && mask1_4[3] > 0)
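Note that a horizontal maximum is an "any" test rather than an "all" test: vmaxvq_u32(mask1_4) is nonzero as soon as a single lane passed, while the horizontal minimum vminvq_u32 is nonzero only if every lane passed. A minimal sketch, assuming AArch64 (the across-vector reductions vmaxvq_u32/vminvq_u32 are A64-only) with a scalar fallback for other targets; any_lane and all_lanes are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* True if at least one lane is set: what vmaxvq_u32(mask) > 0 tests. */
bool any_lane(const uint32_t m[4])
{
#if defined(__aarch64__)
    return vmaxvq_u32(vld1q_u32(m)) != 0;
#else
    return (m[0] | m[1] | m[2] | m[3]) != 0;
#endif
}

/* True only if every lane is set: what vminvq_u32(mask) > 0 tests.
   The scalar AND is equivalent only for comparison masks (lanes 0 or all ones). */
bool all_lanes(const uint32_t m[4])
{
#if defined(__aarch64__)
    return vminvq_u32(vld1q_u32(m)) != 0;
#else
    return (m[0] & m[1] & m[2] & m[3]) != 0;
#endif
}
```

Neither helper alone expresses two independent pair tests; for that, each 64-bit half of the mask has to be reduced separately.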
PS: I think I should have posted this in the old thread.
I have spent 3 hours debugging NEON code so far.
But vmaxvq_u32(mask1_4_01) does not compile: "error: no matching function for call to 'vmaxvq_u32'".
It looks like vmaxvq_u32 is for 128-bit vectors (uint32x4_t), not 64-bit ones (uint32x2_t). I tried to find the correct function, without result.
I tried vmin_u32 and vmax_u32, but they seem to compare two uint32x2_t vectors lane by lane. Not useful in my context.
I need a vmaxvq_u32 or vminvq_u32 that works with uint32x2_t.
But I can use if (mask1_4[0] > 0 && mask1_4[1] > 0), because uint32x4_t mask1_4 = vcltq_s32(diff1_4, constseuil); returns -1 when the comparison is true and 0 when it is not.
Something is wrong with vmaxvq_u32(uint32x2_t).
PS: concerning the extraction, the high half is not lanes [0][1] but [2][3].
Why does if (mask1_4[0] > 0 && mask1_4[1] > 0) work with a -1 value? Doesn't it look at the sign?
Same thing if I use vcltq_u32.
By the way, if ((mask1_4[0]+mask1_4[1]) > 1) does not work in all cases.
In fact, I tested many solutions.
Let's say I have
int32x4_t diff1_4 = vabsq_s32(vsubq_s32(xv, yv));
uint32x4_t mask1_4 = vcltq_s32(diff1_4, constseuil);
mask1_4 prints as -1 -1 -1 -1, and if I add (mask1_4[0]+mask1_4[1]) I get -2,
but if I check
if ((mask1_4[0]+mask1_4[1]) < 0) it is false
if ((mask1_4[0]+mask1_4[1]) == -2) it is true
if ((mask1_4[0]+mask1_4[1]) < -1) it is true
if ((mask1_4[0]+mask1_4[1]) > 0) it is true
There is a problem! < 0 should be true and > 0 false.
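These results are consistent once the lane type is taken into account: mask1_4[0] and mask1_4[1] are uint32_t, so their sum is unsigned and wraps modulo 2^32. Two true lanes give 0xFFFFFFFE, which %d prints as -2; an unsigned value is never < 0; and in s == -2 or s < -1 the int constant is converted to unsigned (-2 becomes 0xFFFFFFFE, -1 becomes 0xFFFFFFFF). The same wrap explains why > 1 cannot work: a single true lane already gives 0xFFFFFFFF. A scalar sketch, assuming masks whose lanes are either 0 or all ones; lane_sum, both_set and show_sum are hypothetical names.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sum of two mask lanes: both operands are uint32_t, so the addition is
   unsigned and wraps modulo 2^32 (two true lanes give 0xFFFFFFFE). */
uint32_t lane_sum(uint32_t m0, uint32_t m1)
{
    return m0 + m1;
}

/* "Both lanes passed" without arithmetic: for comparison masks the bitwise
   AND is nonzero only when both lanes are set. */
bool both_set(uint32_t m0, uint32_t m1)
{
    return (m0 & m1) != 0;
}

/* Print the wrapped sum as unsigned and as the same bits viewed as signed,
   which is where the -2 in the log comes from. */
void show_sum(uint32_t m0, uint32_t m1)
{
    uint32_t s = lane_sum(m0, m1);
    int32_t as_signed;
    memcpy(&as_signed, &s, sizeof as_signed);  /* same bits, signed view */
    printf("unsigned: %" PRIu32 ", signed view: %" PRId32 "\n", s, as_signed);
}
```

With two true lanes, show_sum prints 4294967294 as the unsigned value and -2 as the signed view.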
Here is the complete code:
#include <arm_neon.h>
// LOGE is assumed to be the app's logging macro, defined elsewhere (not shown)

__attribute__((noinline)) void neon_multi() {
    int A1[4] = {300, 400, 600, 400};
    int B1[4] = {300, 400, 600, 400}; //= {,600,400,300,400};
    int32x4_t constseuil = vdupq_n_s32(4);

    int* x_base = (A1);
    int32x4_t xv = vld1q_s32(x_base);
    int* y_base = (B1);
    int32x4_t yv = vld1q_s32(y_base);

    // Compute diff
    int32x4_t diff1_4 = vabsq_s32(vsubq_s32(xv, yv));
    // Compute diff against yv rotated by two lanes (signed extract to match int32x4_t)
    int32x4_t yv_swap = vextq_s32(yv, yv, 2);
    int32x4_t diff5_8 = vabsq_s32(vsubq_s32(xv, yv_swap));

    // Generate diff conditions
    uint32x4_t mask1_4 = vcltq_s32(diff1_4, constseuil);
    LOGE(" neon_multi mask1_4 %3d %3d %3d %3d \n", mask1_4[0], mask1_4[1], mask1_4[2], mask1_4[3]);
    LOGE(" neon_multi test %3d %3d \n", (mask1_4[0]+mask1_4[1]), (mask1_4[2]+mask1_4[3]));
    uint32x4_t mask5_8 = vcltq_s32(diff5_8, constseuil);
    LOGE(" neon_multi mask5_8 %3d %3d %3d %3d \n", mask5_8[0], mask5_8[1], mask5_8[2], mask5_8[3]);
    LOGE(" neon_multi test %3d %3d \n", (mask5_8[0]+mask5_8[1]), (mask5_8[2]+mask5_8[3]));

    if ((mask1_4[0]+mask1_4[1]) > 0)   { LOGE(" neon_multi 1 %3d\n", (mask1_4[0]+mask1_4[1])); }
    if ((mask1_4[0]+mask1_4[1]) < 0)   { LOGE(" neon_multi 2 %3d\n", (mask1_4[0]+mask1_4[1])); }
    if ((mask1_4[0]+mask1_4[1]) == -2) { LOGE(" neon_multi 3 %3d\n", (mask1_4[0]+mask1_4[1])); }
    if ((mask1_4[0]+mask1_4[1]) < -1)  { LOGE(" neon_multi 4 %3d\n", (mask1_4[0]+mask1_4[1])); }
}
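On the yv_swap line: with both operands equal to yv and an offset of 2, the extract intrinsic rotates the vector by two lanes, so diff5_8 compares xv[0..3] with yv[2], yv[3], yv[0], yv[1]. Since yv is an int32x4_t, the signed form vextq_s32 is the one whose argument types match. A minimal sketch of that rotation, assuming __ARM_NEON for the intrinsic path and a scalar loop elsewhere; swap_halves is a hypothetical name.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out = {v[2], v[3], v[0], v[1]}: the lane order vextq_s32(x, x, 2) produces */
void swap_halves(const int32_t v[4], int32_t out[4])
{
#if defined(__ARM_NEON)
    int32x4_t x = vld1q_s32(v);
    /* take 4 consecutive lanes starting at lane 2 of the pair (x, x) */
    vst1q_s32(out, vextq_s32(x, x, 2));
#else
    /* Scalar fallback: rotate by two lanes */
    for (int i = 0; i < 4; i++)
        out[i] = v[(i + 2) % 4];
#endif
}
```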
Let's see tomorrow ;))
hterrolle said: It looks like vmaxvq_u32 is for 128-bit vectors (uint32x4_t), not 64-bit ones (uint32x2_t). I tried to find the correct function, without result.
hterrolle said: I need a vmaxvq_u32 or vminvq_u32 that works with uint32x2_t.
NEON intrinsics follow a fairly regular naming convention: the "q" versions are the 128-bit quad-word versions, and the non-q versions are the 64-bit double-word versions.
So vmaxvq_u32() is the 128-bit version, and vmaxv_u32() is the 64-bit version.
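Applied to the two independent tests from the question, that suggests splitting the mask with vget_low_u32 (lanes 0 and 1) and vget_high_u32 (lanes 2 and 3) and reducing each 64-bit half with the non-q reduction. Using the minimum gives "both lanes passed"; the maximum would give "at least one lane passed". A minimal sketch, assuming AArch64 (vminv_u32 is an A64 intrinsic) with a scalar fallback; pair_tests is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: low_ok  = lanes 0 and 1 both passed,
                         high_ok = lanes 2 and 3 both passed. */
void pair_tests(const uint32_t m[4], bool *low_ok, bool *high_ok)
{
#if defined(__aarch64__)
    uint32x4_t mask = vld1q_u32(m);
    uint32x2_t lo = vget_low_u32(mask);   /* lanes 0, 1 */
    uint32x2_t hi = vget_high_u32(mask);  /* lanes 2, 3 */
    /* 64-bit horizontal min: nonzero only if both lanes of the half are set */
    *low_ok  = vminv_u32(lo) != 0;
    *high_ok = vminv_u32(hi) != 0;
#else
    /* Scalar fallback, valid for comparison masks (lanes 0 or all ones) */
    *low_ok  = (m[0] & m[1]) != 0;
    *high_ok = (m[2] & m[3]) != 0;
#endif
}
```

Usage would then mirror the two if blocks from the question: if (low_ok) process data1; if (high_ok) process data2.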
hterrolle said: Why does if (mask1_4[0] > 0 && mask1_4[1] > 0) work with a -1 value? Doesn't it look at the sign?
mask1_4 is an unsigned uint32 type, so a true lane holds 4294967295, not -1. NEON compares just return a vector with all bits set if true, or all bits clear for false, which is useful in logical bitwise mask operations.
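As an example of such a bitwise use, the mask can drive a lane-wise select with vbslq_s32, which takes each bit from the second operand where the mask bit is set and from the third where it is clear. A minimal sketch, assuming __ARM_NEON for the intrinsic path and an equivalent scalar bit-select elsewhere; select_by_mask is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] takes a[i]'s bits where mask bits are set, b[i]'s bits elsewhere */
void select_by_mask(const uint32_t m[4], const int32_t a[4],
                    const int32_t b[4], int32_t out[4])
{
#if defined(__ARM_NEON)
    uint32x4_t mask = vld1q_u32(m);
    vst1q_s32(out, vbslq_s32(mask, vld1q_s32(a), vld1q_s32(b)));
#else
    /* Scalar bit-select: (a & m) | (b & ~m), done on the raw bits */
    for (int i = 0; i < 4; i++) {
        uint32_t ua, ub, bits;
        memcpy(&ua, &a[i], sizeof ua);
        memcpy(&ub, &b[i], sizeof ub);
        bits = (ua & m[i]) | (ub & ~m[i]);
        memcpy(&out[i], &bits, sizeof bits);
    }
#endif
}
```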
I will check it tomorrow.
And thanks again. I spent 6 hours on NEON today. I am full ;))