SIMD comparison result and how to use it

Hi,

When I compute the difference:

    // Compute diff
    int32x4_t diff1_4 = vabsq_s32(vsubq_s32(A, B));

I get 4 results, one for each pair: (A1,B1), (A2,B2), (A3,B3) and (A4,B4).

Then I have to do the comparison:

    uint32x4_t mask1_4 = vcltq_s32(diff1_4, X);

So "uint32x4_t mask1_4" holds the comparison result for each pair, i.e. 4 results.

The answer in the post "SIMD help for exemple" was to use

    if (vmaxvq_u32(mask1_4) > 0) { ... }

I think my sentence in the previous post was confusing. I wrote:

   " if (mask1_4[0] > 0 && mask1_4[1] > 0)  and if (mask1_4[2] > 0 && mask1_4[3] > 0) "

but I did not mean if (mask1_4[0] > 0 && mask1_4[1] > 0 && mask1_4[2] > 0 && mask1_4[3] > 0).

I need to do 2 separate tests:

    if (mask1_4[0] > 0 && mask1_4[1] > 0) {
        // process data1
    }

    if (mask1_4[2] > 0 && mask1_4[3] > 0) {
        // process data2
    }

I think that vmaxvq_u32(mask1_4) will check all the comparisons together, like:

    if (mask1_4[0] > 0 && mask1_4[1] > 0 && mask1_4[2] > 0 && mask1_4[3] > 0)
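One point that may be worth checking here: vmaxvq_u32 returns the largest of the four lanes, so "> 0" holds as soon as any single lane passed, which is weaker than the four-way && above. The all-lanes counterpart would be vminvq_u32. The scalar model below is only a sketch of that difference; any_lane and all_lanes are hypothetical helper names, not NEON functions.

```c
#include <stdint.h>

/* Scalar model of a 4-lane compare mask (hypothetical helpers, not NEON API).
   Each lane is 0xFFFFFFFF when the comparison held and 0 otherwise. */

/* Models vmaxvq_u32(mask) > 0: the maximum is nonzero if ANY lane is set. */
static int any_lane(const uint32_t m[4])
{
    uint32_t max = 0;
    for (int i = 0; i < 4; i++) {
        if (m[i] > max) {
            max = m[i];
        }
    }
    return max > 0;
}

/* Models vminvq_u32(mask) > 0: the minimum is nonzero only if ALL lanes are set. */
static int all_lanes(const uint32_t m[4])
{
    uint32_t min = UINT32_MAX;
    for (int i = 0; i < 4; i++) {
        if (m[i] < min) {
            min = m[i];
        }
    }
    return min > 0;
}
```

With a mask where only one lane is set, any_lane reports 1 while all_lanes reports 0, so neither reduction over the full vector gives the two separate pairwise answers.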

PS: I think I should have posted this in the old thread.

Parents
  • I have spent 3 hours debugging NEON code so far.

    But "vmaxvq_u32(mask1_4_01)" does not compile: "error: no matching function for call to 'vmaxvq_u32'"

    It looks like vmaxvq_u32 is for the 128-bit type "uint32x4_t" and not for the 64-bit type "uint32x2_t". I tried to find the correct function, but without success.

    I tried vmin_u32 and vmax_u32, but it looks like they compare two uint32x2_t vectors with each other, which is not useful in my context.

    I need a vmaxvq_u32 or vminvq_u32 that works with uint32x2_t.

    But I can use "if (mask1_4[0] > 0 && mask1_4[1] > 0)" because "uint32x4_t mask1_4 = vcltq_s32(diff1_4, constseuil);" returns -1 when the comparison is true and 0 when it is not.

    Something is wrong with vmaxvq_u32(uint32x2_t).

    PS: concerning the extraction, high is not [0][1] but [2][3].

Children
  • Why does if (mask1_4[0] > 0 && mask1_4[1] > 0) work with a -1 value? Does it look at the sign?

    Same thing if I use vcltq_u32.

    By the way, if ((mask1_4[0]+mask1_4[1]) > 1) does not work in all cases.

    In fact I tested many solutions.

    Let's say I have

    int32x4_t diff1_4  = vabsq_s32(vsubq_s32(xv, yv));

    uint32x4_t mask1_4 = vcltq_s32(diff1_4, constseuil);

    With mask1_4 = -1 -1 -1 -1, adding (mask1_4[0]+mask1_4[1]) gives -2,

    but when I check:

    if ((mask1_4[0]+mask1_4[1]) <  0)   it is false

    if ((mask1_4[0]+mask1_4[1]) == -2)  it is true

    if ((mask1_4[0]+mask1_4[1]) <  -1)  it is true

    if ((mask1_4[0]+mask1_4[1]) >  0)   it is true

    There is a problem! < 0 should be true and > 0 should be false.

    Here is the complete code:

    __attribute__((noinline)) void neon_multi(){

        int A1[4] = {300,400,600,400};
        int B1[4] = {300,400,600,400};//= {,600,400,300,400};
        
        int32x4_t constseuil = vdupq_n_s32(4);


        int* x_base = (A1);
        int32x4_t xv = vld1q_s32(x_base);

        int* y_base = (B1);
        int32x4_t yv = vld1q_s32(y_base);

        // Compute diff
        int32x4_t diff1_4 = vabsq_s32(vsubq_s32(xv, yv));

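        // Compute diff against yv with its low and high halves swapped
        // (vextq_s32, not vextq_u32, because yv is an int32x4_t)
        int32x4_t yv_swap = vextq_s32(yv, yv, 2);
        int32x4_t diff5_8 = vabsq_s32(vsubq_s32(xv, yv_swap));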

        // Generate diff conditions
        uint32x4_t mask1_4 = vcltq_s32(diff1_4, constseuil);
        LOGE(" neon_multi mask1_4 %3d %3d %3d %3d \n",mask1_4[0],mask1_4[1],mask1_4[2],mask1_4[3]);
        LOGE(" neon_multi test %3d %3d \n",(mask1_4[0]+mask1_4[1]),(mask1_4[2]+mask1_4[3]));
        uint32x4_t mask5_8 = vcltq_s32(diff5_8, constseuil);
        LOGE(" neon_multi mask5_8 %3d %3d %3d %3d \n",mask5_8[0],mask5_8[1],mask5_8[2],mask5_8[3]);
        LOGE(" neon_multi test %3d %3d \n",(mask5_8[0]+mask5_8[1]),(mask5_8[2]+mask5_8[3]));
            
        if ( (mask1_4[0]+mask1_4[1])    > 0){LOGE(" neon_multi 1  %3d\n",(mask1_4[0]+mask1_4[1]));}

        if ( (mask1_4[0]+mask1_4[1])    < 0){LOGE(" neon_multi 2  %3d\n",(mask1_4[0]+mask1_4[1]));}

        if ( (mask1_4[0]+mask1_4[1]) == -2){LOGE(" neon_multi 3  %3d\n",(mask1_4[0]+mask1_4[1]));}

        if ( (mask1_4[0]+mask1_4[1])   < -1){LOGE(" neon_multi 4  %3d\n",(mask1_4[0]+mask1_4[1]));}

    }

    Let's see tomorrow ;))

  • "It looks like vmaxvq_u32 is for the 128-bit type uint32x4_t and not for the 64-bit type uint32x2_t. I tried to find the correct function, but without success."

    "I need a vmaxvq_u32 or vminvq_u32 that works with uint32x2_t."

    NEON intrinsics are relatively regular in their naming convention: the "q" versions are the 128-bit quad-word versions, and the non-q versions are the 64-bit double-word versions.

    So vmaxvq_u32() is the 128-bit version, and vmaxv_u32() is the 64-bit version.
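To make the two pairwise tests from the question concrete, here is a minimal sketch, assuming an AArch64 target (vminv_u32 and vmaxv_u32 are A64-only intrinsics). It splits the mask with vget_low_u32 / vget_high_u32 and reduces each half with vminv_u32, so a flag is set only when both lanes of that pair passed. The function name pair_flags and its return encoding are made up for illustration, and the scalar branch exists only so the sketch also builds on other targets.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: bit 0 is set when pairs 0 and 1 both satisfy
   |a - b| < limit, bit 1 is set when pairs 2 and 3 both do. */
static int pair_flags(const int32_t a[4], const int32_t b[4], int32_t limit)
{
    /* An absolute difference can never be below a non-positive threshold. */
    assert(limit > 0);
#if defined(__aarch64__)
    int32x4_t diff = vabsq_s32(vsubq_s32(vld1q_s32(a), vld1q_s32(b)));
    uint32x4_t mask = vcltq_s32(diff, vdupq_n_s32(limit));
    /* vminv_u32 returns the smaller of the two lanes of a 64-bit vector,
       so the result is nonzero only if both lanes are all-ones. */
    int lo = vminv_u32(vget_low_u32(mask)) != 0;   /* lanes 0 and 1 */
    int hi = vminv_u32(vget_high_u32(mask)) != 0;  /* lanes 2 and 3 */
#else
    /* Scalar fallback that builds the same all-ones / all-zeros mask. */
    uint32_t mask[4];
    for (int i = 0; i < 4; i++) {
        int32_t d = a[i] - b[i];
        if (d < 0) {
            d = -d;
        }
        mask[i] = (d < limit) ? UINT32_MAX : 0u;
    }
    int lo = (mask[0] & mask[1]) != 0;
    int hi = (mask[2] & mask[3]) != 0;
#endif
    return lo | (hi << 1);
}
```

With A = {300,400,600,400}, B = {300,400,100,400} and a threshold of 4, pair_flags returns 1: the first pair of lanes passes and the second does not. Swapping vminv_u32 for vmaxv_u32 would turn each test into "either lane passed" instead of "both lanes passed".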

    "Why does if (mask1_4[0] > 0 && mask1_4[1] > 0) work with a -1 value? Does it look at the sign?"

    mask1_4 is an unsigned uint32 type, so the value is 4294967295, not -1. NEON compares simply return a vector with all bits set for true, or all bits clear for false, which is useful in bitwise logical mask operations.
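The results reported earlier in the thread (sum > 0 true, < 0 false, == -2 true, < -1 true) follow from ordinary unsigned arithmetic in C rather than from anything NEON-specific. Below is a small portable sketch, assuming uint32_t is the same type as unsigned int, as it is on the usual Android and Linux targets. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LANE_TRUE  UINT32_MAX  /* all bits set: what a NEON compare writes for "true" */
#define LANE_FALSE 0u          /* all bits clear: "false" */

/* Sum of two mask lanes as the compiler computes it: unsigned, modulo 2^32. */
static uint32_t lane_sum(uint32_t m0, uint32_t m1)
{
    return m0 + m1;
}

/* Both lanes true: the bitwise AND is nonzero only when both are all-ones. */
static int both_true(uint32_t m0, uint32_t m1)
{
    return (m0 & m1) != 0;
}

/* At least one lane true. */
static int either_true(uint32_t m0, uint32_t m1)
{
    return (m0 | m1) != 0;
}

/* Reproduces the comparisons from the thread for two "true" lanes. */
static void check_observations(void)
{
    uint32_t s = lane_sum(LANE_TRUE, LANE_TRUE);
    assert(s == 4294967294u);   /* 0xFFFFFFFE after wrap-around */
    assert(s > 0);              /* an unsigned value is never below zero */
    assert(s == (uint32_t)-2);  /* -2 is converted to unsigned before comparing */
    assert(s < (uint32_t)-1);   /* -1 becomes 4294967295 */
}
```

Printing these values with %d reinterprets the bits as signed on typical targets, which would explain why the log shows -1 and -2; %u would be the matching format. A sum is also a poor test for "both true": lane_sum(LANE_TRUE, LANE_FALSE) is 4294967295, which is greater than 1 as well, whereas both_true distinguishes the two cases.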

  • I will check it tomorrow.

    And thanks again. I spent 6 hours on NEON today. I am worn out. ;))