SIMD help with examples

Hi,

I decided to have a look at SIMD intrinsics. There is a lot of documentation, but I cannot find any examples.

So once again I am asking how to use SIMD, with examples.

I only need two examples. After that I think I should be able to combine practice and knowledge.

The first example is what to do when (*in1) holds int values. The processing happens inside a loop, on (*in1)[x] - (*in1)[y]. If I read correctly, the intrinsics should be VSUB and VABS, but I need the code syntax.

           ONE:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max         - (*in1)[y].max);
            int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
            int diff6 = std::abs((*in1)[x].min         - (*in1)[y].max);
            int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
            int diff8 = std::abs((*in1)[x].max         - (*in1)[y].min);

and

           TWO :

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max         - (*in1)[y].max);

and

           FOUR:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);

and how to do the tests:

           if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){

and

          if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) ){

and

         if ( (diff1 < 9 || diff2 < 9)  &&  (diff3 < 9 || diff4 < 9) ){

I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))

Thanks a lot in advance.

PS: I work with a MediaTek 9200+ and a Mali-G715-Immortalis MC11 r1p2.
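
For readers landing here, a minimal sketch of how case ONE and its first test might map onto NEON intrinsics. It assumes the four bounds are contiguous 32-bit ints in the order raw_col_min, min, raw_col_max, max (the Rect struct and the near_duplicate name are hypothetical, not from the original code) and an AArch64 target, because vminvq_u32 is only available there. A scalar fallback is included so the sketch also builds on other hosts.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical record layout: the four bounds are assumed to be contiguous
// 32-bit ints in this order, so a single 128-bit load covers all of them.
struct Rect {
    int32_t raw_col_min;
    int32_t min;
    int32_t raw_col_max;
    int32_t max;
};

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>

// (diff1..diff4 all < 9) || (diff5..diff8 all < 5)
static inline bool near_duplicate(const Rect& x, const Rect& y)
{
    int32x4_t xv = vld1q_s32(&x.raw_col_min);
    int32x4_t yv = vld1q_s32(&y.raw_col_min);

    // Swap the 64-bit halves of y: {raw_col_max, max, raw_col_min, min}.
    int32x4_t ys = vextq_s32(yv, yv, 2);

    // diff1..diff4 and diff5..diff8, one lane per diff.
    int32x4_t d_same  = vabsq_s32(vsubq_s32(xv, yv));
    int32x4_t d_cross = vabsq_s32(vsubq_s32(xv, ys));

    // Each lane becomes 0xFFFFFFFF where the test passes, 0 otherwise.
    uint32x4_t m_same  = vcltq_s32(d_same,  vdupq_n_s32(9));
    uint32x4_t m_cross = vcltq_s32(d_cross, vdupq_n_s32(5));

    // The horizontal minimum is non-zero only if all four lanes passed.
    return vminvq_u32(m_same) != 0 || vminvq_u32(m_cross) != 0;
}
#else
// Scalar fallback with the same behaviour, for non-AArch64 hosts.
static inline bool near_duplicate(const Rect& x, const Rect& y)
{
    bool same = std::abs(x.raw_col_min - y.raw_col_min) < 9 &&
                std::abs(x.min - y.min) < 9 &&
                std::abs(x.raw_col_max - y.raw_col_max) < 9 &&
                std::abs(x.max - y.max) < 9;
    bool cross = std::abs(x.raw_col_min - y.raw_col_max) < 5 &&
                 std::abs(x.min - y.max) < 5 &&
                 std::abs(x.raw_col_max - y.raw_col_min) < 5 &&
                 std::abs(x.max - y.min) < 5;
    return same || cross;
}
#endif
```

vabsq_s32(vsubq_s32(...)) is the VSUB followed by VABS pairing mentioned above. Swapping the halves of y with vextq_s32 yields diff5 to diff8 without a second load.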

  • Hi,

    I think I need your SIMD expertise once again, to add a test to my duplicate-removal function.

    I need to add an IF.

    for (int x = 0 ; x < (*indnbObj) ; x++){

        for (int y = (x+1) ; y < (*indnbObj) ; y++){

            if ((*in1)[x].A ==  (*in1)[y].A && (*in1)[x].B == (*in1)[y].B && (*in1)[x].C == (*in1)[y].C && (*in1)[x].D == (*in1)[y].D){

                // clean up duplicates at the extremities
                int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
                int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
                int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
                int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
                int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
                int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
                int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
                int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);

    I plan to rewrite part of your code like this:

    for (int x = 0 ; x < rect_count; x++)
    {

        for (int y = (x + 1) ; y < rect_count; y++)
        {

            int* x_base2 = &(in1[x].A);
            int32x4_t xv2 = vld1q_s32(x_base2);
            int* y_base2 = &(in1[y].A);
            int32x4_t yv2 = vld1q_s32(y_base2);

           And here I should use selects rather than conditional branches, but I do not know how to do that.

           int* x_base = &(in1[x].raw_col_min);
           int32x4_t xv = vld1q_s32(x_base);
           int* y_base = &(in1[y].raw_col_min);
           int32x4_t yv = vld1q_s32(y_base);

    If you could explain to me how to do it, that would be nice. ;))

    PS: If I compute x_base inside the second loop, does it change anything? Or should I keep x_base and x_base2 inside the first loop?

    Thanks in advance ;))

  • I think I should use

    int* x_base2 = &(in1[x].A);
    int32x4_t xv2 = vld1q_s32(x_base2);
    int* y_base2 = &(in1[y].A);
    int32x4_t yv2 = vld1q_s32(y_base2);

    uint32x4_t mask2 = vceqq_s32(xv2, yv2); // do the compare

    if (mask2) { // if the compare is OK
        // continue the work

        // With SIMD it is better to put these two lines inside
        // the first loop, so the data is loaded only once.
        //int* x_base = &(in1[x].raw_col_min);
        //int32x4_t xv = vld1q_s32(x_base);
        int* y_base = &(in1[y].raw_col_min);
        int32x4_t yv = vld1q_s32(y_base);

        .............
    }


  • Hi,

    I have just implemented the change like this (without waiting for an answer):

        for (int x = 0 ; x < rect_count; x++)
        {

            int* x_base2 = &(in1[x].A);
            int32x4_t xv2 = vld1q_s32(x_base2);

            for (int y = x + 1 ; y < rect_count; y++)
            {

                int* y_base2 = &(in1[y].A);
                int32x4_t yv2 = vld1q_s32(y_base2);

                uint32x4_t mask2 = vceqq_s32(xv2 , yv2); // i do the compare
                float32_t all_mask2_4 = vminvq_u32(mask2) != 0;
                
                if (all_mask2_4 == 1){ // if compare ok

    But I was surprised that I could not use a bool for the result of vminvq_u32(mask2) != 0, as in the original example, when I use vceqq_s32 rather than vcltq_s32.

    The problem was the "if (mask2)", which the compiler reported as not being a bool.

    I do not understand why.

  • float32_t all_mask2_4 = vminvq_u32(mask2) != 0;

    This should be a bool result, not a float32_t result. The rest looks OK though as far as I can tell.
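
    On the earlier "if (mask2)" error: a uint32x4_t holds four separate lane masks and has no conversion to bool, so it has to be reduced to a scalar before it can be used as a condition. A minimal sketch of the "all four lanes equal" test follows. The all_lanes_equal name is hypothetical, an AArch64 target is assumed for vminvq_u32, and a scalar fallback is included so it builds on other hosts.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>

// True only if all four 32-bit lanes of a and b are equal.
// vceqq_s32 sets a lane to 0xFFFFFFFF on equality and to 0 otherwise, so
// the horizontal minimum is non-zero only when every lane matched.
static inline bool all_lanes_equal(const int32_t* a, const int32_t* b)
{
    uint32x4_t mask = vceqq_s32(vld1q_s32(a), vld1q_s32(b));
    return vminvq_u32(mask) != 0;
}
#else
// Scalar fallback with the same behaviour, for non-AArch64 hosts.
static inline bool all_lanes_equal(const int32_t* a, const int32_t* b)
{
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
}
#endif
```

    The same reduction works after vcltq_s32, since both compares produce the same kind of per-lane mask.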

  • Yes, you are right. I made a mistake by using: if (mask2)

    This is much better ;))

                bool all_mask2_4 = vminvq_u32(mask2) != 0;
                
                if (all_mask2_4){ // if compare ok

    Thanks.

  • Hi,

    Sorry to come back, but I have another question.

    When I compute the diff

    // Compute diff
    int32x4_t diff1_4 = vabsq_s32(vsubq_s32(A, B));

    I get 4 results, one for each pair: (A1,B1), (A2,B2), (A3,B3) and (A4,B4).

    Then I have to do the comparison

    uint32x4_t mask1_4 = vcltq_s32(diff1_4, X);

    So "uint32x4_t mask1_4" holds the comparison for each pair, so 4 results.

    And I would like to check

       if (mask1_4[0] > 0 && mask1_4[1] > 0)  and if (mask1_4[2] > 0 && mask1_4[3] > 0) 

    Is this possible, and if so, how do I do it?

    Thanks again. ;))

  • If you want "any of the 4 lanes" then do something like this:

    if (vmaxvq_u32(mask1_4) > 0) { ... }

    If you only want to match two lanes out of the four, then I would "vandq_u32()" the mask to zero out the mask lanes you don't want before doing the vmaxvq_u32().
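
    A minimal sketch of lane selection, with hypothetical helper names and an AArch64 target assumed (scalar fallback included for other hosts). Zeroing the unwanted lanes and taking the maximum answers "any selected lane passes". For "every selected lane passes", as in (lane 0 && lane 1), one option is to force the ignored lanes to all-ones and take the minimum instead.

```cpp
#include <cstdint>

// select must hold 0xFFFFFFFF in the lanes to test and 0 in lanes to ignore.
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>

// True if at least one selected lane satisfies diff < limit.
static inline bool any_selected_below(const int32_t* diff, int32_t limit,
                                      const uint32_t* select)
{
    uint32x4_t mask = vcltq_s32(vld1q_s32(diff), vdupq_n_s32(limit));
    // Zero the ignored lanes, then look for any non-zero lane.
    return vmaxvq_u32(vandq_u32(mask, vld1q_u32(select))) != 0;
}

// True if every selected lane satisfies diff < limit.
static inline bool all_selected_below(const int32_t* diff, int32_t limit,
                                      const uint32_t* select)
{
    uint32x4_t mask = vcltq_s32(vld1q_s32(diff), vdupq_n_s32(limit));
    // Force the ignored lanes to all-ones so they cannot lower the minimum.
    uint32x4_t ignored = vmvnq_u32(vld1q_u32(select));
    return vminvq_u32(vorrq_u32(mask, ignored)) != 0;
}
#else
// Scalar fallbacks with the same behaviour, for non-AArch64 hosts.
static inline bool any_selected_below(const int32_t* diff, int32_t limit,
                                      const uint32_t* select)
{
    for (int i = 0; i < 4; ++i)
        if (select[i] != 0 && diff[i] < limit) return true;
    return false;
}

static inline bool all_selected_below(const int32_t* diff, int32_t limit,
                                      const uint32_t* select)
{
    for (int i = 0; i < 4; ++i)
        if (select[i] != 0 && !(diff[i] < limit)) return false;
    return true;
}
#endif
```

    With select = {0xFFFFFFFF, 0xFFFFFFFF, 0, 0} this tests lanes 0 and 1, and with {0, 0, 0xFFFFFFFF, 0xFFFFFFFF} it tests lanes 2 and 3.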

    The other option is to reduce the mask to a 4-bit bitmask you can then test with normal C bit-wise arithmetic. Example of how to do this here:

    https://github.com/ARM-software/astc-encoder/blob/701503966b1ac2ebd2616cba94adee5ae8ba6363/Source/astcenc_vecmathlib_neon_4.h#L410
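
    A minimal sketch of that kind of reduction, written for this thread rather than copied from the linked file. The lanes_below_bitmask name is hypothetical and an AArch64 target is assumed for vaddvq_u32, with a scalar fallback for other hosts. Bit i of the result is set when lane i passes the compare.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>

// Returns a 4-bit mask: bit i is set if diff[i] < limit.
static inline unsigned lanes_below_bitmask(const int32_t* diff, int32_t limit)
{
    static const int32_t shifts[4] = {0, 1, 2, 3};

    uint32x4_t mask = vcltq_s32(vld1q_s32(diff), vdupq_n_s32(limit));
    // Keep only the top bit of each lane: 1 where the test passed, else 0.
    uint32x4_t bits = vshrq_n_u32(mask, 31);
    // Move lane i's bit to position i, then sum the lanes into one scalar.
    return vaddvq_u32(vshlq_u32(bits, vld1q_s32(shifts)));
}
#else
// Scalar fallback with the same behaviour, for non-AArch64 hosts.
static inline unsigned lanes_below_bitmask(const int32_t* diff, int32_t limit)
{
    unsigned result = 0;
    for (int i = 0; i < 4; ++i)
        if (diff[i] < limit) result |= 1u << i;
    return result;
}
#endif
```

    The two pair tests from the question then become (m & 0x3) == 0x3 for lanes 0 and 1, and (m & 0xC) == 0xC for lanes 2 and 3, where m is the returned bitmask.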

    P.S. In future, please raise new questions as a new forum post - it's easier to track questions and answers that way.

    Cheers, 
    Pete