SIMD help with examples

hterrolle 3 months ago

Hi,

I decided to have a look at SIMD intrinsics. There is a lot of documentation, but I cannot find examples.

So I decided, once again, to ask how to use SIMD, with examples.

I only need 2 examples. Then I think I should be able to combine practice and knowledge.

The first example is what to do when (*in1) is an array of ints. The processing happens inside a loop, on (*in1)[x] - (*in1)[y]. The intrinsics should be VSUB and VABS, if I read correctly, but I need the actual code syntax.

ONE:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max         - (*in1)[y].max);
            int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
            int diff6 = std::abs((*in1)[x].min         - (*in1)[y].max);
            int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
            int diff8 = std::abs((*in1)[x].max         - (*in1)[y].min);

and

TWO :

and

FOUR:

int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);

and how to do

if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){

and

if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) ){

and

if ( (diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9) ){

I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))

Thanks a lot in advance.

PS: I work with a MediaTek 9200+ and Mali-G715-Immortalis MC11 r1p2
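
For readers following the question, here is a minimal sketch of how the eight differences and the first condition might map to NEON. It rests on assumptions that are not confirmed in the thread: the struct name `Bounds` and the function names are hypothetical, and the four fields are assumed to be contiguous 32-bit ints in the order shown. It uses `vminvq_u32`, which needs AArch64; on other targets the sketch falls back to the scalar code.

```cpp
#include <cstdint>
#include <cstdlib>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical layout: the four bounds are contiguous ints in this order.
struct Bounds {
    int32_t raw_col_min, min, raw_col_max, max;
};
static_assert(sizeof(Bounds) == 16, "four packed 32-bit lanes expected");

// Scalar reference: (diff1..diff4 all < 9) || (diff5..diff8 all < 5).
bool near_scalar(const Bounds& x, const Bounds& y) {
    bool same = std::abs(x.raw_col_min - y.raw_col_min) < 9 &&
                std::abs(x.min - y.min) < 9 &&
                std::abs(x.raw_col_max - y.raw_col_max) < 9 &&
                std::abs(x.max - y.max) < 9;
    bool crossed = std::abs(x.raw_col_min - y.raw_col_max) < 5 &&
                   std::abs(x.min - y.max) < 5 &&
                   std::abs(x.raw_col_max - y.raw_col_min) < 5 &&
                   std::abs(x.max - y.min) < 5;
    return same || crossed;
}

// Same test with NEON on AArch64; uses the scalar code elsewhere.
bool near_simd(const Bounds& x, const Bounds& y) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    int32x4_t xv = vld1q_s32(&x.raw_col_min);
    int32x4_t yv = vld1q_s32(&y.raw_col_min);
    // Rotate y by two lanes: (raw_col_max, max, raw_col_min, min).
    int32x4_t y_swapped = vextq_s32(yv, yv, 2);
    int32x4_t diff1_4 = vabsq_s32(vsubq_s32(xv, yv));         // diff1..diff4
    int32x4_t diff5_8 = vabsq_s32(vsubq_s32(xv, y_swapped));  // diff5..diff8
    uint32x4_t mask1_4 = vcltq_s32(diff1_4, vdupq_n_s32(9));
    uint32x4_t mask5_8 = vcltq_s32(diff5_8, vdupq_n_s32(5));
    // The minimum across lanes is nonzero only if every lane passed.
    return vminvq_u32(mask1_4) != 0 || vminvq_u32(mask5_8) != 0;
#else
    return near_scalar(x, y);
#endif
}
```

The second condition in the question (diff1 to diff4 all below 9) is just the `vminvq_u32(mask1_4) != 0` half of the return expression.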

0 hterrolle 2 months ago in reply to Peter Harris

hi,

I think I once again need your SIMD expertise to add a test to my duplicate-detection function.

I need to add an IF.

for (int x = 0 ; x < (*indnbObj) ; x++){

    for (int y = (x+1) ; y < (*indnbObj) ; y++){

        if ((*in1)[x].A == (*in1)[y].A && (*in1)[x].B == (*in1)[y].B && (*in1)[x].C == (*in1)[y].C && (*in1)[x].D == (*in1)[y].D){

            // clean up duplicates at the extremities
            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
            int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
            int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
            int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
            int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);

I plan to rewrite part of your code like this:

for (int x = 0 ; x < rect_count; x++)
{

    for (int y = (x + 1) ; y < rect_count; y++)
    {

        int* x_base2 = &(in1[x].A);
        int32x4_t xv2 = vld1q_s32(x_base2);
        int* y_base2 = &(in1[y].A);
        int32x4_t yv2 = vld1q_s32(y_base2);

        // And here I should use selects rather than conditional branches,
        // but I do not know how to do that.

        int* x_base = &(in1[x].raw_col_min);
        int32x4_t xv = vld1q_s32(x_base);
        int* y_base = &(in1[y].raw_col_min);
        int32x4_t yv = vld1q_s32(y_base);

If you could explain how to do it, that would be nice. ;))

PS: If I compute x_base inside the second loop, does it change anything? Or should I keep x_base and x_base2 inside the first loop?

Thanks in advance ;))
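
One possible shape for this loop is sketched below, with assumptions stated up front: the struct name `Rect`, its field order, the function names, and the choice of the "diff1 to diff4 all below 9" test are illustrative and not confirmed by the thread. The loads that depend only on `x` sit in the outer loop, and the extra `if` is avoided by ANDing the key-equality mask with the threshold mask before a single reduction. On targets without AArch64 NEON the sketch uses the scalar helper.

```cpp
#include <cstdint>
#include <cstdlib>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical layout: two groups of four contiguous 32-bit ints.
struct Rect {
    int32_t A, B, C, D;                          // key fields
    int32_t raw_col_min, min, raw_col_max, max;  // bounds
};

// Scalar reference for one pair: same key and all four bounds within 9.
bool is_duplicate_scalar(const Rect& x, const Rect& y) {
    return x.A == y.A && x.B == y.B && x.C == y.C && x.D == y.D &&
           std::abs(x.raw_col_min - y.raw_col_min) < 9 &&
           std::abs(x.min - y.min) < 9 &&
           std::abs(x.raw_col_max - y.raw_col_max) < 9 &&
           std::abs(x.max - y.max) < 9;
}

// Counts the pairs (x, y), x < y, that pass the duplicate test.
int count_duplicates(const Rect* in1, int rect_count) {
    int count = 0;
    for (int x = 0; x < rect_count; x++) {
#if defined(__ARM_NEON) && defined(__aarch64__)
        // These loads depend only on x, so they sit outside the y loop.
        const int32x4_t xk = vld1q_s32(&in1[x].A);
        const int32x4_t xb = vld1q_s32(&in1[x].raw_col_min);
#endif
        for (int y = x + 1; y < rect_count; y++) {
#if defined(__ARM_NEON) && defined(__aarch64__)
            const int32x4_t yk = vld1q_s32(&in1[y].A);
            const int32x4_t yb = vld1q_s32(&in1[y].raw_col_min);
            const uint32x4_t key_eq = vceqq_s32(xk, yk);
            const uint32x4_t close =
                vcltq_s32(vabsq_s32(vsubq_s32(xb, yb)), vdupq_n_s32(9));
            // AND the two masks, then reduce once instead of branching twice.
            const bool dup = vminvq_u32(vandq_u32(key_eq, close)) != 0;
#else
            const bool dup = is_duplicate_scalar(in1[x], in1[y]);
#endif
            count += dup ? 1 : 0;
        }
    }
    return count;
}
```

A compiler may hoist the `x` loads on its own, so moving them by hand mostly makes the intent explicit. The trade-off of merging the masks is that the bounds are loaded even when the keys differ; whether that beats an early branch depends on how often keys match in the real data.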

0 hterrolle 2 months ago in reply to hterrolle

I think I should use:

int* x_base2 = &(in1[x].A);
int32x4_t xv2 = vld1q_s32(x_base2);
int* y_base2 = &(in1[y].A);
int32x4_t yv2 = vld1q_s32(y_base2);

uint32x4_t mask2 = vceqq_s32(xv2 , yv2); // lane-wise equality compare

if (mask2){ // if the compare succeeded
   // ... continue the work ...

      // With SIMD, it is better to move these two lines into
      // the first loop, so the data is loaded only once.
          //int* x_base = &(in1[x].raw_col_min);
          //int32x4_t xv = vld1q_s32(x_base);
   int* y_base = &(in1[y].raw_col_min);
   int32x4_t yv = vld1q_s32(y_base);

 .............
}

0 hterrolle 1 month ago in reply to hterrolle

hi,

I just implemented the modification like this (without waiting for an answer):

    for (int x = 0 ; x < rect_count; x++)
    {

        int* x_base2 = &(in1[x].A);
        int32x4_t xv2 = vld1q_s32(x_base2);

        for (int y = x + 1 ; y < rect_count; y++)
        {

            int* y_base2 = &(in1[y].A);
            int32x4_t yv2 = vld1q_s32(y_base2);

            uint32x4_t mask2 = vceqq_s32(xv2 , yv2); // lane-wise equality compare
            float32_t all_mask2_4 = vminvq_u32(mask2) != 0;

            if (all_mask2_4 == 1){ // if the compare succeeded

But I was surprised that I could not use a bool for the result of vminvq_u32(mask2) != 0, as in the original example, when I use vceqq_s32 rather than vcltq_s32.

The problem was the "if (mask2)": the compiler said it is not a bool.

I do not understand why.
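
A likely explanation, offered as a sketch rather than a definitive answer: `vceqq_s32` and `vcltq_s32` both return a `uint32x4_t`, which holds four independent lane results (all ones for true, zero for false), so there is no single truth value for `if` to test until the lanes are reduced. The function names below are hypothetical, and the scalar branch is only there so the example compiles on targets without AArch64 NEON.

```cpp
#include <cstdint>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Writes one result per lane: 0xFFFFFFFF where a[i] == b[i], otherwise 0.
void equal_lanes(const int32_t a[4], const int32_t b[4], uint32_t out[4]) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    vst1q_u32(out, vceqq_s32(vld1q_s32(a), vld1q_s32(b)));
#else
    for (int i = 0; i < 4; i++) {
        out[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
    }
#endif
}

// Collapses the four lane results into the single bool that `if` needs.
bool all_lanes_equal(const int32_t a[4], const int32_t b[4]) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    uint32x4_t mask = vceqq_s32(vld1q_s32(a), vld1q_s32(b));
    // The minimum across lanes is nonzero only if every lane is all ones.
    return vminvq_u32(mask) != 0;
#else
    uint32_t m[4];
    equal_lanes(a, b, m);
    return m[0] != 0 && m[1] != 0 && m[2] != 0 && m[3] != 0;
#endif
}
```

For inputs {1, 2, 3, 4} and {1, 2, 9, 4}, lanes 0, 1 and 3 compare equal and lane 2 does not, so the reduction yields false.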

0 Peter Harris 1 month ago in reply to hterrolle

hterrolle said:
float32_t all_mask2_4 = vminvq_u32(mask2) != 0;

This should be a bool result, not a float32_t result. The rest looks OK though as far as I can tell.

0 hterrolle 1 month ago in reply to Peter Harris

Yes, you are right. I made a mistake by using: if (mask2)

This is much better ;))

            bool all_mask2_4 = vminvq_u32(mask2) != 0;

            if (all_mask2_4){ // if the compare succeeded

Thanks.

0 hterrolle 1 month ago in reply to hterrolle

hi,

Sorry to come back, but I have another question.

When I compute the diff:

// Compute diff
int32x4_t diff1_4 = vabsq_s32(vsubq_s32(A, B));

I get 4 results, one for each pair: (A1,B1), (A2,B2), (A3,B3) and (A4,B4).

And then I have to do the comparison:

uint32x4_t mask1_4 = vcltq_s32(diff1_4, X);

So in "uint32x4_t mask1_4" I get the comparison result for each pair, so 4 results.

And I would like to check:

if (mask1_4[0] > 0 && mask1_4[1] > 0) and if (mask1_4[2] > 0 && mask1_4[3] > 0)

If it is possible, how do I do this?

Thanks again. ;))

0 Peter Harris 1 month ago in reply to hterrolle
If you want "any of the 4 lanes" then do something like this:

if (vmaxvq_u32(mask1_4) > 0) { ... }

If you only want to match two lanes out of the four, then I would "vandq_u32()" the mask to zero out the mask lanes you don't want before doing the vmaxvq_u32().
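
As a sketch of that lane-masking idea (the function names and the `kSelect` table are hypothetical, and the mask is passed as a plain array only to keep the example self-contained): ANDing with a selector and taking the maximum answers "is any lane of the pair set". For the "lane 0 and lane 1 both set" form asked about, one option is to OR the unwanted lanes to all ones and take the minimum instead.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>

// Lane selectors: all ones in the lanes of the chosen pair, zero elsewhere.
static const uint32_t kSelect[2][4] = {
    {0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0u},  // pair 0: lanes 0 and 1
    {0u, 0u, 0xFFFFFFFFu, 0xFFFFFFFFu},  // pair 1: lanes 2 and 3
};
#endif

// True if at least one lane of the pair is set (pair must be 0 or 1).
bool any_of_pair(const uint32_t mask[4], int pair) {
    assert(pair == 0 || pair == 1);
#if defined(__ARM_NEON) && defined(__aarch64__)
    // Zero the lanes outside the pair, then take the maximum across lanes.
    uint32x4_t kept = vandq_u32(vld1q_u32(mask), vld1q_u32(kSelect[pair]));
    return vmaxvq_u32(kept) > 0;
#else
    return (mask[2 * pair] | mask[2 * pair + 1]) != 0;
#endif
}

// True if both lanes of the pair are set (pair must be 0 or 1).
bool both_of_pair(const uint32_t mask[4], int pair) {
    assert(pair == 0 || pair == 1);
#if defined(__ARM_NEON) && defined(__aarch64__)
    // Force the lanes outside the pair to all ones so that they cannot
    // lower the minimum, then take the minimum across lanes.
    uint32x4_t forced =
        vorrq_u32(vld1q_u32(mask), vld1q_u32(kSelect[1 - pair]));
    return vminvq_u32(forced) != 0;
#else
    return mask[2 * pair] != 0 && mask[2 * pair + 1] != 0;
#endif
}
```

With these helpers, `both_of_pair(m, 0)` corresponds to `mask1_4[0] > 0 && mask1_4[1] > 0`, and `any_of_pair(m, 0) && any_of_pair(m, 1)` corresponds to a condition of the form `(a || b) && (c || d)`.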

The other option is to reduce the mask to a 4-bit bitmask you can then test with normal C bit-wise arithmetic. Example of how to do this here:

https://github.com/ARM-software/astc-encoder/blob/701503966b1ac2ebd2616cba94adee5ae8ba6363/Source/astcenc_vecmathlib_neon_4.h#L410
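
A sketch of one common way to build such a bitmask follows; the linked source remains the authority for how astcenc itself does it. The function name `lane_bitmask` is hypothetical, and the mask is passed as an array of four lane values (each all ones or zero, as comparison intrinsics produce) to keep the example self-contained. `vaddvq_u32` needs AArch64, so other targets take the scalar branch.

```cpp
#include <cstdint>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Returns a 4-bit value in which bit i is set when lane i has its top bit set.
unsigned lane_bitmask(const uint32_t mask[4]) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    static const int32_t shifts[4] = {0, 1, 2, 3};
    // Keep only the top bit of each lane, giving 1 or 0 per lane.
    uint32x4_t bits = vshrq_n_u32(vld1q_u32(mask), 31);
    // Move lane i to bit position i, then add the four lanes together.
    return vaddvq_u32(vshlq_u32(bits, vld1q_s32(shifts)));
#else
    unsigned result = 0;
    for (int i = 0; i < 4; i++) {
        result |= (mask[i] >> 31) << i;
    }
    return result;
#endif
}
```

Ordinary bit tests then apply: `(lane_bitmask(m) & 0x3) == 0x3` checks lanes 0 and 1 together, and `(lane_bitmask(m) & 0xC) == 0xC` checks lanes 2 and 3.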

P.S. In future, please raise new questions as a new forum post - it's easier to track questions and answers that way.

Cheers,
Pete