SIMD help: looking for examples

Hi,

I decided to have a look at the SIMD intrinsics. There is a lot of documentation, but I cannot find any examples.

So once again I am asking how to use SIMD, with examples.

I only need two examples. After that I think I should be able to combine practice and knowledge.

The first example is how to proceed when (*in1) is an INT array. The processing happens inside a loop, on (*in1)[x] - (*in1)[y]. If I read correctly, the intrinsics should be VSUB and VABS, but I need the actual code syntax.

           ONE:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min                  - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max                  - (*in1)[y].max);
            int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
            int diff6 = std::abs((*in1)[x].min                  - (*in1)[y].max);
            int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
            int diff8 = std::abs((*in1)[x].max                  - (*in1)[y].min);

and

           TWO :

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min                  - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max                  - (*in1)[y].max);

and

           FOUR:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min                  - (*in1)[y].min);

and how to write these tests:

           if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){

and

          if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) ){

and

         if ( (diff1 < 9 || diff2 < 9)  &&  (diff3 < 9 || diff4 < 9) ){
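The three tests above could be sketched as follows, assuming the four differences are already available as four contiguous 32-bit ints. The helper names all_below and pairs_below are hypothetical. vcltq_s32 produces a per-lane mask (all ones when the lane is below the limit, zero otherwise); vminvq_u32 reduces that mask to "every lane true", and vpmaxq_u32 ORs neighbouring lanes. The two reductions are AArch64-only, so the NEON path is guarded by __aarch64__ and a scalar fallback is used elsewhere.

```cpp
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// True when d[0], d[1], d[2] and d[3] are all < limit.
inline bool all_below(const int32_t d[4], int32_t limit) {
#if defined(__aarch64__)
    uint32x4_t mask = vcltq_s32(vld1q_s32(d), vdupq_n_s32(limit));
    return vminvq_u32(mask) != 0;   // the minimum is non-zero only if every lane passed
#else
    return d[0] < limit && d[1] < limit && d[2] < limit && d[3] < limit;
#endif
}

// True when (d[0] < limit || d[1] < limit) && (d[2] < limit || d[3] < limit).
inline bool pairs_below(const int32_t d[4], int32_t limit) {
#if defined(__aarch64__)
    uint32x4_t mask = vcltq_s32(vld1q_s32(d), vdupq_n_s32(limit));
    uint32x4_t pair = vpmaxq_u32(mask, mask);  // OR of lanes (0,1) and (2,3)
    return vminvq_u32(pair) != 0;              // AND of the two pair results
#else
    return (d[0] < limit || d[1] < limit) && (d[2] < limit || d[3] < limit);
#endif
}
```

With diff1..diff4 in same[] and diff5..diff8 in cross[], the first test would read all_below(same, 9) || all_below(cross, 5), the second all_below(same, 9), and the third pairs_below(same, 9).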

I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))

Thanks a lot in advance.

PS: I work with a MediaTek 9200+ and a Mali-G715-Immortalis MC11 r1p2.

  • Yes, of course.

    Here is one example that I run inside a pthread. The function call is "doublonBis(out,indrect,draw1,2);".

    // same color
    static void doublonBis(const void* __restrict__  a,int * __restrict__ indnbObj, int draw1, int num){

        struct my_rectangle (*in1)[5000] = (my_rectangle (* __restrict__ )[5000])a;
        int match = 0;

        /////////////////////////////////////////////////////////////////////////////////////////////////
        //       C L E A N I N G   U P   D U P L I C A T E S   A T   T H E   E X T R E M I T I E S     //
        /////////////////////////////////////////////////////////////////////////////////////////////////
        for (int x = 0 ; x < (*indnbObj) ; x++){

            for (int y = (x+1) ; y < (*indnbObj) ; y++){

                // clean up duplicates at the extremities
                int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
                int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);
                int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
                int diff4 = std::abs((*in1)[x].max         - (*in1)[y].max);
                int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
                int diff6 = std::abs((*in1)[x].min         - (*in1)[y].max);
                int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
                int diff8 = std::abs((*in1)[x].max         - (*in1)[y].min);

                if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){
                    // mark one rectangle of the pair as a duplicate
                    if ((*in1)[x].rupture == num){
                        (*in1)[y].rupture = num;
                    }else if ( (*in1)[x].Y_depart == 0){
                        (*in1)[y].rupture = num;
                    }else{
                        (*in1)[x].rupture = num;
                    }
                    match++;
                }
            }
        }
        LOGE(" %d doublon %d \n",draw1,match);
    }

    It takes 8 ms to process 4 times 2800 structs. In this case it is a 0.5 x^2 loop, but I also use full x^2 loops quite a lot.

    By the way, I compile with -O3, but I have not looked at the assembly produced by clang from android-ndk-r27c. I have not done that for the last 35 years. Maybe I should. ;))
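As a rough illustration, the duplicate test in the inner loop of doublonBis could be folded into a single helper that keeps everything in vector registers. This is only a sketch under assumptions: the helper name is_duplicate is hypothetical, and it assumes raw_col_min, min, raw_col_max and max are 32-bit ints that can be loaded as four contiguous values in that order, which the posted code does not confirm. Whether it beats what clang already generates at -O3 would have to be measured.

```cpp
#include <cstdint>
#include <cstdlib>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// x[] and y[] hold raw_col_min, min, raw_col_max, max (assumed order).
// Returns the value of the original condition:
// (diff1..diff4 all < 9) || (diff5..diff8 all < 5).
inline bool is_duplicate(const int32_t x[4], const int32_t y[4]) {
#if defined(__aarch64__)
    int32x4_t vx = vld1q_s32(x);
    int32x4_t vy = vld1q_s32(y);
    int32x4_t vy_swapped = vextq_s32(vy, vy, 2);  // min/max halves exchanged
    uint32x4_t same  = vcltq_s32(vabdq_s32(vx, vy),         vdupq_n_s32(9));
    uint32x4_t cross = vcltq_s32(vabdq_s32(vx, vy_swapped), vdupq_n_s32(5));
    // vminvq_u32 is non-zero only when all four lanes of the mask are set.
    return vminvq_u32(same) != 0 || vminvq_u32(cross) != 0;
#else
    // Scalar fallback with the same pairing of fields.
    bool same = true, cross = true;
    for (int i = 0; i < 4; ++i) {
        same  = same  && std::abs(x[i] - y[i]) < 9;
        cross = cross && std::abs(x[i] - y[(i + 2) % 4]) < 5;
    }
    return same || cross;
#endif
}
```

The update of rupture depends on earlier iterations, so in this sketch only the comparison is vectorized and the branch that writes rupture stays scalar.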
