Hi,
I decided to have a look at SIMD intrinsics. There is a lot of documentation, but I cannot find any examples.
So once again I am asking how to use SIMD, with examples.
I only need two examples. Then I think I should be able to combine practice and knowledge.
The first example is what to do when (*in1) holds int values. The processing happens inside a loop and computes (*in1)[x] - (*in1)[y]. If I read correctly, the instructions should be VSUB and VABS, but I need the code syntax.
ONE:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);
and
TWO :
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
FOUR:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
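For the absolute differences themselves, here is a minimal sketch, not a definitive implementation. It assumes the four fields are plain 32-bit ints stored next to each other in the order shown; the real my_rectangle definition is not given in this thread, so Rect4 and abs_diff4 are hypothetical names. On NEON a separate VSUB followed by VABS is not needed: vabdq_s32 computes the absolute difference in one step (vabsq_s32(vsubq_s32(a, b)) would be the two-step form). A scalar fallback is included so the code also builds on non-Arm hosts.

```cpp
#include <cstdint>
#include <cstdlib>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical layout, assumed for illustration only.
struct Rect4 {
    int32_t raw_col_min;
    int32_t min;
    int32_t raw_col_max;
    int32_t max;
};
static_assert(sizeof(Rect4) == 16, "sketch assumes four packed 32-bit fields");

// Writes |a.field - b.field| for the four fields into out[0..3]
// (the equivalent of diff1..diff4).
inline void abs_diff4(const Rect4& a, const Rect4& b, int32_t out[4]) {
#if defined(__ARM_NEON)
    // Load the four consecutive ints of each struct into one 128-bit register.
    const int32x4_t va = vld1q_s32(&a.raw_col_min);
    const int32x4_t vb = vld1q_s32(&b.raw_col_min);
    // Absolute difference per lane, then store the four results.
    vst1q_s32(out, vabdq_s32(va, vb));
#else
    out[0] = std::abs(a.raw_col_min - b.raw_col_min);
    out[1] = std::abs(a.min - b.min);
    out[2] = std::abs(a.raw_col_max - b.raw_col_max);
    out[3] = std::abs(a.max - b.max);
#endif
}
```

With only two differences (the FOUR case) the 64-bit forms vld1_s32 and vabd_s32 on int32x2_t would be the natural fit.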
And how do I write these conditions:
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) ){
if ( (diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9) ){
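The conditions above can be sketched with NEON compares, again as an illustration under assumptions rather than a tuned implementation. The inputs a and b are assumed to hold {raw_col_min, min, raw_col_max, max} in that order; is_duplicate and any_pair_close are hypothetical names. vcltq_s32 yields a mask lane of all ones where the test passes, and vminvq_u32 (AArch64 only) reduces "all lanes pass" to a single value. The crossed differences (diff5..diff8) are obtained by swapping the two halves of b with vextq_s32.

```cpp
#include <cstdint>
#include <cstdlib>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#define HAVE_NEON64 1
#endif

// First condition: (diff1..diff4 all < 9) || (diff5..diff8 all < 5).
inline bool is_duplicate(const int32_t a[4], const int32_t b[4]) {
#ifdef HAVE_NEON64
    const int32x4_t va = vld1q_s32(a);
    const int32x4_t vb = vld1q_s32(b);
    const int32x4_t vb_swapped = vextq_s32(vb, vb, 2);  // {b[2], b[3], b[0], b[1]}
    const uint32x4_t straight = vcltq_s32(vabdq_s32(va, vb), vdupq_n_s32(9));
    const uint32x4_t crossed = vcltq_s32(vabdq_s32(va, vb_swapped), vdupq_n_s32(5));
    // The minimum over the lanes is nonzero only if every lane passed.
    return vminvq_u32(straight) != 0 || vminvq_u32(crossed) != 0;
#else
    bool straight = true, crossed = true;
    for (int i = 0; i < 4; ++i) {
        straight = straight && std::abs(a[i] - b[i]) < 9;
        crossed = crossed && std::abs(a[i] - b[(i + 2) % 4]) < 5;
    }
    return straight || crossed;
#endif
}

// Third condition: (diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9).
inline bool any_pair_close(const int32_t a[4], const int32_t b[4]) {
#ifdef HAVE_NEON64
    const uint32x4_t m = vcltq_s32(vabdq_s32(vld1q_s32(a), vld1q_s32(b)), vdupq_n_s32(9));
    // Pairwise max acts as OR on all-ones/all-zeros lanes: {m0|m1, m2|m3}.
    const uint32x2_t p = vpmax_u32(vget_low_u32(m), vget_high_u32(m));
    return (vget_lane_u32(p, 0) & vget_lane_u32(p, 1)) != 0;
#else
    const bool first = std::abs(a[0] - b[0]) < 9 || std::abs(a[1] - b[1]) < 9;
    const bool second = std::abs(a[2] - b[2]) < 9 || std::abs(a[3] - b[3]) < 9;
    return first && second;
#endif
}
```

The second condition (diff1..diff4 all < 9) is the "straight" half of is_duplicate on its own. Testing a single pair like this leaves most of the register idle; comparing one x against several y values per instruction is usually where the gain is.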
I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))
Thanks a lot in advance.
PS: I work with a MediaTek 9200+ and a Mali-G715-Immortalis MC11 r1p2.
Yes, of course.
Here is one example that I use inside a pthread. The function call is: doublonBis(out,indrect,draw1,2);
// same color
static void doublonBis(const void* __restrict__ a, int * __restrict__ indnbObj, int draw1, int num){
    struct my_rectangle (*in1)[5000] = (my_rectangle (* __restrict__ )[5000])a;
    int match = 0;
    /////////////////////////////////////////////////
    //  CLEANUP OF DUPLICATES AT THE EXTREMITIES   //
    /////////////////////////////////////////////////
    for (int x = 0 ; x < (*indnbObj) ; x++){
        for (int y = (x+1) ; y < (*indnbObj) ; y++){
            // clean up duplicates at the extremities
            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
            int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
            int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
            int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
            int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);
            if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){
                if ((*in1)[x].rupture == num){
                    (*in1)[y].rupture = num;
                }else if ( (*in1)[x].Y_depart == 0){
                    (*in1)[y].rupture = num;
                }else{
                    (*in1)[x].rupture = num;
                }
                match++;
            }
        }
    }
    LOGE(" %d doublon %d \n",draw1,match);
}
It took 8 ms to process 4 times 2800 structs. In this case it is a 0.5 x^2 loop, but I also use full x^2 loops quite a lot.
By the way, I compile with -O3, but I have not looked at the assembler produced by clang from android-ndk-r27c. I have not done that for the last 35 years. Maybe I should. ;))
What's the definition of "struct my_rectangle"?
... and is the memory layout of this input struct array something you can change if you can get better performance? I.e. structure-of-arrays could be faster and easier to vectorize than array-of-structures.
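To illustrate the structure-of-arrays idea, here is a hedged sketch in which each field lives in its own contiguous array, so one rectangle x can be tested against four candidates y per instruction. RectSoA and count_duplicates_of are hypothetical names, and the layout is an assumption since the real struct has not been posted. The sketch only counts matches for the first condition; the rupture updates from doublonBis depend on the state of x and would still need a per-lane walk over the mask.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#define HAVE_NEON64 1
#endif

// Hypothetical structure-of-arrays layout: one array per field.
struct RectSoA {
    std::vector<int32_t> raw_col_min, min, raw_col_max, max;
};

// Counts the y > x whose rectangle passes the duplicate test against x.
inline int count_duplicates_of(const RectSoA& r, std::size_t x) {
    const std::size_t n = r.min.size();
    int count = 0;
    std::size_t y = x + 1;
#ifdef HAVE_NEON64
    // Broadcast the four fields of x to all lanes.
    const int32x4_t x0 = vdupq_n_s32(r.raw_col_min[x]);
    const int32x4_t x1 = vdupq_n_s32(r.min[x]);
    const int32x4_t x2 = vdupq_n_s32(r.raw_col_max[x]);
    const int32x4_t x3 = vdupq_n_s32(r.max[x]);
    const int32x4_t t9 = vdupq_n_s32(9), t5 = vdupq_n_s32(5);
    for (; y + 4 <= n; y += 4) {
        // Each lane holds the same field of four consecutive candidates.
        const int32x4_t y0 = vld1q_s32(&r.raw_col_min[y]);
        const int32x4_t y1 = vld1q_s32(&r.min[y]);
        const int32x4_t y2 = vld1q_s32(&r.raw_col_max[y]);
        const int32x4_t y3 = vld1q_s32(&r.max[y]);
        const uint32x4_t straight = vandq_u32(
            vandq_u32(vcltq_s32(vabdq_s32(x0, y0), t9), vcltq_s32(vabdq_s32(x1, y1), t9)),
            vandq_u32(vcltq_s32(vabdq_s32(x2, y2), t9), vcltq_s32(vabdq_s32(x3, y3), t9)));
        const uint32x4_t crossed = vandq_u32(
            vandq_u32(vcltq_s32(vabdq_s32(x0, y2), t5), vcltq_s32(vabdq_s32(x1, y3), t5)),
            vandq_u32(vcltq_s32(vabdq_s32(x2, y0), t5), vcltq_s32(vabdq_s32(x3, y1), t5)));
        const uint32x4_t hit = vorrq_u32(straight, crossed);
        // Matching lanes are all ones: reduce each to 1 and sum the lanes.
        count += static_cast<int>(vaddvq_u32(vshrq_n_u32(hit, 31)));
    }
#endif
    // Scalar tail (and the full loop when NEON is not available).
    for (; y < n; ++y) {
        const bool straight = std::abs(r.raw_col_min[x] - r.raw_col_min[y]) < 9 &&
                              std::abs(r.min[x] - r.min[y]) < 9 &&
                              std::abs(r.raw_col_max[x] - r.raw_col_max[y]) < 9 &&
                              std::abs(r.max[x] - r.max[y]) < 9;
        const bool crossed = std::abs(r.raw_col_min[x] - r.raw_col_max[y]) < 5 &&
                             std::abs(r.min[x] - r.max[y]) < 5 &&
                             std::abs(r.raw_col_max[x] - r.raw_col_min[y]) < 5 &&
                             std::abs(r.max[x] - r.min[y]) < 5;
        if (straight || crossed) ++count;
    }
    return count;
}
```

With the array-of-structures layout the same four-at-a-time loads would need a de-interleaving load or a gather of fields, which is part of why the layout question matters.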