Hi,
I decided to have a look at SIMD intrinsics. There is a lot of documentation, but I cannot find examples.
So once again I am asking how to use SIMD, with examples.
I only need two examples. Then I think I should be able to combine practice and knowledge.
The first example is how to proceed when (*in1) holds int values. The processing happens inside a loop on (*in1)[x] - (*in1)[y]. If I read correctly, the intrinsics should be VSUB and VABS, but I need the code syntax.
ONE:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);
and
TWO:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
FOUR:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
And how to do these tests:
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) ){
if ( (diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9) ){
I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))
Thanks a lot in advance.
PS: I work with a MediaTek 9200+ and a Mali-G715-Immortalis MC11 r1p2.
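For the three tests asked about above, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes that raw_col_min, min, raw_col_max and max are four contiguous 32-bit ints in the struct (the Rect type below is hypothetical), and that the target is AArch64, where vaddvq_u32 is available. The names lane_bits, diff_bits and cond_one to cond_three are made up for illustration. A scalar fallback is included so the sketch also builds on a non-Arm host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical layout: the four compared fields must be contiguous 32-bit ints.
struct Rect {
    int32_t A, B, C, D;
    int32_t raw_col_min, min, raw_col_max, max;
};

#if defined(__aarch64__)
// Packs a compare mask into 4 bits: bit i is set when lane i is all-ones.
static inline unsigned lane_bits(uint32x4_t m) {
    static const uint32_t weights[4] = {1u, 2u, 4u, 8u};
    return vaddvq_u32(vandq_u32(m, vld1q_u32(weights)));
}
#endif

// Bits 0..3: diff1..diff4 < 9. Bits 4..7: diff5..diff8 < 5.
static inline unsigned diff_bits(const Rect& x, const Rect& y) {
#if defined(__aarch64__)
    const int32x4_t xv = vld1q_s32(&x.raw_col_min);
    const int32x4_t yv = vld1q_s32(&y.raw_col_min);
    // Rotate y by two lanes: {raw_col_max, max, raw_col_min, min}.
    const int32x4_t yr = vextq_s32(yv, yv, 2);
    const uint32x4_t straight =
        vcltq_s32(vabsq_s32(vsubq_s32(xv, yv)), vdupq_n_s32(9));
    const uint32x4_t crossed =
        vcltq_s32(vabsq_s32(vsubq_s32(xv, yr)), vdupq_n_s32(5));
    return lane_bits(straight) | (lane_bits(crossed) << 4);
#else
    // Scalar fallback so the sketch also builds on non-Arm hosts.
    const int32_t xs[4] = {x.raw_col_min, x.min, x.raw_col_max, x.max};
    const int32_t ys[4] = {y.raw_col_min, y.min, y.raw_col_max, y.max};
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i) {
        if (std::abs(xs[i] - ys[i]) < 9) bits |= 1u << i;
        if (std::abs(xs[i] - ys[(i + 2) % 4]) < 5) bits |= 1u << (i + 4);
    }
    return bits;
#endif
}

// (diff1..diff4 all < 9) || (diff5..diff8 all < 5)
static inline bool cond_one(unsigned b) {
    return (b & 0x0Fu) == 0x0Fu || (b & 0xF0u) == 0xF0u;
}

// diff1..diff4 all < 9
static inline bool cond_two(unsigned b) { return (b & 0x0Fu) == 0x0Fu; }

// (diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9)
static inline bool cond_three(unsigned b) {
    return (b & 0x03u) != 0 && (b & 0x0Cu) != 0;
}
```

Reducing each mask to a few bits keeps the three if tests as ordinary integer logic. If only the "all four lanes" test is needed, vminvq_u32(mask) != 0 on the compare mask is probably simpler.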
Sorry typo - fixed.
I think I need your SIMD expertise once again, to add a test to my duplicate-removal function.
I need to add an IF.
for (int x = 0 ; x < (*indnbObj) ; x++){
for (int y = (x+1) ; y < (*indnbObj) ; y++){
if ((*in1)[x].A == (*in1)[y].A && (*in1)[x].B == (*in1)[y].B && (*in1)[x].C == (*in1)[y].C && (*in1)[x].D == (*in1)[y].D){
// clean up duplicates at the extremities
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);
I plan to rewrite part of your code like this:
for (int x = 0 ; x < rect_count; x++) {
for (int y = (x + 1) ; y < rect_count; y++) {
int* x_base2 = &(in1[x].A);
int32x4_t xv2 = vld1q_s32(x_base2);
int* y_base2 = &(in1[y].A);
int32x4_t yv2 = vld1q_s32(y_base2);
And here I should use selects rather than conditional branches, but I do not know how to do that.
int* x_base = &(in1[x].raw_col_min);
int32x4_t xv = vld1q_s32(x_base);
int* y_base = &(in1[y].raw_col_min);
int32x4_t yv = vld1q_s32(y_base);
If you could explain how to do it, that would be nice. ;))
PS: Does it change anything if I compute x_base inside the second loop? Or should I keep x_base and x_base2 inside the first loop?
Thanks in advance ;))
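On the two points raised above (avoiding a branch between the tests, and where to put the x loads), here is a minimal sketch under stated assumptions. The Rect layout and the count_duplicates name are hypothetical, the "all four diffs < 9" condition is picked only for illustration, and the NEON path assumes AArch64 for vminvq_u32. The loads that depend only on x sit in the outer loop, and the equality mask and the threshold mask are combined with vandq_u32 so that a single reduction decides the result. A compiler may hoist the x loads on its own, so the gain from doing it by hand is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical layout: A..D and raw_col_min..max are each four contiguous ints.
struct Rect {
    int32_t A, B, C, D;
    int32_t raw_col_min, min, raw_col_max, max;
};

// Counts pairs (x, y) whose A..D match and whose four extents differ by < 9.
static int count_duplicates(const std::vector<Rect>& in1) {
    int found = 0;
    const int n = static_cast<int>(in1.size());
    for (int x = 0; x < n; x++) {
#if defined(__aarch64__)
        // These loads depend only on x, so they can live in the outer loop.
        const int32x4_t xk = vld1q_s32(&in1[x].A);
        const int32x4_t xv = vld1q_s32(&in1[x].raw_col_min);
#endif
        for (int y = x + 1; y < n; y++) {
#if defined(__aarch64__)
            const int32x4_t yk = vld1q_s32(&in1[y].A);
            const int32x4_t yv = vld1q_s32(&in1[y].raw_col_min);
            const uint32x4_t same = vceqq_s32(xk, yk);
            const uint32x4_t close =
                vcltq_s32(vabsq_s32(vsubq_s32(xv, yv)), vdupq_n_s32(9));
            // AND both masks, then reduce once: no branch between the tests.
            const bool hit = vminvq_u32(vandq_u32(same, close)) != 0;
#else
            // Scalar fallback so the sketch also builds on non-Arm hosts.
            const Rect& a = in1[x];
            const Rect& b = in1[y];
            const bool hit =
                a.A == b.A && a.B == b.B && a.C == b.C && a.D == b.D &&
                std::abs(a.raw_col_min - b.raw_col_min) < 9 &&
                std::abs(a.min - b.min) < 9 &&
                std::abs(a.raw_col_max - b.raw_col_max) < 9 &&
                std::abs(a.max - b.max) < 9;
#endif
            found += hit ? 1 : 0;
        }
    }
    return found;
}
```

The trade-off is that the diff is always computed, even when A..D do not match. Whether that beats an early-out branch depends on how often the first test fails, so it is worth measuring on the real data.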
I think I should use:
uint32x4_t mask2 = vceqq_s32(xv2, yv2); // do the compare
if (mask2) { // if the compare is OK, continue the work
    // Using SIMD, it is better to put these two lines inside
    // the first loop. The data loads are then done only once.
    //int* x_base = &(in1[x].raw_col_min);
    //int32x4_t xv = vld1q_s32(x_base);
    int* y_base = &(in1[y].raw_col_min);
    int32x4_t yv = vld1q_s32(y_base);
    .............
}
I just implemented the modification like this (without waiting for an answer):
for (int x = 0; x < rect_count; x++) {
    int* x_base2 = &(in1[x].A);
    int32x4_t xv2 = vld1q_s32(x_base2);
    for (int y = x + 1; y < rect_count; y++) {
        int* y_base2 = &(in1[y].A);
        int32x4_t yv2 = vld1q_s32(y_base2);
        uint32x4_t mask2 = vceqq_s32(xv2, yv2); // do the compare
        float32_t all_mask2_4 = vminvq_u32(mask2) != 0;
        if (all_mask2_4 == 1) { // if the compare is OK
But I was surprised that I could not use bool for the result of vminvq_u32(mask2) != 0, as in the original example, when I use vceqq_s32 rather than vcltq_s32.
The problem was the "if (mask2)": the compiler said it is not a bool.
I do not understand why.
hterrolle said: float32_t all_mask2_4 = vminvq_u32(mask2) != 0;
This should be a bool result, not a float32_t result. The rest looks OK though as far as I can tell.
Yes, you are right. I made a mistake using: if (mask2)
This is much better ;))
bool all_mask2_4 = vminvq_u32(mask2) != 0;
if (all_mask2_4) { // if the compare is OK
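As to why "if (mask2)" is rejected: the result of vceqq_s32 is a uint32x4_t, a vector of four lanes that are each 0xFFFFFFFF or 0, and a vector type has no conversion to bool. It has to be reduced to a scalar first. A minimal sketch follows, with the hypothetical helper names all_equal4 and any_equal4, assuming AArch64 for vminvq_u32 and vmaxvq_u32. A scalar fallback is included so it also builds on a non-Arm host.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// True when all four ints at a equal the four ints at b.
static inline bool all_equal4(const int32_t* a, const int32_t* b) {
#if defined(__aarch64__)
    // Each lane of mask is 0xFFFFFFFF (equal) or 0 (different).
    const uint32x4_t mask = vceqq_s32(vld1q_s32(a), vld1q_s32(b));
    // vminvq_u32 returns the smallest lane as a plain uint32_t.
    // It is nonzero only when every lane is set.
    return vminvq_u32(mask) != 0;
#else
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
#endif
}

// True when at least one of the four ints at a equals its partner at b.
static inline bool any_equal4(const int32_t* a, const int32_t* b) {
#if defined(__aarch64__)
    const uint32x4_t mask = vceqq_s32(vld1q_s32(a), vld1q_s32(b));
    // vmaxvq_u32 returns the largest lane: nonzero when any lane is set.
    return vmaxvq_u32(mask) != 0;
#else
    return a[0] == b[0] || a[1] == b[1] || a[2] == b[2] || a[3] == b[3];
#endif
}
```

The same reduction works for a mask produced by vcltq_s32, since both compares return a uint32x4_t.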
thanks.
Sorry to come back, but I have another question.
When I do the diff
// Compute diff
int32x4_t diff1_4 = vabsq_s32(vsubq_s32(A, B));
I get 4 results, one for each pair: (A1,B1), (A2,B2), (A3,B3) and (A4,B4).
Then I have to do the comparison:
uint32x4_t mask1_4 = vcltq_s32(diff1_4, X);
So in "uint32x4_t mask1_4" I get the comparison for each test, so 4 results.
And i would like to check
if (mask1_4[0] > 0 && mask1_4[1] > 0) and if (mask1_4[2] > 0 && mask1_4[3] > 0)
Is that possible? If so, how do I do it?
Thanks again. ;))
If you want "any of the 4 lanes" then do something like this:
if (vmaxvq_u32(mask1_4) > 0) { ... }
If you only want to match two lanes out of the four, then I would "vandq_u32()" the mask to zero out the mask lanes you don't want before doing the vmaxvq_u32().
The other option is to reduce the mask to a 4-bit bitmask you can then test with normal C bit-wise arithmetic. Example of how to do this here:
https://github.com/ARM-software/astc-encoder/blob/701503966b1ac2ebd2616cba94adee5ae8ba6363/Source/astcenc_vecmathlib_neon_4.h#L410
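To illustrate the lane-selection idea, here is a minimal sketch that stays in the vector domain. It is one possible way to do it, assuming AArch64 for vminvq_u32 and vmaxvq_u32. The names test_lanes, kLanes01 and kLanes23 are hypothetical, and each select lane is assumed to be either all-ones or zero. For "any selected lane" the unwanted lanes are zeroed with vandq_u32. For "every selected lane", which is what the two && tests in the question need, the unwanted lanes are forced to all-ones with vornq_u32 before taking the minimum.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Select masks: all-ones in the lanes to test, zero in the lanes to ignore.
static const uint32_t kLanes01[4] = {~0u, ~0u, 0u, 0u};
static const uint32_t kLanes23[4] = {0u, 0u, ~0u, ~0u};

// Tests diff[i] < limit on the selected lanes only.
// all == true: every selected lane must pass. all == false: any one is enough.
static inline bool test_lanes(const int32_t diff[4], int32_t limit,
                              const uint32_t sel[4], bool all) {
#if defined(__aarch64__)
    const uint32x4_t mask = vcltq_s32(vld1q_s32(diff), vdupq_n_s32(limit));
    const uint32x4_t s = vld1q_u32(sel);
    if (all) {
        // vornq_u32(a, b) computes a | ~b: ignored lanes become all-ones,
        // so only a selected lane can pull the minimum down to zero.
        return vminvq_u32(vornq_u32(mask, s)) != 0;
    }
    // Ignored lanes become zero, so only a selected lane can raise the maximum.
    return vmaxvq_u32(vandq_u32(mask, s)) != 0;
#else
    // Scalar fallback so the sketch also builds on non-Arm hosts.
    for (int i = 0; i < 4; ++i) {
        if (sel[i] == 0) continue;
        const bool hit = diff[i] < limit;
        if (all && !hit) return false;
        if (!all && hit) return true;
    }
    return all;
#endif
}
```

With this, the first test in the question would be test_lanes(diff, X, kLanes01, true) and the second test_lanes(diff, X, kLanes23, true), where diff holds the four values of diff1_4.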
P.S. In future, please raise new questions as a new forum post - it's easier to track questions and answers that way.
Cheers, Pete