Hi,
I decided to have a look at SIMD intrinsics. There is a lot of documentation, but I cannot find any examples.
So I am asking once again how to use SIMD, with examples.
I only need two examples. After that I think I should be able to combine practice and knowledge.
The first example is what to do when (*in1) is an INT array. The processing happens inside a loop, as (*in1)[x] - (*in1)[y]. If I read correctly, the intrinsics should be VSUB and VABS, but I need the code syntax.
ONE:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);
and
TWO:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
FOUR:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
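A minimal sketch of case ONE with NEON intrinsics, for illustration only. The struct name Range and the function abs_diffs are hypothetical, since the real type behind (*in1)[x] is not shown; the sketch only assumes four 32-bit int fields with the names used above. Instead of a separate VSUB and VABS, it uses vabdq_s32, which computes the absolute difference of four int32 lanes in one step. A scalar fallback is kept so the code also builds on non-Arm targets.

```cpp
#include <cstdint>
#include <cstdlib>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical element type: the real layout of (*in1)[x] is not shown.
struct Range {
    int32_t raw_col_min;
    int32_t min;
    int32_t raw_col_max;
    int32_t max;
};

// Writes diff1..diff8 (same order as case ONE) into out[0..7].
void abs_diffs(const Range& x, const Range& y, int32_t out[8]) {
#if defined(__ARM_NEON)
    const int32_t xs[4] = {x.raw_col_min, x.min, x.raw_col_max, x.max};
    const int32_t ys[4] = {y.raw_col_min, y.min, y.raw_col_max, y.max};
    const int32x4_t vx = vld1q_s32(xs);
    const int32x4_t vy = vld1q_s32(ys);
    // Swap the two halves of vy: {raw_col_max, max, raw_col_min, min}.
    const int32x4_t vy_swapped = vextq_s32(vy, vy, 2);
    vst1q_s32(out, vabdq_s32(vx, vy));              // diff1..diff4
    vst1q_s32(out + 4, vabdq_s32(vx, vy_swapped));  // diff5..diff8
#else
    // Scalar fallback with the same results.
    out[0] = std::abs(x.raw_col_min - y.raw_col_min);
    out[1] = std::abs(x.min - y.min);
    out[2] = std::abs(x.raw_col_max - y.raw_col_max);
    out[3] = std::abs(x.max - y.max);
    out[4] = std::abs(x.raw_col_min - y.raw_col_max);
    out[5] = std::abs(x.min - y.max);
    out[6] = std::abs(x.raw_col_max - y.raw_col_min);
    out[7] = std::abs(x.max - y.min);
#endif
}
```

Case TWO is just the first store (diff1..diff4). Case FOUR only fills two of the four lanes, so it is unlikely to gain much from a 128-bit vector.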
And how to do the following tests:
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) ){
if ( (diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9) ){
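One possible way to express these three tests with NEON, as a sketch only. It assumes the differences are already stored in an int32_t array in the order diff1..diff8, and that the target is AArch64 (vminvq_u32 and vminv_u32 are A64-only intrinsics). The function names all4_below, pairs_below, test_one, test_two and test_three are hypothetical. vcltq_s32 produces a mask per lane (all ones when the lane is below the limit, zero otherwise), and the horizontal minimum of the mask tells whether every lane passed.

```cpp
#include <cstdint>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_A64_NEON 1
#else
#define HAVE_A64_NEON 0
#endif

// True when all four values d[0..3] are below limit.
bool all4_below(const int32_t d[4], int32_t limit) {
#if HAVE_A64_NEON
    const uint32x4_t m = vcltq_s32(vld1q_s32(d), vdupq_n_s32(limit));
    return vminvq_u32(m) == 0xFFFFFFFFu;  // every lane mask is all ones
#else
    return d[0] < limit && d[1] < limit && d[2] < limit && d[3] < limit;
#endif
}

// (d[0] < limit || d[1] < limit) && (d[2] < limit || d[3] < limit)
bool pairs_below(const int32_t d[4], int32_t limit) {
#if HAVE_A64_NEON
    const uint32x4_t m = vcltq_s32(vld1q_s32(d), vdupq_n_s32(limit));
    // Pairwise max acts as OR on 0 / all-ones masks: {m0|m1, m2|m3}.
    const uint32x2_t p = vpmax_u32(vget_low_u32(m), vget_high_u32(m));
    return vminv_u32(p) != 0;  // both pairs have at least one passing lane
#else
    return (d[0] < limit || d[1] < limit) && (d[2] < limit || d[3] < limit);
#endif
}

// The three conditions from the question, in order.
bool test_one(const int32_t d[8]) { return all4_below(d, 9) || all4_below(d + 4, 5); }
bool test_two(const int32_t d[4]) { return all4_below(d, 9); }
bool test_three(const int32_t d[4]) { return pairs_below(d, 9); }
```

If the differences are still in vector registers from a previous step, the loads can be dropped and the masks computed directly on those vectors.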
I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))
Thanks a lot in advance.
PS: I work with a MediaTek 9200+ and a Mali-G715-Immortalis MC11 r1p2.
The only limits are likely to be down to frequency scaling and thermal management - if you run a lot of threads on a lot of cores it can get hot, so frequencies get throttled when it overheats, to give the device a chance to cool down.
I would summarize it as: the number of instructions multiplied by the amount of data to process.
So heat is the bottleneck for mobile and laptop, but also for desktop.
I do not know if it is feasible, but leaving space between the chip layers to carry heat out should be possible. This would make the chip a little thicker, but the cooling system makes it thicker too. It is just an idea, maybe not the best one. ;))
I forgot to ask a question about optimization using SIMD.
I use a lot of INT arrays in my project. If I understood correctly, SIMD works on 128-bit vectors on the MediaTek 9200+.
So if I change the INT arrays to USHORT or UCHAR in certain cases, would I get better performance?
Yes, SIMD operations allow more to be done at once on smaller datatypes, so you should be able to improve performance with them in many cases (as long as they still have the needed accuracy).
So in that case I would have to load 8 shorts rather than 4 ints. But if I load only 4 shorts there is no benefit? If I understood how it works ;))
Correct. Loading shorts might be slightly faster because of reduced cache pressure if you are memory-bound, but computationally 4 int16 vs 4 int32 won't make any difference because you just leave half the vector width unused.
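To make the lane count concrete, here is a small sketch, assuming the data fits in unsigned 16-bit values. The function name abs_diff_u16 is hypothetical. A 128-bit NEON register holds 8 uint16 lanes, so each loop iteration handles 8 elements, where the int32 version would handle 4. Leftover elements (fewer than 8) go through a scalar tail loop, which also serves as the fallback on non-Arm targets.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// out[i] = |a[i] - b[i]| for n unsigned 16-bit elements.
void abs_diff_u16(const uint16_t* a, const uint16_t* b,
                  uint16_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // 8 lanes of 16 bits fill one 128-bit register.
    for (; i + 8 <= n; i += 8) {
        const uint16x8_t va = vld1q_u16(a + i);
        const uint16x8_t vb = vld1q_u16(b + i);
        vst1q_u16(out + i, vabdq_u16(va, vb));
    }
#endif
    // Scalar tail (and full fallback when NEON is not available).
    for (; i < n; ++i) {
        out[i] = static_cast<uint16_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
}
```

With only 4 values per iteration, half of the 16-bit vector would sit unused, which is the point made in the reply above.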
Do you mean that int64 has the same size as int32? Oops.
Sorry typo - fixed.