Hi,
I decided to have a look at the SIMD intrinsics. There is a lot of documentation, but I cannot find any examples.
So once again I am asking how to use SIMD, with examples.
I only need 2 examples. Then I think I should be able to combine practice and knowledge.
The first example is how to proceed when (*in1) is an array of INT values. The processing happens inside a loop, on (*in1)[x] - (*in1)[y]. If I read correctly, the intrinsics should be VSUB and VABS. But I need the code syntax.
ONE:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
int diff6 = std::abs((*in1)[x].min - (*in1)[y].max);
int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
int diff8 = std::abs((*in1)[x].max - (*in1)[y].min);
and
TWO:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
int diff4 = std::abs((*in1)[x].max - (*in1)[y].max);
FOUR:
int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
int diff2 = std::abs((*in1)[x].min - (*in1)[y].min);
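For the absolute differences above, here is a minimal sketch of what the NEON version could look like. It rests on assumptions that are not in the post: the element type (called `Range` here, a hypothetical name) holds the four fields as contiguous 32-bit ints in exactly this order, and `abs_diffs` is an illustrative helper, not an existing API. Instead of VSUB followed by VABS, it uses `vabdq_s32` (VABD, absolute difference), which does both steps in one instruction on four lanes. The crossed differences (diff5 to diff8) reuse the same load of `y` with its two halves swapped by `vextq_s32`. The scalar branch is only there so the sketch also compiles on a non-ARM machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical element layout: four contiguous 32-bit ints, in this order.
struct Range {
    int32_t raw_col_min;
    int32_t min;
    int32_t raw_col_max;
    int32_t max;
};
static_assert(sizeof(Range) == 4 * sizeof(int32_t), "fields must be contiguous");

// out[0..3] = diff1..diff4 (same field of x and y)
// out[4..7] = diff5..diff8 (min fields of x against max fields of y, and the reverse)
inline void abs_diffs(const Range& x, const Range& y, int32_t* out) {
    assert(out != nullptr);
#if defined(__ARM_NEON)
    // Load the four fields of each element into one 128-bit register.
    const int32x4_t vx = vld1q_s32(reinterpret_cast<const int32_t*>(&x));
    const int32x4_t vy = vld1q_s32(reinterpret_cast<const int32_t*>(&y));
    // Swap halves of y: {raw_col_max, max, raw_col_min, min}.
    const int32x4_t vy_swapped = vextq_s32(vy, vy, 2);
    // vabdq_s32 computes |a - b| per lane (subtract and abs in one step).
    vst1q_s32(out, vabdq_s32(vx, vy));
    vst1q_s32(out + 4, vabdq_s32(vx, vy_swapped));
#else
    // Scalar fallback with the same results, for non-NEON builds.
    const int32_t xs[4] = {x.raw_col_min, x.min, x.raw_col_max, x.max};
    const int32_t ys[4] = {y.raw_col_min, y.min, y.raw_col_max, y.max};
    for (int i = 0; i < 4; ++i) {
        out[i] = std::abs(xs[i] - ys[i]);
        out[i + 4] = std::abs(xs[i] - ys[(i + 2) % 4]);
    }
#endif
}
```

Calling `abs_diffs((*in1)[x], (*in1)[y], d)` would fill `d[0]` to `d[7]` with diff1 to diff8. When only diff1 to diff4 are needed, the second `vabdq_s32` can simply be dropped. If the real struct has other fields between these four, the loads have to be adapted.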
and how to write these tests:
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5) ){
if ( (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) ){
if ( (diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9) ){
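The three tests above could be sketched with NEON compare and reduce intrinsics as follows. This is only a sketch under assumptions: the diffs are taken to be stored as 32-bit ints in an array, four at a time, and the helper names (`all_below`, `pairs_any_below`, `first_test`) are made up for illustration. `vcltq_s32` compares four lanes at once and returns all ones in each lane where the test is true. `vminvq_u32` is then nonzero only if every lane passed (the `&&` case), and `vpmaxq_u32` ORs neighbouring lanes first (the `||` inside each pair). These two reductions exist only on AArch64, so the NEON path is guarded accordingly and a scalar branch keeps the sketch compilable elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_NEON64 1
#else
#define SKETCH_NEON64 0
#endif

// True if d[0], d[1], d[2] and d[3] are all below limit.
inline bool all_below(const int32_t* d, int32_t limit) {
    assert(d != nullptr);
#if SKETCH_NEON64
    // Each lane becomes 0xFFFFFFFF where d[i] < limit, else 0.
    const uint32x4_t m = vcltq_s32(vld1q_s32(d), vdupq_n_s32(limit));
    // The minimum across lanes is nonzero only if every lane passed.
    return vminvq_u32(m) != 0;
#else
    return d[0] < limit && d[1] < limit && d[2] < limit && d[3] < limit;
#endif
}

// True if (d[0] < limit || d[1] < limit) && (d[2] < limit || d[3] < limit).
inline bool pairs_any_below(const int32_t* d, int32_t limit) {
    assert(d != nullptr);
#if SKETCH_NEON64
    const uint32x4_t m = vcltq_s32(vld1q_s32(d), vdupq_n_s32(limit));
    // Pairwise max acts as an OR of neighbouring mask lanes.
    const uint32x4_t pairs = vpmaxq_u32(m, m);
    return vminvq_u32(pairs) != 0;
#else
    return (d[0] < limit || d[1] < limit) && (d[2] < limit || d[3] < limit);
#endif
}

// First test of the post: d[0..3] = diff1..diff4, d[4..7] = diff5..diff8.
inline bool first_test(const int32_t* d) {
    return all_below(d, 9) || all_below(d + 4, 5);
}
```

The second test is `all_below(d, 9)` alone and the third is `pairs_any_below(d, 9)`. In a real loop it would probably be better to keep the diffs in registers and feed them straight into `vcltq_s32` rather than storing and reloading them; the array is used here only to keep the example short.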
I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))
Thanks a lot in advance.
PS: I work with a MediaTek 9200+ and a Mali-G715-Immortalis MC11 r1p2.
Yes, always the same trouble with frequency scaling. If I add more instructions to the code, performance drops. Only a certain number of instructions can be processed per unit of time, so if the amount of data to process increases, performance decreases; that is normal. Up to 4 times 250 data items it goes very quickly, but above 4 times 500 it slows down rapidly. So adding a function like 0.5x^2 over 2800 data items costs a lot. You can reduce the data selection with an IF, but that also has a cost.
There is no solution other than cooling. But I have watched the evolution of ARM over the last 10 years, and performance keeps increasing. It is as fast as my 2011 desktop with an i7 and 1600 memory frequency. That is not too bad for a mobile.
The best would be to design processors in 3D. They already are, but it would need empty space between every layer. Not easy, but not impossible: more space = less heat per unit of space = higher frequencies.
That is a physical constraint. You cannot reduce the space and reduce the heat at the same time. It does not work if you do not have a cooling system.
I remember 32 IBM processors (CICS architecture) in 1990, with a nitrogen cooling system, for a very big mainframe with 40ns etching. I also had a lot of experience with VAX VMS (RISC architecture with cluster database), the best of which belongs to Intel and Oracle now.
Dealing with heat is the only solution, because increasing frequency increases heat, and decreasing space increases heat.
I am sure you have brilliant engineers; heat would be my main focus.
I have some ideas if you like.
Regards,
herve terrolle.