SIMD help: examples needed

Hi,

I decided to have a look at SIMD intrinsics. There is a lot of documentation, but I cannot find any examples.

So I have decided once again to ask how to use SIMD, with examples.

I only need 2 examples. Then I think I should be able to combine practice and knowledge.

The first example is what to do when (*in1) is an int array. The processing happens inside a loop and has the form (*in1)[x] - (*in1)[y]. If I read the documentation correctly, the intrinsics should be VSUB and VABS, but I need the code syntax.

           ONE:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max         - (*in1)[y].max);
            int diff5 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_max);
            int diff6 = std::abs((*in1)[x].min         - (*in1)[y].max);
            int diff7 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_min);
            int diff8 = std::abs((*in1)[x].max         - (*in1)[y].min);

and

           TWO:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);
            int diff3 = std::abs((*in1)[x].raw_col_max - (*in1)[y].raw_col_max);
            int diff4 = std::abs((*in1)[x].max         - (*in1)[y].max);

and

           FOUR:

            int diff1 = std::abs((*in1)[x].raw_col_min - (*in1)[y].raw_col_min);
            int diff2 = std::abs((*in1)[x].min         - (*in1)[y].min);

and how to do

            if ((diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) || (diff5 < 5 && diff6 < 5 && diff7 < 5 && diff8 < 5)) {

and

            if (diff1 < 9 && diff2 < 9 && diff3 < 9 && diff4 < 9) {

and

            if ((diff1 < 9 || diff2 < 9) && (diff3 < 9 || diff4 < 9)) {
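The three conditions could be evaluated without testing each lane separately, as in the sketch below. It relies on two observations: "all four lanes are below a limit" is the same as "the largest lane is below the limit", and "at least one lane of a pair is below the limit" is the same as "the smaller lane of the pair is below the limit". The helper names are hypothetical. `vmaxvq_s32` and `vpminq_s32` are AArch64-only intrinsics, so the NEON path is guarded by `__aarch64__` and a scalar fallback covers other targets. An alternative with the same result would be a lane-wise compare using `vcltq_s32` followed by a `vminvq_u32` reduction.

```cpp
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// diff1 < limit && diff2 < limit && diff3 < limit && diff4 < limit
inline bool all_below(const int32_t d[4], int32_t limit) {
#if defined(__aarch64__)
    // All lanes are below the limit exactly when the largest lane is.
    return vmaxvq_s32(vld1q_s32(d)) < limit;
#else
    return d[0] < limit && d[1] < limit && d[2] < limit && d[3] < limit;
#endif
}

// (diff1 < limit || diff2 < limit) && (diff3 < limit || diff4 < limit)
inline bool each_pair_has_one_below(const int32_t d[4], int32_t limit) {
#if defined(__aarch64__)
    int32x4_t v = vld1q_s32(d);
    // Pairwise minimum: {min(d0,d1), min(d2,d3), min(d0,d1), min(d2,d3)}.
    int32x4_t pair_min = vpminq_s32(v, v);
    // Both pairs qualify exactly when the larger of the two minima does.
    return vmaxvq_s32(pair_min) < limit;
#else
    return (d[0] < limit || d[1] < limit) && (d[2] < limit || d[3] < limit);
#endif
}

// First condition: d[0..3] hold diff1..diff4, d[4..7] hold diff5..diff8.
inline bool match8(const int32_t d[8]) {
    return all_below(d, 9) || all_below(d + 4, 5);
}
```

Each helper ends in a single scalar comparison, so the surrounding `if` needs only one branch per condition instead of one per lane.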

I think that would be enough. Then I should be able to find my way, or I will come back to you. ;))

Thanks a lot in advance.

PS: I work with a MediaTek 9200+ and a Mali-G715-Immortalis MC11 r1p2.

  • So, SIMD is very good if we do not use conditional checks?

    There are two separate issues. 

    The first issue is just about what percentage of the code can be vectorized. If you have 8 scalar instructions that turn into 2 vector instructions then you go 4x faster. If you have 8 scalar instructions that turn into 1 vector instruction and 4 scalar instructions then you only go 1.6x faster. It doesn't take much serial code to reduce the amount of uplift you get, so good SIMD-friendly algorithms try and stay in SIMD and avoid scalar code as much as possible.
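The arithmetic in the paragraph above can be checked with a small helper. This is only a sketch: the function name `speedup` is hypothetical, and it assumes every instruction, scalar or vector, costs the same, which real hardware does not guarantee.

```cpp
// Speedup = original scalar instruction count divided by the instruction
// count after vectorization (vector instructions plus leftover scalar ones).
// Assumes, as a simplification, that all instructions have equal cost.
constexpr double speedup(int scalar_before, int vector_after, int scalar_after) {
    return static_cast<double>(scalar_before) / (vector_after + scalar_after);
}
```

With these assumptions, `speedup(8, 2, 0)` gives the 4x case and `speedup(8, 1, 4)` gives the 1.6x case: four leftover scalar instructions cost more than half of the possible gain.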

    The second issue, around the use of conditional selects, is about branches and branch prediction. Modern CPUs run deep pipelines with a lot of instructions in flight, and rely on branch prediction. For a lot of data-driven algorithms, branches that depend on the data can be hard to predict, because there isn't a regular pattern in the data. For these unpredictable branches, using conditional selects may actually execute more instructions, but it avoids the misprediction overhead and so often runs faster in practice.
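A conditional select could look like the sketch below. Both outcomes are computed for every lane, and `vbslq_s32` then picks one per lane using the mask produced by `vcltq_s32`, so the vector loop contains no data-dependent branch. The function name `scale_or_shift` and the operation itself are made up for illustration. The scalar loop handles leftover elements and non-Arm targets.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// out[i] = (a[i] < threshold) ? a[i] * 2 : a[i] - threshold
inline void scale_or_shift(const int32_t* a, int32_t threshold,
                           int32_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const int32x4_t vthr = vdupq_n_s32(threshold);
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        uint32x4_t mask = vcltq_s32(va, vthr);    // all-ones where a < threshold
        int32x4_t doubled = vaddq_s32(va, va);    // "then" result, all lanes
        int32x4_t shifted = vsubq_s32(va, vthr);  // "else" result, all lanes
        vst1q_s32(out + i, vbslq_s32(mask, doubled, shifted));
    }
#endif
    // Scalar tail (and the whole loop when NEON is unavailable).
    for (; i < n; ++i) {
        out[i] = (a[i] < threshold) ? a[i] * 2 : a[i] - threshold;
    }
}
```

The NEON path does more arithmetic per element than a branchy scalar loop would, which is the trade-off described above: extra instructions in exchange for no mispredictions.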

    Rather than writing everything using NEON instructions, should clang do the work?

    It's a question of effort vs reward. If you really care about optimizing, then hand-written code can often beat the compiler (especially if you know that your problem allows you to make assumptions the compiler cannot make in a generic way). However, it's more expensive to develop and maintain, and you might be happy with auto-vectorized code because it has a lower long-term cost and is more portable across different architectures.
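For comparison, a loop written so that the compiler has a good chance of vectorizing it on its own might look like the sketch below. The function name `abs_diff_array` is hypothetical. `__restrict` is a compiler extension (accepted by clang and GCC) that promises the arrays do not overlap, which is the kind of assumption the compiler cannot make by itself. Whether the loop is actually vectorized depends on the compiler version and flags. With clang, building with optimization enabled and `-Rpass=loop-vectorize` should report it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Plain scalar source: a simple counted loop over non-overlapping arrays,
// with no early exit, is a typical candidate for auto-vectorization.
inline void abs_diff_array(const int32_t* __restrict a,
                           const int32_t* __restrict b,
                           int32_t* __restrict out,
                           std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::abs(a[i] - b[i]);
    }
}
```

The same source also builds for other architectures, which is the portability advantage mentioned above.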

    HTH, 
    Pete
