Graphics, Gaming, and VR forum: Why is vectorized code slower?

전세원 over 9 years ago

I am trying to make my image processing program faster.

So I changed my scalar code into vectorized code.

For context, the program reads the 8 pixels surrounding each target pixel from the input buffer, 4 on the left and 4 on the right:

pixel1	pixel2	pixel3	pixel4	target	pixel5	pixel6	pixel7	pixel8

It then compares them with the target pixel, calculates a weight, and writes the result into another buffer.

So I wrote the code in this style:

Read 16 pixels (Readin); 8 of these will be the 8 target (center) pixels.

After that, split the neighboring pixels into 4 (left) and 4 (right) and store them in vector variables.

float8 splited1 = (float8)(Readin.s0123, Readin.s5678);

float8 splited2 = (float8)(Readin.s1234, Readin.s6789); and so on...

Then compare splited1 through splitedN with the center pixel using vector operators and calculate the weight.

Finally, store the result data (float8) into the output buffer.
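For reference, the access pattern described above can be written as a plain scalar loop. This is only a minimal C sketch, not the poster's kernel: the weight formula `1 / (1 + |center - neighbor|)`, the weighted-average output, and the names `filter_row` and `RADIUS` are assumptions made for illustration, since the post does not give the actual calculation.

```c
#include <math.h>

#define RADIUS 4  /* 4 pixels to the left and 4 to the right, as in the post */

/* Hypothetical weight: larger when the neighbor is close to the center value.
   The original post does not specify the real formula. */
static float weight(float center, float neighbor) {
    return 1.0f / (1.0f + fabsf(center - neighbor));
}

/* Scalar reference: for every target pixel that has a full window,
   write the weighted average of its 8 neighbors to out. Pixels without
   a full window are left untouched. */
void filter_row(const float *in, float *out, int n) {
    for (int i = RADIUS; i + RADIUS < n; ++i) {
        float center = in[i];
        float acc = 0.0f;
        float wsum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k) {
            if (k == 0) continue;          /* skip the target pixel itself */
            float w = weight(center, in[i + k]);
            acc += w * in[i + k];
            wsum += w;
        }
        out[i] = acc / wsum;
    }
}
```

With a 16-pixel input, this loop produces results for exactly the 8 target pixels at indices 4 to 11, which matches the "read 16, process 8" layout in the description.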

According to the Mali optimization guide, vectorized code should be faster than scalar code.

But in my case, the vectorized code is about 3 times slower than the scalar code.

Why does this happen? Is it caused by using too many vector elements?

My device is a Samsung Galaxy S6 equipped with a Mali-T760 MP8.

Hanni Lozano over 9 years ago

First of all, I am not familiar with Mali, so my comments cover the general aspects of vector vs. scalar code performance. My observations from the description you provided are:
1. To get the most benefit from vector processing, your data needs to be stored in memory in vector format; otherwise the overhead of packing and unpacking data (vector <--> scalar) can cancel out any benefit. Your pseudocode implies that you load 16 pixels individually from memory and then pack them into vector variables, which adds unnecessary overhead. Check whether you can store the 16 pixels contiguously in memory so that they can be loaded and stored with a single load/store instruction.
2. Some ALU operations, such as multiplication and addition, generate a result wider than their operands, and extra steps are needed to truncate the value before it can be stored or packed into a "vector" variable. Unless the processor or co-processor supports storing truncated results directly, your program will end up using extra instructions.
The best approach is to look at the assembly code to determine where the overhead comes from. I also suggest posting the actual code (C/assembly) to get more useful feedback. Good luck.
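To make the first point concrete, here is one possible way to arrange the work so that every vector is filled by a contiguous load rather than by packing individual elements. It is a plain-C sketch under stated assumptions, not Mali-specific advice. Lane j handles target pixel j, and each neighbor offset k becomes one contiguous 8-float load, which in OpenCL C would correspond roughly to a `vload8` at a shifted offset. The weight formula and the names `filter_block8` and `LANES` are hypothetical, and whether this layout is actually faster on a given GPU has to be measured.

```c
#include <math.h>
#include <string.h>

#define RADIUS 4
#define LANES 8

/* Hypothetical weight; the thread does not give the real formula. */
static float weight(float center, float neighbor) {
    return 1.0f / (1.0f + fabsf(center - neighbor));
}

/* Process the 8 adjacent targets in[4..11] of a 16-pixel block.
   Each memcpy stands in for one contiguous 8-wide vector load, so no
   per-element packing of neighbors is needed. */
void filter_block8(const float in[16], float out[LANES]) {
    float center[LANES];
    float acc[LANES] = {0};
    float wsum[LANES] = {0};

    memcpy(center, in + RADIUS, sizeof center);      /* targets in[4..11] */

    for (int k = -RADIUS; k <= RADIUS; ++k) {
        if (k == 0) continue;                        /* skip the targets themselves */
        float nb[LANES];
        memcpy(nb, in + RADIUS + k, sizeof nb);      /* neighbors at offset k */
        for (int j = 0; j < LANES; ++j) {            /* same operation in every lane */
            float w = weight(center[j], nb[j]);
            acc[j] += w * nb[j];
            wsum[j] += w;
        }
    }
    for (int j = 0; j < LANES; ++j) {
        out[j] = acc[j] / wsum[j];
    }
}
```

The inner loop over j applies the same operation to all 8 lanes, which is the shape a vector unit or an auto-vectorizing compiler handles most directly. Comparing the generated assembly for this layout against the per-target packing is one way to see where the overhead comes from.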