
Shader ALU improvement not having expected impact

Hi,

We've analysed our main shader (which presumably accounts for most of the pixels in the 3D pass). According to the Mali Offline Compiler (malioc), the shader is largely ALU bound on most architectures (see the trimmed report below):

Before optimization (Mali-G71):

                               A     LS      V      T
Total instruction cycles:    6.8    0.0    4.0    2.0

After optimization: 

                               A     LS      V      T
Total instruction cycles:    4.7    0.0    4.8    2.0  
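
As a rough cross-check of the two reports above, the sketch below picks the pipeline with the highest cycle count in each one (A = arithmetic, LS = load/store, V = varying, T = texture) and compares them. It assumes a simplified model in which the pipelines run in parallel and the slowest one sets the per-thread cost; the function names are hypothetical helpers, not part of malioc.

```python
# Cycle counts per pipeline, copied from the two malioc reports above.
before = {"A": 6.8, "LS": 0.0, "V": 4.0, "T": 2.0}
after = {"A": 4.7, "LS": 0.0, "V": 4.8, "T": 2.0}


def bound_pipeline(cycles):
    """Return (name, cycles) for the pipeline with the highest cycle count."""
    name = max(cycles, key=cycles.get)
    return name, cycles[name]


def relative_load(cycles):
    """Load of each pipeline relative to the bound one (assumed parallel model)."""
    _, worst = bound_pipeline(cycles)
    return {name: value / worst for name, value in cycles.items()}


if __name__ == "__main__":
    for label, report in (("before", before), ("after", after)):
        name, worst = bound_pipeline(report)
        loads = {k: round(v, 2) for k, v in relative_load(report).items()}
        print(f"{label}: bound by {name} at {worst} cycles, relative load {loads}")

    # Best-case gain if the bound pipeline alone determined the cost per thread.
    gain = bound_pipeline(before)[1] / bound_pipeline(after)[1]
    print(f"best-case per-thread speedup: {gain:.2f}x")
```

Under that assumption the bound pipeline moves from A to V after the optimization, and A stays very close to the bound. A nearly unchanged ALU utilization percentage would therefore not be surprising on its own; what the model predicts should drop is the cost per thread.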


This optimization was driven by the fact that we were convinced, perhaps wrongly, that the Shader Core Unit was ALU bound (image attached).

(LEFT: after optimization; RIGHT: before optimization)

After applying the optimization, though, we didn't see any significant change in ALU utilization, either across the whole frame or within the region shown above (which I believe corresponds to the 3D pass): it went from ~70% to ~69%.
My suspicion is that this might be related to the Partial Coverage Rate values: according to your blog, these could be caused by sliver/micro triangles. The execution core utilization drops significantly midway through, and I can't identify any other culprit. So, if that kind of geometry really is eroding performance, would that explain why the optimization was ineffective?

Cheers!

  • Hi JPJ,

    Are you able to share a sample APK that is debuggable and/or the data files? Feel free to email developer <at> arm <dot> com, as I assume you can't do this on a public forum.

    The pipeline ratios you are seeing in the offline compiler should be pretty close to reality, but you're not seeing the same ratios in the Mali Core Unit Utilization charts. My initial thought is that either this isn't the pass you think it is, or you have the wrong shader. Partial coverage erodes efficiency (you need more warps to cover the same on-screen pixel count), but it shouldn't alter the ratios, as the shader is still the same.

    The only caveat that I am aware of with the Mali Offline Compiler is that it misses some optimizations that the real driver would apply. In particular, the offline compiler only compiles single programs, whereas the real driver compiles and optimizes across all shaders in a program pipeline.

    Kind regards, 
    Pete
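
The point in the reply above about partial coverage can be illustrated with a small sketch: if only a fraction of the lanes in each issued group cover visible pixels, more thread slots are needed for the same pixel count, so every pipeline's total cost grows by the same factor and the ratios between pipelines stay put. The cost model, the `coverage` parameter, and the function names are assumptions made for illustration; the cycle counts are the "after" figures from the original post.

```python
import math

# Per-thread cycle counts from the "after optimization" malioc report.
CYCLES = {"A": 4.7, "LS": 0.0, "V": 4.8, "T": 2.0}


def threads_issued(visible_pixels, coverage):
    """Thread slots needed when only `coverage` (0..1] of the lanes are live."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    return math.ceil(visible_pixels / coverage)


def pipeline_cost(visible_pixels, coverage):
    """Total cycles per pipeline, assuming idle lanes still occupy a slot."""
    threads = threads_issued(visible_pixels, coverage)
    return {name: cycles * threads for name, cycles in CYCLES.items()}


def ratios(cost):
    """Share of the summed cost that each pipeline accounts for."""
    total = sum(cost.values())
    return {name: value / total for name, value in cost.items()}


if __name__ == "__main__":
    for coverage in (1.0, 0.5):
        cost = pipeline_cost(1_000_000, coverage)
        shares = {k: round(v, 3) for k, v in ratios(cost).items()}
        print(f"coverage={coverage}: A cycles={cost['A']:.0f}, shares={shares}")
```

In this model, halving the coverage doubles the cost of every pipeline while the shares are unchanged, which is why partial coverage alone would not be expected to hide a change in the A:V:T balance.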
