Graphics, Gaming, and VR forum Shader ALU improvement not having expected impact

JPJ over 4 years ago

Hi,

We've analysed our main shader, which presumably accounts for most of the pixels in the 3D pass. The shader is largely ALU-bound on most architectures (see the trimmed malioc report below):

Before optimization (Mali-G71):

                             A     LS    V     T
Total instruction cycles:    6.8   0.0   4.0   2.0

After optimization:

                             A     LS    V     T
Total instruction cycles:    4.7   0.0   4.8   2.0

(A = arithmetic, LS = load/store, V = varying, T = texture)
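One way to read these numbers is a simplified model in which the A, LS, V and T pipelines overlap perfectly, so the slowest one sets the per-fragment cost (this mirrors the "Bound" pipeline that malioc reports, but real hardware rarely overlaps this cleanly). The minimal Python sketch below uses only the figures above; the function names are illustrative, not part of any tool. Under this model the shader moves from arithmetic-bound (6.8) to varying-bound (4.8), so the ideal per-fragment gain is capped at 6.8 / 4.8, not 6.8 / 4.7.

```python
# Simplified model (assumption): the A, LS, V and T pipelines run fully in
# parallel, so the pipeline with the most cycles bounds the per-fragment cost.

def bound(cycles):
    """Return (pipeline, cycles) for the slowest pipeline."""
    name = max(cycles, key=cycles.get)
    return name, cycles[name]

def ideal_speedup(before, after):
    """Upper bound on per-fragment speedup under the overlap assumption."""
    return bound(before)[1] / bound(after)[1]

# Cycle counts taken from the malioc output above (Mali-G71).
before = {"A": 6.8, "LS": 0.0, "V": 4.0, "T": 2.0}
after = {"A": 4.7, "LS": 0.0, "V": 4.8, "T": 2.0}

print(bound(before))                            # ('A', 6.8)
print(bound(after))                             # ('V', 4.8)
print(round(ideal_speedup(before, after), 2))   # 1.42
```

Arithmetic work per fragment still drops by about 31% in this model; the sketch only shows that the fragment's overall cost can fall no further than the new varying limit.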

We made this optimization because we were convinced, perhaps wrongly, that the shader core was ALU-bound (image attached).

[Image not available. Caption: LEFT: after optimization; RIGHT: before optimization]

After applying the optimization, however, we didn't see any significant improvement in ALU utilization, either across the whole frame or within the region shown above (which I believe corresponds to the 3D pass): it only went from ~70% to ~69%.
My suspicion is that this might be related to the Partial Coverage Rate values; according to your blog, high values can be caused by sliver or micro triangles. The execution core utilization drops significantly midway, and I can't find any other culprit. So, if we really are losing performance to that kind of geometry, would that explain why the optimization was ineffective?

Cheers!

Peter Harris over 4 years ago in reply to JPJ

> I'll check internally about the APK, but I suspect it's tricky :/

Yeah, understood - it normally is unfortunately. If an APK isn't possible, I'd still be happy to look over the Streamline data and/or a Graphics Analyzer frame capture if those are easier to share.

> Assuming that I can't, is there any more Streamline/malioc info I can add here?

Nothing from malioc - what you see is what you get. The main weakness in Streamline is the "guess" you have to make to line up what you see in the data with what's happening at the API level. Reviewing the same frame in Streamline and Graphics Analyzer can sometimes help as it gives a bit more API context to line things up.

> I assume you're referring to whole-program optimizations when you say "program pipeline", am I correct?

Yes, exactly that. Either eliminating unused parts of the vertex shader completely, changing precision based on interface mismatches, or moving computation between stages.

HTH,
Pete

JPJ over 4 years ago in reply to Peter Harris

Cheers! I'm asking around about our APK/Streamline data sharing policies. In the meantime, I'll try to run some more exhaustive tests. Thanks for your help, Pete!

JPJ over 4 years ago in reply to JPJ

Hi Pete, I'm looking into the possibility of sending you more data (waiting on some answers from legal, etc.).

In the meantime, I'm wondering whether the actual 3D pass might be occurring in the area marked by the orange rectangle. I've been associating the 3D pass with the area where compressed and mipmapped texture accesses are most concentrated, because that seems to be the case from a frame inspection. But then I can't understand why varying and texturing activity is so high there. Or could that just be an upscaling pass towards the end of the frame, after which only a handful of small UI items pop up? Apologies for the generic questions, and once again, many thanks for your help!

Peter Harris over 4 years ago in reply to JPJ

> I'm just wondering if the actual 3D pass might be occurring in the area where the orange rectangle is?

I prefer your original hypothesis (the part with the compressed and mipmapped texture accesses is the 3D pass). The workload in the orange box area is just too consistent over time to be a 3D pass, that looks more like a post-process pass to me.
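The "too consistent over time" heuristic can be made slightly more concrete. A minimal sketch, assuming you have exported a per-sample counter trace (for example, fragment activity) for each candidate region of the Streamline timeline: compare the coefficient of variation of each trace. A flat trace is more typical of a uniform full-screen pass, while a 3D pass with changing geometry and materials tends to fluctuate. The sample values and the 0.1 threshold are made up for illustration, and the helper names are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean (0 for a flat trace)."""
    m = mean(samples)
    return pstdev(samples) / m if m else 0.0

def looks_like_post_process(samples, threshold=0.1):
    """Hypothetical heuristic: a very steady counter trace suggests a uniform
    full-screen pass rather than a geometry-heavy 3D pass."""
    return coefficient_of_variation(samples) < threshold

# Made-up counter samples, for illustration only.
region_3d = [62, 75, 48, 80, 55, 71, 40, 68]
orange_box = [70, 71, 70, 69, 70, 71, 70, 70]

print(looks_like_post_process(region_3d))   # False
print(looks_like_post_process(orange_box))  # True
```

This is only a rough screening aid; lining the trace up with API-level context (for example, the same frame in Graphics Analyzer) is still what confirms which pass is which.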

JPJ over 4 years ago in reply to Peter Harris

Cheers! That tip will probably help me narrow things down. To add a bit more detail to the original question, I clearly oversimplified the static analysis. While there is some common base shading code that contributes to the majority of the visible pixels, a whole set of shader variants (mostly fragment) derives from it. I've painstakingly compared half a dozen of them, and there was always an ALU gain similar to the one shown above, in some cases slightly higher.
Anyway, I'm probably extending this discussion a bit too far. If I conclude anything that helps complete my original question, I'll add it here. Thanks for your help!