Hi guys,
May I know more details about 'read stall cycles' and 'write stall cycles' in the Mali External Bus Stalls chart in a Streamline capture?
For my games, I always have write stall cycles significantly larger than read stall cycles.
Can someone tell me more about the Mali External Bus Stalls chart? Even if it's not a profiling issue, I'm curious :)
Thanks!
Hi Jinho,
These counters report the number of cycles where the GPU is getting back-pressure from the memory bus (i.e. the GPU has data and is trying to write, but the downstream memory bus says "not ready").
Write stalls are common for mid-sized and large GPU configurations running render passes with low content complexity. In this scenario you will have many shader cores trying to write completed pixels concurrently, and the downstream memory system can't keep up with the rate at which the shader cores complete pixels.
Kind regards, Pete
Hi Pete,
Thanks for your reply. Your answers are always helpful to me as I am new to Arm based profiling :)
Actually I have two additional questions.
First, in my game scenes, there are usually several different objects with large textures, so I'm fairly sure there will be some texture-related memory load. But I'm not sure which Streamline counters to look at to verify this. In other words, I would like to know how to check the load on the GPU to read a large amount of textures from memory. I'm using a Mali-G78 based device right now, and the counters described in the Streamline online documentation seem to be slightly different.
Second, using Performance Advisor, I was able to find a section with severe frame drops and energy consumption through the Pixels Per Frame chart and the GPU Bandwidth Per Frame chart. I had assumed overdraw would be the problem, but changing the amount of overdraw did not significantly affect the other counter values.
The overdraw per pixel chart in Performance Advisor is a bit odd. Since my app is a game, it is a bit strange that most of the sections are smaller than 1 despite rendering a lot of backgrounds and objects by default. Isn't it normal to always exceed 1?
Anyway, I know overdraw is the main cause of being GPU bound, and my game renders many layered 2D backgrounds and many sprite objects. If overdraw is improved overall for the scene, is it possible to change the threshold at which frame drop occurs or reduce the width? For example, the frame drop period mentioned above is narrow, or the depth of the falling frame is relatively shallow.
Could you give me some advice on this?
Jinho Mang said: I'm using a Mali-G78 based device right now, and the counters described in the Streamline online documentation seem to be slightly different.
Check you have the Mali-G78 template applied. You can select it from the menu in the top right of the Timeline view.
The default view is just alphabetical and doesn't include any derivations. We have an item on the backlog to apply the template automatically, but currently it's a manual step, sorry. The templated counters should match the online documentation. If they don't, that's a bug, so please let us know =)
Jinho Mang said: I would like to know how to check the load on the GPU to read a large amount of textures from memory.
In Streamline, with the Mali-G78 template applied, you can see texture bandwidth from the L2 cache in the "Mali Core L2 Memory Reads" chart, and from external memory in the "Mali Core External Memory Reads" chart. These specific counters are *per core* numbers, so scale by $MaliConstantsShaderCoreCount if you want a GPU-wide total.
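As a rough illustration of that scaling, here is a minimal Python sketch. The function name and the sample numbers are hypothetical, and it assumes the per-core value is representative of every core. The shader core count depends on the device's Mali-G78 configuration, so take it from $MaliConstantsShaderCoreCount in your own capture rather than from this example.

```python
def gpu_wide_total(per_core_value, shader_core_count):
    """Scale a per-core counter value to an approximate GPU-wide total.

    Assumes the per-core value is representative of every core,
    so the result is an estimate rather than an exact figure.
    """
    if shader_core_count <= 0:
        raise ValueError("shader_core_count must be positive")
    return per_core_value * shader_core_count


# Hypothetical numbers: 1.5 MB read per core, on a 24-core configuration.
per_core_mb = 1.5
core_count = 24  # stand-in for $MaliConstantsShaderCoreCount on your device
print(gpu_wide_total(per_core_mb, core_count))  # 36.0
```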
Jinho Mang said: The overdraw per pixel chart in Performance Advisor is a bit odd. Since my app is a game, it is a bit strange that most of the sections are smaller than 1 despite rendering a lot of backgrounds and objects by default. Isn't it normal to always exceed 1?
Yes, this does seem odd, but without seeing the data it's hard to be sure what's happening. If you're able to export a Streamline capture and share it, feel free to get in touch at mobilestudio@arm.com and I'm happy to take a look.
Jinho Mang said: is it possible to change the threshold at which frame drop occurs or reduce the width? For example, the frame drop period mentioned above is narrow, or the depth of the falling frame is relatively shallow.
I'm not entirely sure what the question is here, sorry.
You'll get dropped frames if you have a single frame (if double buffered) or two consecutive frames (if triple buffered) that are slower than your target refresh rate. The only way to avoid that is to optimize and optimize some more. There will always be some variation between devices due to thermals and power management, which are hard to control, so generally do the best you can.
For reporting purposes (e.g. slow frame capture in Performance Advisor) you can set the threshold below which slow frames are captured. Note that Performance Advisor uses a sliding window to average FPS over multiple frames, as triple buffering makes a mess of CPU-side frame timing, so very short transient FPS drops may not get detected as slow frames.
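To show why a sliding window can hide a short drop, here is a minimal Python sketch. It is not Performance Advisor's actual algorithm: the window size of 5 frames, the 45 FPS threshold, and the function names are all assumptions chosen for illustration.

```python
from collections import deque


def windowed_fps(frame_times_ms, window=5):
    """Return the average FPS over each full sliding window of frames."""
    recent = deque(maxlen=window)
    result = []
    for frame_time in frame_times_ms:
        recent.append(frame_time)
        if len(recent) == window:
            # window frames rendered in sum(recent) milliseconds
            result.append(1000.0 * window / sum(recent))
    return result


def slow_windows(frame_times_ms, threshold_fps, window=5):
    """Return the indices of windows whose average FPS is below the threshold."""
    fps_values = windowed_fps(frame_times_ms, window)
    return [i for i, fps in enumerate(fps_values) if fps < threshold_fps]


# One 40 ms spike among 16 ms frames: averaged away, nothing is flagged.
transient = [16.0] * 4 + [40.0] + [16.0] * 4
print(slow_windows(transient, threshold_fps=45.0))  # []

# Five consecutive 40 ms frames: the windowed average falls below the threshold.
sustained = [16.0] * 4 + [40.0] * 5
print(slow_windows(sustained, threshold_fps=45.0))  # [1, 2, 3, 4]
```

In the first case the single 40 ms frame would be 25 FPS on its own, but the 5-frame average stays at about 48 FPS, so it is never reported as slow.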
HTH, Pete
Thanks to your answer, I solved a lot of issues! The overdraw value of the PA report is still incomprehensible. However, I was able to check the overdraw value through the 'Fragments/pixel' value, which is the thread unit of the Mali-G78 template.
-Jinho
Jinho Mang said: The overdraw value of the PA report is still incomprehensible
I'll investigate that one here. If you are able to share an export of your Streamline capture it will help us reproduce exactly the issue you are seeing. Feel free to get in touch at mobilestudio@arm.com.
Thanks, Pete