Hi Peter Harris,
I have read your blog on Mali GPU Hardware Counters. I have a few questions.
The Mali Job Manager Cycles:GPU cycles counter gives the total amount of cycles, the GPU was active. If I execute a compute workload (not graphics), I should be able to predict the execution time of the kernel should be from Tripipe cycles counter. There is always a differnce in value between the Mali Job Manager Cycles:GPU cycles and Mali Core Cycles:Tripipe cycles. What does this extra cycles signify. I know that the values reported by streamline is average value across all the shader cores but still what does this extra cycles signify?
I also would like to know what exactly does the Mali Core Cycles:Compute cycles and Mali Compute Threads:Compute cycles awaiting descriptors counters report ??.
This is because I ran a OpenCL benchamark with zero arithmetic instructions but still the values of Mali Core Cycles:Compute cycles and Mali Compute Threads:Compute cycles awaiting descriptors are not zero while Mali Compute Threads:Compute tasks and Mali Compute Threads:Compute threads started were zero.
Also the tripipe cycles counter value should be equal to the maximum of cycles spent in Arithmetic/LS-pipeline/Texture pipeline but even when there are no texture and Arithmetic instructions, the value of Mali Core Cycles:Tripipe cycles is not the same as Mali Load/Store Pipe:LS instruction issues counter. Why this is happening? If I am executing only memory instructions, Mali Core Cycles:Tripipe cycles should be equal to Mali Load/Store Pipe cycles instead I see that Mali Core Cycles:Compute cycles , Mali Compute Threads:Compute cycles awaiting descriptors and Mali Core Cycles:Tripipe cycles have similar values??
It would be helpful if you can give some insights to these behaviours?
P.S. I am doing an academic project and i am modeling the performance of opencl kernel on Mali GPUs.
P.P.S.I am not an android developer looking at optimizations
No - the beats count is the number of bus data beat cycles. A single transaction is normally multiple data beats (e.g. 64 byte transactions with 16 byte bus = 4 beats per transaction).
Thanks.
So do you have any suggestions of getting L2 misses in the absence of L2 read lookup counters?
Also does Mali L2 Cache Ext Reads:External bus stalls (AR) + Mali L2 Cache Ext Writes:External bus stalls (W) give the total number of stall cycles due to external memory request?
There are definitely should be L2 read and write lookup counters available for Mali-T62x.
https://github.com/ARM-software/gator/blob/master/daemon/events-Mali-T62x_hw.xml
What do you get in your counter selection list in Streamline?
Stall counter definitions are here (e.g.):
https://community.arm.com/graphics/b/blog/posts/mali-midgard-family-performance-counters#jive_content_id_534_L2_EXT_AR_STALL
Cheers, Pete
These are the L2 counters that are visible in my streamline selection
Mali L2 Cache Ext Reads:External bus stalls (AR)
Mali L2 Cache Ext Reads:External write bytes
Mali L2 Cache Ext Writes:External bus stalls (W)
Mali L2 Cache Ext Writes:External read bytes
Mali L2 Cache Reads:L2 read hits
Mali L2 Cache Reads:Read snoops
Mali L2 Cache Writes:L2 write hits
Mali L2 Cache Writes:Write snoops
Hi Peter Harris
I just realised that the streamline is showing the events available in events-Mali-Midgard_hw.xml and not in events-Mali-T62x_hw.xml.
Can you please let me know, how can change gator to use events-Mali-T62x_hw.xml. and not events-Mali-Midgard_hw.xml