Hi, all
I have a test case.
The first case draws opaque objects, then adds a copy color pass and a copy depth pass, and then continues to render transparent objects.
In the second case I remove the two intermediate copy passes, so drawing transparent objects directly follows drawing opaque objects.
My expectation is that the first test case uses more load/store external bandwidth and processes more pixels and fragments (because of the two copy passes) than the second one.
The Streamline report matches my expectations for bandwidth, pixels, and fragments. However, the MaliCoreCyclesExecutionCoreActive counter is a little confusing.
The first case appears to take fewer cycles than the second one, which is unexpected.
As you can see, MaliCoreCyclesExecutionCoreActive is 270M cycles for the first part (first test case) and 287M cycles for the second part (second test case).
I checked all the sub counters, and all the remaining counters added up are higher than in the second part (this matches my expectations),
but the MaliCoreCyclesExecutionCoreActive cycle count is lower than in the second part (this does not match my expectations). Why?
Also, adding all the sub counters together does not give the MaliCoreCyclesExecutionCoreActive value.
I wonder if there are some hidden counters missing from the Streamline report.
Thank you.
"If you add all the sub counters together does not equal the MaliCoreCyclesExecutionCoreActive."
That's not how the counters work - the unit counters don't sum up to the cycle count total, as most units run in parallel. To do anything useful here you'll need to zoom in further, down to the 1 ms level, so you can actually see more detail per frame. But on average your arithmetic utilization is very low, which makes me think you're picking up stalls from somewhere (hard to tell where from your screenshot).
If you're able to share an exported Streamline capture with us, I'd be happy to take a look (mobilestudio@arm.com).
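To illustrate why per-unit counters need not add up to the core active cycle count, here is a minimal toy model in Python. It is only a sketch: the unit names, the busy ranges, and the assumption that the core counts as active for the whole window are all made up for illustration and are not taken from the capture discussed in this thread.

```python
# Toy model: each unit is busy over half-open (start, end) cycle ranges.
# All numbers are invented; they are not real Mali counter values.

def active_cycles(busy_ranges):
    """Return the set of cycle indices covered by the given ranges."""
    cycles = set()
    for start, end in busy_ranges:
        cycles.update(range(start, end))
    return cycles

# Hypothetical units whose busy periods overlap in time.
units = {
    "arithmetic": [(0, 20)],
    "load_store": [(10, 60)],
    "texture": [(40, 90)],
}

# Assumption: the core is active for the whole 100-cycle window,
# including cycles where no unit is issuing (for example, stalls).
core_active = len(active_cycles([(0, 100)]))

per_unit = {name: len(active_cycles(r)) for name, r in units.items()}
sum_of_units = sum(per_unit.values())

# Cycles in which at least one unit was busy (overlaps counted once).
any_unit_busy = len(set().union(*(active_cycles(r) for r in units.values())))

# Utilization of each unit relative to core active cycles.
utilization = {name: n / core_active for name, n in per_unit.items()}

print("per unit:", per_unit)
print("sum of unit counters:", sum_of_units)
print("cycles with any unit busy:", any_unit_busy)
print("core active cycles:", core_active)
print("utilization:", utilization)
```

In this toy data the unit counters sum to more than the core active count because overlapping cycles are counted once per unit, while some core active cycles have no unit busy at all. A low per-unit utilization alongside a high core active count is the kind of pattern that points toward stalls.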
Cheers, Pete
Thank you for your answer, Peter.
I sent the file to mobilestudio@arm.com.
I ran this test because I traced the power consumption of the phone and found that the 2nd case takes almost the same watts per hour as the first one. That is unacceptable, since I disabled the two copy passes, which saves bandwidth (load/store), pixels, and fragments. I checked Streamline and found that the 2nd case uses more cycles than the first one; I guess that's why their power consumption is so close.
Cheers.
Toni
Discussion moved to email, so marking this one as answered.