
Streamline's MaliCoreCyclesExecutionCoreActive calculation mechanism

Hi, all

I have two test cases.

In the first case, I draw the opaque objects first, then add a copy-color pass and a copy-depth pass, and then continue rendering the transparent objects.

In the second case, I remove the two intermediate copy passes, so the transparent objects are drawn immediately after the opaque objects.

I expected the first test case to use more external load/store bandwidth and to process more pixels and fragments (because of the two copy passes) than the second one.

Streamline's report matches my expectations for bandwidth, pixels, and fragments. However, the MaliCoreCyclesExecutionCoreActive counter confuses me.

The first case appears to take fewer cycles than the second one, which is unexpected.

As you can see, MaliCoreCyclesExecutionCoreActive is 270M cycles for the first part (first test case) and 287M cycles for the second part (second test case).
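For scale, the gap between the two totals can be worked out directly. This is a minimal Python sketch that uses only the two figures quoted above (in millions of cycles); the variable names are illustrative.

```python
# MaliCoreCyclesExecutionCoreActive totals from the report, in millions of cycles.
case1_with_copies = 270  # first test case: copy-color and copy-depth passes included
case2_no_copies = 287    # second test case: copy passes removed

diff = case2_no_copies - case1_with_copies    # absolute gap in millions of cycles
rel = diff / case1_with_copies * 100          # gap relative to the first case, in percent
print(f"second case uses {diff}M more cycles ({rel:.1f}% more than the first)")
# prints: second case uses 17M more cycles (6.3% more than the first)
```

So the unexpected difference is roughly 6% of the first case's total.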

I checked all the sub-counters. Added together, the first part's sub-counters are larger than the second part's (this matches my expectations), but its MaliCoreCyclesExecutionCoreActive count is lower than the second part's (this does not match my expectations). Why?

Also, if you add all the sub-counters together, the result does not equal MaliCoreCyclesExecutionCoreActive.

I wonder whether some counters are hidden or missing from the Streamline report.

Thank you.

  • If you add all the sub counters together does not equal the MaliCoreCyclesExecutionCoreActives.

    That's not how the counters work: the unit counters don't sum up to the total cycle count because most units run in parallel. To do anything useful here you'll need to zoom in further, down to the 1 ms level, so you can actually see more detail per frame. But on average your arithmetic utilization is very low, which makes me think you're picking up stalls from somewhere (hard to tell where from your screenshot).

    If you're able to share an exported Streamline capture with us, I'd be happy to take a look (mobilestudio@arm.com).

    Cheers, 
    Pete
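The point that per-unit counters cannot be added up to get the core total can be illustrated with a toy model. This is a minimal Python sketch under loudly stated assumptions: each hypothetical unit's counter is just the set of cycles in which it was busy, and the core total counts a cycle once if any unit is busy in it. The unit names and busy windows are made up, and this is not how Streamline or the Mali hardware defines MaliCoreCyclesExecutionCoreActive; Arm's counter documentation has the real definition.

```python
# Toy model (assumption): the core total counts a cycle once if ANY unit is busy,
# so it is the size of the union of busy cycles, not the sum of per-unit counts.

def core_active_cycles(unit_busy):
    """Cycles in which at least one unit is busy (union of the busy sets)."""
    active = set()
    for cycles in unit_busy.values():
        active |= cycles
    return len(active)

def summed_unit_cycles(unit_busy):
    """Naive total obtained by adding each unit's busy-cycle count."""
    return sum(len(cycles) for cycles in unit_busy.values())

# Hypothetical busy windows over a 10-cycle interval; the units overlap in time.
busy = {
    "arith": set(range(0, 6)),       # busy in cycles 0-5
    "load_store": set(range(3, 9)),  # busy in cycles 3-8
    "texture": set(range(4, 7)),     # busy in cycles 4-6
}

print("core active:", core_active_cycles(busy))   # prints: core active: 9
print("sum of units:", summed_unit_cycles(busy))  # prints: sum of units: 15
```

Because the units overlap, the summed figure (15) exceeds the union (9), which is why a larger sum of unit counters does not imply a larger core cycle total.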
