mali_offline_compiler question

1. I found a strange problem. I tested the following two kernels, and the first kernel (shown in pic 1) runs faster than the second kernel (shown in pic 2). The test platform is a Mali-T864 with GlobalWorkSize = 10,000,000 (10M). The first kernel takes 15 ms and the second takes 20 ms.

                                      pic 1

                                     pic 2

2. I used mali_offline_compiler to profile them, and the two kernels give the same results, as shown in pic 3. How are the Instructions Emitted and Path Cycles values obtained? Why is Instructions Emitted twice the Longest Path Cycles? Also, in my opinion there should be three L/S operations, so why does the report show four?
