shader core cycles estimation

Hello Forum,

I have a simple fragment shader running on an Immortalis-G715. It does nothing except an equal number of integer (CVT) and floating-point (FMA) operations, and the two sets are independent of each other. The reported shader core cycles are almost equal to the CVT + FMA instruction count, but according to the documentation the cycle count should be max(CVT, FMA), since the two can run in parallel in this case. So my question is: does the shader core cycles counter simply report CVT + FMA instructions, when in reality the cost should be max(CVT, FMA)?

Regards,

Venkatesh.

  • Thanks for the response, Peter. Here is a simple shader with independent int and fp operations, along with the Streamline data collected on a G715.

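    #version 300 es
    precision highp float;   // assumed; the original post did not show a precision qualifier

    in vec4 v_nFade;
    flat in ivec4 v_nFade1;  // integer inputs must be flat-qualified
    out vec4 color;          // output declaration, missing from the original snippet

    void main()
    {
        vec4 b = vec4(1.3, 2.1, 3.4, 1.03);
        ivec4 c = ivec4(3, 5, 7, 9);
        // 64 iterations of one integer shift and one float multiply;
        // the integer and float chains do not depend on each other.
        for (int i = 0; i < 64; i++) {
            c = v_nFade1 >> c;
            b = b * v_nFade;
        }
        color = vec4(c) + b;
    }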

    Streamline counter                                       Value
    Mali Core Cycles: Any active                             91748
    Mali Core Cycles: Execution core active                  89129
    Mali Core Instructions: CVT instructions                 50257
    Mali Core Instructions: FMA instructions                 49451
    Mali Core Varying Issues: 32-bit interpolation slots     12320
    Mali Core Varying Requests: Interpolation requests        6160
    Mali Core Warps: Fragment warps                            770

    The CVT and FMA instructions-per-invocation figures from Streamline match the shader program.

    The core cycle count reported by Streamline is close to CVT + FMA instructions, but as I understand it, it should be max(CVT, FMA), since the operations are independent and can execute in parallel. Could you clarify? Thanks.
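
To put numbers on the comparison, here is a minimal Python sketch that works out the two bounds from the counter values posted above. It assumes each pipe issues one counted instruction per cycle, which is a simplification for illustration and not documented G715 behaviour; the helper names `issue_bounds` and `overlap_fraction` are made up for this example.

```python
# Counter values copied from the Streamline capture in this post.
counters = {
    "any_active_cycles": 91748,
    "exec_core_active_cycles": 89129,
    "cvt_instructions": 50257,
    "fma_instructions": 49451,
    "fragment_warps": 770,
}


def issue_bounds(cvt, fma):
    """Return (serial, parallel) cycle estimates.

    ASSUMPTION: each pipe issues one counted instruction per cycle.
    serial   = pipes never overlap  -> cvt + fma
    parallel = pipes fully overlap  -> max(cvt, fma)
    """
    return cvt + fma, max(cvt, fma)


def overlap_fraction(observed, serial, parallel):
    """Where the observed cycles sit between the two bounds.

    0.0 means fully serial, 1.0 means perfect overlap.
    """
    return (serial - observed) / (serial - parallel)


cvt = counters["cvt_instructions"]
fma = counters["fma_instructions"]
warps = counters["fragment_warps"]

serial, parallel = issue_bounds(cvt, fma)
overlap = overlap_fraction(counters["exec_core_active_cycles"], serial, parallel)

# Per-warp instruction counts, for comparison with the 64-iteration loop.
cvt_per_warp = cvt / warps
fma_per_warp = fma / warps

print(f"serial bound   (CVT + FMA):     {serial}")
print(f"parallel bound (max(CVT, FMA)): {parallel}")
print(f"execution core active cycles:   {counters['exec_core_active_cycles']}")
print(f"overlap fraction:               {overlap:.2f}")
print(f"CVT per warp: {cvt_per_warp:.2f}, FMA per warp: {fma_per_warp:.2f}")
```

Under that assumption the measured execution core cycles fall between the two bounds and much closer to the CVT + FMA sum, which is the discrepancy being asked about.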

  • The pipes are not totally independent in G715. I don't believe the exact behaviour is publicly documented.

    I'd be curious what this looks like if you change the integer shift for an add (although the compiler might just pre-multiply because it's integer code - I need to check the disassembly to be sure).