Query on CVT perf counter on G720 and G725

Hello Forum,

I’ve noticed that the CVT pipeline counter is non-zero for FP16 operations on Mali G725, but stays at zero on Mali G720.

Example:

#version 300 es
precision mediump float;

in vec4 in1, in2;
out vec4 col;

// Single vec4 multiply at mediump precision.
void main() { col = in1 * in2; }

GPU     cvt    fma   sfu   narrow   warps
G720      0   8320     0     8320   16324
G725   8247   8249     0    16473   16448

The shader performs four FP16 multiplies per thread (one vec4 multiply). On G720, only the FMA unit is used, which makes sense: FP16 operations on Mali are typically executed as vec2 SIMD, so the vec4 multiply becomes two instructions per thread.

However, on G725, the CVT unit is also used, and its count matches the FMA count. Why does G725 require CVT instructions for this case?
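
To make the comparison concrete, here is a minimal Python sketch that recomputes the relevant ratios from the counter values posted above. The 2-wide SIMD width is the assumption stated in this post, not a documented figure, and since the counter units are not given here, the ratios are only meaningful as a relative comparison between the two GPUs. The names `counters`, `ratios`, and `ASSUMED_SIMD_WIDTH` are made up for illustration.

```python
import math

# Counter values copied from the table in this post.
counters = {
    "G720": {"cvt": 0, "fma": 8320, "sfu": 0, "narrow": 8320, "warps": 16324},
    "G725": {"cvt": 8247, "fma": 8249, "sfu": 0, "narrow": 16473, "warps": 16448},
}

# Assumption from the post: a vec4 FP16 multiply is issued as 2-wide SIMD operations.
FP16_OPS_PER_THREAD = 4
ASSUMED_SIMD_WIDTH = 2
expected_instructions = math.ceil(FP16_OPS_PER_THREAD / ASSUMED_SIMD_WIDTH)


def ratios(c):
    """Return the CVT-to-FMA and FMA-to-warp ratios for one GPU's counters."""
    return {
        "cvt_per_fma": c["cvt"] / c["fma"],
        "fma_per_warp": c["fma"] / c["warps"],
    }


print(f"expected FP16 instructions per thread: {expected_instructions}")
for gpu, c in counters.items():
    r = ratios(c)
    print(f"{gpu}: cvt/fma = {r['cvt_per_fma']:.3f}, fma/warp = {r['fma_per_warp']:.3f}")
```

With these numbers the FMA-to-warp ratio is close to 0.5 on both GPUs, while the CVT-to-FMA ratio is 0 on G720 and close to 1 on G725, which is the difference I am asking about.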

Thanks,

Venkatesh.

Parents
  • However, on G725, the CVT unit is also used, and its count matches the FMA count. Why does G725 require CVT instructions for this case?

    The internal microarchitecture isn't publicly documented, so I'm not sure I can give you a useful answer.

    That said, I can't reproduce this with the latest offline compiler, so it's possibly just a code-generation inefficiency in some early drivers. Which driver version is your device using?

    Kind regards, 
    Pete
