Hello Forum,
Example:
#version 300 es
precision mediump float;
in vec4 in1, in2;
out vec4 col;
void main() { col = in1 * in2; }
The shader performs 4 FP16 operations (one vec4 multiply). On G720, only the FMA unit is used, which makes sense: FP16 operations on Mali are typically executed as vec2 SIMD, so the vec4 multiply results in two instructions per thread.
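To make the vec2 packing explicit, here is a minimal source-level sketch of the same shader written as two vec2 multiplies, one per assumed FP16 SIMD instruction. The split into .xy and .zw halves is only an illustration of the arithmetic (4 lanes / 2 lanes per instruction = 2 instructions); the instructions the compiler actually emits are its own choice and may differ.

```glsl
#version 300 es
precision mediump float;
in vec4 in1, in2;
out vec4 col;

void main() {
    // First assumed vec2 FP16 instruction: lanes x and y.
    col.xy = in1.xy * in2.xy;
    // Second assumed vec2 FP16 instruction: lanes z and w.
    col.zw = in1.zw * in2.zw;
}
```

This computes the same result as col = in1 * in2, so it should produce the same FMA count if the packing assumption holds.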
However, on G725, the CVT unit is also used, and its count matches the FMA count. Why does G725 require CVT instructions for this case?
Thanks,
Venkatesh.
Venkatesh K R said: However, on G725, the CVT unit is also used, and its count matches the FMA count. Why does G725 require CVT instructions for this case?
The internal microarchitecture isn't publicly documented, so I'm not sure I can give you a useful answer.
That said, I can't reproduce this on the latest offline compiler, so it's possibly just a code gen inefficiency in some early drivers. What driver is your device using?
Kind regards, Pete