varying vs computation performance in fragment shader

Shawn Chang over 5 years ago

Which costs more: passing a value as a varying, or computing it in the fragment shader?

Take the expression A * (1 - factor) as an example. The factor is calculated in the vertex shader and passed to the fragment shader as a varying. There are two ways to get the same result:
1. A is a uniform of the vertex shader. A * (1 - factor) is calculated in the vertex shader and the result is passed to the fragment shader as a varying, which the fragment shader uses directly. In this case, the main overhead should be the varying interpolation.
2. A is a uniform of the fragment shader. A * (1 - factor) is calculated in the fragment shader, which then uses the result directly. In this case, the main cost should be the calculation in the fragment shader.
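
As a quick check that the two solutions really do produce the same value, here is a short derivation. It is only a sketch under stated assumptions: the symbols $w_i$ (the interpolation weights of a fragment with respect to the three triangle vertices) and $f_i$ (the per-vertex values of factor) are introduced here for illustration, the weights are assumed to sum to 1 (which holds for both linear and perspective-correct interpolation), and A is assumed constant across the primitive because it is a uniform.

```latex
% Solution 1 interpolates the per-vertex results r_i = A (1 - f_i):
\sum_{i=0}^{2} w_i \, A \,(1 - f_i)
  = A \sum_{i=0}^{2} w_i \;-\; A \sum_{i=0}^{2} w_i f_i
  = A \left( 1 - \sum_{i=0}^{2} w_i f_i \right)
% The right-hand side is solution 2: interpolate factor, then compute
% A (1 - factor) per fragment.
```

The equality is exact only in real arithmetic. In practice the two paths can differ by small rounding errors, for example if the varying and the fragment shader computation use different precisions. It also relies on the expression being linear in factor; a nonlinear expression would not in general give the same value when moved between the two stages.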

Which of these two solutions performs better? Also, where can I find varying interpolation throughput data for Arm's GPUs? For example, how many floats can be interpolated per cycle?

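// solution 1: compute in the vertex shader, interpolate the result
uniform float A;
varying float result;
varying float factor;
void vs()
{
    factor = ...;
    // 1.0 rather than 1: GLSL ES 1.00 has no implicit int-to-float conversion
    result = A * (1.0 - factor);
}

void fs()
{
    // uses the interpolated result directly in other computations
}

// solution 2: interpolate factor, compute in the fragment shader
varying float factor;
void vs()
{
    factor = ...;
}

uniform float A;
void fs()
{
    // ...
    float result = A * (1.0 - factor); // result is a local here, not a varying
    // ...
}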

Top replies

Peter Harris over 5 years ago +2 verified

Hi Shawn, for Mali most uniform loads are effectively "free" (they get promoted into registers), so what you have here is a fairly straight trade-off between bandwidth (number of varyings written) and...