
random number with mali 400 mp

Note: This was originally posted on 22nd November 2012 at http://forums.arm.com

I am having no success generating random noise on the Mali-400 MP GPU. The code I'm using is below; it works on GPUs from every other vendor I've tried.
On this specific GPU it renders only a few thin dotted lines across the screen, with wide gaps between them, instead of the expected random noise.

float rand(vec2 co)
{
    return fract(sin(dot(co.xy, vec2(12.9898, 78.233))) * 43758.5453);
}

void main()
{
    // texture2D returns a vec4, so take only the RGB channels
    vec3 color = texture2D(u_texture, v_texcoord).rgb;
    vec3 noise = vec3(rand(gl_FragCoord.st / 1000.0));
    gl_FragColor = vec4(color + noise, 1.0);
}


Is there something I need to take into account for this specific GPU hardware?

Thanks
  • Note: This was originally posted on 23rd November 2012 at http://forums.arm.com

    Hi sanchiski,

    Something to take into account regarding the Mali-400 GPU is that we implement floating point in the fragment processor as FP16, which is conformant with the Khronos specification. I would direct you to the forum post here, which contains a detailed discussion of the considerations of FP16 on Mali 400. Also note that on T6xx platforms, we use FP32 for fragment precision.

    I believe that in your case you are passing very large numbers to the sin and fract functions, which leaves too little precision for those operations. Consider restricting your values to a small range (say, no larger than ±3.14) so that as many bits as possible are available for precision.

    Please let me know if you have any more questions.

    Thanks,
    Chris
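
To make the FP16 point in the reply above concrete, here is a minimal Python sketch (not shader code, and not from the original posters) that re-runs the rand function from the first post with every intermediate value rounded to 16-bit floating point. It assumes the fragment processor behaves like IEEE 754 binary16 (10 mantissa bits) and uses Python's struct format "e" to do the rounding. The helper names to_fp16, rand_fp16 and rand_fp64 are made up for this illustration, and real hardware may round or evaluate sin differently.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fract(x):
    """GLSL-style fract: x - floor(x)."""
    return x - math.floor(x)


def rand_fp64(cx, cy):
    """The rand() from the first post, evaluated in double precision."""
    return fract(math.sin(cx * 12.9898 + cy * 78.233) * 43758.5453)


def rand_fp16(cx, cy):
    """The same rand(), with every intermediate result rounded to FP16.

    Assumption: the GPU rounds after each operation like binary16 does.
    """
    cx, cy = to_fp16(cx), to_fp16(cy)
    kx, ky = to_fp16(12.9898), to_fp16(78.233)
    scale = to_fp16(43758.5453)
    d = to_fp16(to_fp16(cx * kx) + to_fp16(cy * ky))  # dot product
    s = to_fp16(math.sin(d))
    return to_fp16(fract(to_fp16(s * scale)))


if __name__ == "__main__":
    # Compare the two versions at a few sample coordinates.
    for cx, cy in [(0.1, 0.2), (0.5, 0.3), (0.64, 0.36)]:
        print(cx, cy, rand_fp64(cx, cy), rand_fp16(cx, cy))
```

Under these assumptions the constant 43758.5453 itself becomes 43744.0, and any product of magnitude 1024 or more is a whole number in FP16, so fract returns 0 unless the sine lands very close to zero. That would be consistent with the symptom in the first post: mostly flat output with thin lines where the sine crosses zero.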
  • Note: This was originally posted on 23rd November 2012 at http://forums.arm.com

    > Also note that on T6xx platforms, we use FP32 for fragment precision.

    One clarification. We do support highp (fp32) _and_ mediump (fp16). For most fragment operations we recommend using mediump; fp16 is "precise enough" for most color-related operations, and uses less memory bandwidth. If you need fp32 for some operations that is still possible of course.

    HTH,
    Iso
  • Note: This was originally posted on 24th November 2012 at http://forums.arm.com

    I've tried limiting the values passed to fract and sin, but I didn't get the expected results.
    Is there an example of how to produce simple random grainy noise on this GPU model?
  • Note: This was originally posted on 27th November 2012 at http://forums.arm.com

    This is the new approach I tried to get a simple, compact noise function, but still no results. Any help with what I am doing wrong for this specific GPU model?


    varying vec2 v_texcoord;
    uniform sampler2D u_texture;

    float fractFp16(in float value)
    {
        float exponent = min(ceil(-log2(value + 1.4013e-045) - 1.0), 127.0);
        float scaled = value * pow(2.0, exponent);
        float result = fract(scaled * 65536.0) + ((exponent + 1.0) / 65536.0);
        return result;
    }
    vec3 noise(in float variation, in float intensity)
    {
        float value = sin(fractFp16(variation * 0.180654321));
        value = fractFp16(value);
        return vec3(value) / intensity;
    }
    void main()
    {
        vec3 color = texture2D(u_texture, v_texcoord).rgb; 
        vec3 noise_factor = noise(length(color), 3.0);
        color += noise_factor;
        gl_FragColor = vec4(color, 1.0);
    }
  • Note: This was originally posted on 1st December 2012 at http://forums.arm.com

    I was still unable to find a way to create a small, compact grain-noise function for this specific GPU model. As mentioned in your answer, the problem seems to be the FP16 limit combined with the sin and fract functions.
    How would I use FP32 on this specific GPU model? I've tried putting "precision highp float;" at the start of the shader, but it had no effect.

    I've also tried avoiding sin and fract altogether, but I run into the same problem with the mod function. I think it comes from multiplying the variable variation by 1000.0.
    With small numbers, because of the FP16 limit, I can't find any workable grain-noise function for the MALI-400-MP GPU.


    varying vec2 v_texcoord;
    uniform sampler2D u_texture;

    vec3 grainNoise(in vec3 color, in float intensity)
    {
        float variation = v_texcoord.x * v_texcoord.y * 1000.0;
        variation = mod(variation, 13.0) * mod(variation, 123.0);
        float grain = mod(variation, 0.01);
        vec3 result = color + color * clamp(0.1 + grain * (intensity * 100.0), 0.0, 1.0);
        return result;
    }
    void main()
    {
        vec3 color = texture2D(u_texture, v_texcoord).rgb;
        color = grainNoise(color, 1.0);
        gl_FragColor = vec4(color, 1.0);
    }


    thanks
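
The suspicion above about the multiplication by 1000.0 can be checked with a little arithmetic. The following Python sketch assumes the fragment processor stores values like IEEE 754 binary16 (10 mantissa bits) and ignores zero and subnormals. fp16_spacing is a hypothetical helper, introduced only for this illustration, that returns the gap between neighbouring representable values around a given magnitude.

```python
import math

FP16_MANTISSA_BITS = 10  # assumption: IEEE 754 binary16 layout


def fp16_spacing(x):
    """Gap between adjacent binary16 values around x (normal range only)."""
    # frexp gives abs(x) = m * 2**exponent with 0.5 <= m < 1,
    # so floor(log2(abs(x))) is exponent - 1.
    _, exponent = math.frexp(abs(x))
    return math.ldexp(1.0, exponent - 1 - FP16_MANTISSA_BITS)


if __name__ == "__main__":
    # Spacing at a few magnitudes that appear in the thread.
    for value in (1.0, 3.14, 13.0, 123.0, 1000.0, 43758.5453):
        print(value, fp16_spacing(value))
```

Under that assumption, values between 512 and 1024 can only change in steps of 0.5, which is far coarser than the 0.01 modulus used in grainNoise, so mod(variation, 0.01) has very little to work with once variation grows. Near 1.0 the step is about 0.001, which fits the earlier advice to keep intermediate values small.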
  • Note: This was originally posted on 4th December 2012 at http://forums.arm.com

    Hi Sanchiski,

    This is a tricky problem to solve in FP16 and we're looking into some options. A few notes however:

    In your first example you use gl_FragCoord as the parameter to your rand function. For a given value of gl_FragCoord, the result of rand is therefore deterministic and does not change across frames. In such cases it is probably wiser either to render the noise to a texture once or to precompute it on the CPU and upload it as a texture, then sample that texture in your fragment shader, saving cycles each frame.

    For the other methods, where texture coordinates are used, it might still be worth precomputing the noise: you know the range of the input, so a sufficiently high-resolution texture sampled in the correct way will yield the same results. We are looking into options for computing this on the fly, but because the result depends only on the parameter passed in, precomputing is probably the more appropriate method.

    Thanks,
    Chris
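
As a rough illustration of the precompute-on-the-CPU suggestion above, here is a minimal Python sketch that builds a single-channel block of grain values once, ready to be uploaded as a texture. The function name make_noise_texture, the 256x256 size and the seed are assumptions chosen for the example. The upload itself (for instance glTexImage2D with GL_LUMINANCE and GL_UNSIGNED_BYTE in OpenGL ES 2.0) and the shader-side lookup are not shown.

```python
import random


def make_noise_texture(width=256, height=256, seed=1234):
    """Return width * height random bytes, one grain value per texel.

    The fixed seed makes the result reproducible, so the texture only
    has to be generated and uploaded once rather than every frame.
    """
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(width * height))


if __name__ == "__main__":
    data = make_noise_texture()
    print(len(data), "bytes of noise, first texels:", list(data[:8]))
```

A power-of-two size is used because core OpenGL ES 2.0 only allows GL_REPEAT wrapping on power-of-two textures, which makes it possible to tile a small noise texture across the screen. The fragment shader then needs only one extra texture2D lookup, with no sin or fract applied to large values.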