Disable watchdog on Mali

Not sure if this is the right place to ask this question, but how do I disable the watchdog timer on my Note3, which has a Mali-T628? All I could find was DVFS stuff inside /sys/devices/platform/mali.0. Is there something in the user-mode driver I could modify and recompile?

I am trying to run some GLES 3.0 benchmarks and if I increase the number of iterations inside the pixel shader beyond a certain limit, I get a black framebuffer. I figured it must be related to the watchdog timer.

Parents
  • "there should be no need for a single GPU thread to run for 50 million cycles to get a good benchmark of the hardware capability. Having shorter running threads and more of them (more vertices and/or more pixels) would seem a more pragmatic change."

    Yes, and I totally agree with you. But my benchmark is not really a normal bench, it's a microbenchmark to figure out the texture filtering rate. I launch a pixel shader on a full-screen quad and every thread does texture fetches from L1 cache repeatedly. There is no concept of frames here, as I don't care about the image quality and am just rendering into an off-screen buffer.

    The problem I am facing is that if each thread (all threads must do the same amount of work) reads more than 2048 texels then I get black results but not the correct number of 4 bilinear filtered pixels/clock. I am getting close, around 3.6, but not 4: I can only reach 1.66 GTexels/s, whereas GFXBench reaches 1.9 GTexels/s on my device. I did what you suggested too: fewer reads per thread, but a lot of batches sent to the GPU. But then the driver seems to be optimizing all these exactly same drawcalls writing to the same framebuffer. So the only choice I have here is to try out the options you suggested, i.e. increasing the time on softstop (or hardstop) and the downstream driver. Thanks a lot for your help Peter!
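The figures quoted in this post can be cross-checked with a little arithmetic. The sketch below is only a consistency check on the numbers as posted (1.66 GTexels/s, roughly 3.6 texels/clock, a target of 4 texels/clock); the function names are made up for illustration, and the GPU clock is derived from the post rather than taken from any specification.

```python
def implied_clock_hz(texels_per_second: float, texels_per_clock: float) -> float:
    """Clock frequency implied by a measured throughput and per-clock rate."""
    return texels_per_second / texels_per_clock


def efficiency(measured_per_clock: float, peak_per_clock: float) -> float:
    """Fraction of the theoretical per-clock rate actually achieved."""
    return measured_per_clock / peak_per_clock


if __name__ == "__main__":
    measured_rate = 1.66e9      # texels/s, as reported in the post
    measured_per_clock = 3.6    # texels/clock, approximate figure from the post
    peak_per_clock = 4.0        # texels/clock the poster expects to reach

    clock = implied_clock_hz(measured_rate, measured_per_clock)
    print(f"implied GPU clock: {clock / 1e6:.1f} MHz")
    print(f"efficiency vs. peak: {efficiency(measured_per_clock, peak_per_clock):.0%}")
    # Throughput the GPU would reach at the implied clock if it hit the peak rate.
    print(f"peak at implied clock: {peak_per_clock * clock / 1e9:.2f} GTexels/s")
```

Since 3.6 is quoted as an approximate value, the derived clock is approximate as well, so the output is better read as a sanity check than as a measurement.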

Children
  • "The problem I am facing is that if each thread (all threads must do the same amount of work) reads more than 2048 texels then I get black results but not the correct number of 4 bilinear filtered pixels/clock"

    What is your shader doing? I suspect the black results are due to a precision problem in your shaders exceeding the maximum representable range of a variable. Are you able to share?

    Most graphics shaders are very short - even high end content like the GFXBench 3.0 Manhattan test typically only uses a handful of texture accesses - so if you have too many unique accesses I wonder if you are hitting some other limit unrelated to the main texturing unit.

    "But then the driver seems to be optimizing all these exactly same drawcalls writing to the same framebuffer."

    Multiple opaque drawcalls to the framebuffer won't work - we can kill the overdrawn pixels in hardware - see "Killing Pixels - A New Optimization for Shading on ARM Mali GPU". Try turning on blending, as this forces us to keep the overdrawn fragments (we need their color to blend against).

    Cheers,
    Pete

  • Hmm, my shader is fetching from a texture inside a loop and writing out the results once to the framebuffer. All the results are added inside a vec4 variable. I should try using highp instead of lowp/mediump qualifiers then.

  • On Mali mediump is fp16 precision - so the dynamic range is quite small, and if you start using a significant number of bits to represent non-fractional digits you rapidly run out of the fractional part. Try highp - it sounds like it might help.
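The fp16 point above can be illustrated off-device. The sketch below is not the poster's shader (which was never shared); it only emulates a half-precision accumulator in Python by rounding to IEEE 754 binary16 after every add, using the standard `struct` module's `e` format. The helper names and the sample value of 1.0 per fetch are assumptions chosen for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate_fp16(sample: float, count: int) -> float:
    """Add `sample` to a running total `count` times, rounding to fp16 each step."""
    total = 0.0
    for _ in range(count):
        total = to_fp16(total + sample)
    return total


if __name__ == "__main__":
    for fetches in (1024, 2048, 4096):
        # An exact accumulator would simply print the fetch count back.
        print(fetches, accumulate_fp16(1.0, fetches))
```

With a sample of 1.0 the fp16 total stops growing at 2048.0, because the spacing between representable fp16 values there is 2 and 2048 + 1 rounds back down to 2048. This shows how a mediump accumulator loses information in a long loop; it does not by itself explain a black result, and the poster reports below that switching to highp did not help.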

  • Okay, so I tried both suggestions:

    - Used highp for the output color and intermediate variable: the black output is still present once the texel fetches per thread exceed 2k

    - Enabled blending to prevent the pixel-killing optimization

    And still no luck :[

    Anyway, it's a good thing you guys report the texel fill rate, which is 1 bilinear/clock/unit and 1/2 trilinear/clock/unit. FP16 is also full-rate, which I measured.