
Mali400, lose 15ms if using a FBO

Note: This was originally posted on 29th July 2013 at http://forums.arm.com

Hello,

I am developing a game on a Galaxy S3 with a Mali-400 GPU.

I noticed that as soon as I render my scene to an FBO, I lose 10 to 15 ms of frame time compared to rendering directly to the screen.
I need an FBO for post-processing: I want to use the content of the FBO as a source texture for a shader.

Here is what I do:
=> I use an RGB565 FBO with a color attachment (I tried with and without a depth attachment and it changes nothing; depth precision is 16 bits)
=> each frame, I bind the FBO, then immediately clear it (color and depth)
=> then I render my 3D scene into this FBO
=> after that I unbind the FBO, then draw a quad on screen with a very simple shader whose only task is to display the color attachment
=> then I draw my game's UI
=> finally I call eglSwapBuffers

So comparing FBO against no FBO, there is a 10 to 15 ms performance hit, and most of the time this hit shows up in the duration of eglSwapBuffers.
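For readers who want to reproduce this kind of measurement, here is a minimal sketch of timing a single call with a monotonic clock. The helper names `elapsed_ms` and `time_call_ms` are hypothetical and not from the original post. This only measures CPU-side wall time: on a GPU that queues work, the time spent inside a call such as eglSwapBuffers may include waiting for earlier GPU work, so the number should be read with caution.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC samples. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    double sec  = (double)(end.tv_sec - start.tv_sec);
    double nsec = (double)(end.tv_nsec - start.tv_nsec);
    return sec * 1000.0 + nsec / 1.0e6;
}

/* Hypothetical helper: run fn(arg) once and return the CPU wall time in ms.
   In a real app, fn would wrap the call being measured, for example
   eglSwapBuffers(display, surface). */
double time_call_ms(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}
```

Averaging the result over many frames, with and without the FBO path, gives a more stable comparison than a single sample.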

Is that normal? Any idea how I could improve that?

PS: What is even stranger is that if I use more complex code in the post-processing shader, the performance hit barely increases.

Thanks
  • Note: This was originally posted on 29th July 2013 at http://forums.arm.com

    Can you share an APK or at least a cut down code example which reproduces the problem?
  • Note: This was originally posted on 1st August 2013 at http://forums.arm.com


    Can you share an APK or at least a cut down code example which reproduces the problem?


    Hi,

    I cannot share the APK or any sample code, since this is a very complex, unreleased game.

    I can give you some numbers; maybe they can help:

    => Mali 400MP on a Galaxy S3

    => complex vertex shaders (hardware skinning or per-vertex computations for normal maps) and pixel shaders (most of them sample 3 to 4 textures)

    => 180 to 240 draw calls each frame

    => 180k to 240k triangles per frame

    => up to 6 FBOs are used each frame:
      => one 256x256 for a shadow map
      => two 256x256 for blurring the shadow map
      => one 512x512 for a first reflection effect
      => another 512x512 for another reflection effect
      => one 1024x2048 for post-processing

    => the engine groups objects to minimize state changes and caches OpenGL state changes to minimize bandwidth

    => all FBOs and the screen use RGB565 color with 16-bit depth

    => VBOs for vertices / indices are used for everything; most of them are static, but some are updated every frame (particle systems, UI, etc.)

    => this engine works very well on other GPUs

    => the engine uses 2 extra threads for sound and physics (and the physics thread may be "very active" sometimes)
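To give a sense of scale for the render targets listed above, the following sketch adds up their color storage at 16 bits per pixel. It assumes tightly packed planes with no row padding or driver alignment, which a real driver may not honor; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one tightly packed plane (color or depth) of a render target.
   Assumption: no row padding and no driver-specific alignment. */
size_t plane_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}

/* Color storage for the six render targets listed in the post,
   all at 16 bits per pixel (RGB565). */
size_t listed_fbo_color_bytes(void)
{
    const size_t bpp = 16;
    size_t total = 0;
    total += 3 * plane_bytes(256, 256, bpp);   /* shadow map + two blur targets */
    total += 2 * plane_bytes(512, 512, bpp);   /* two reflection targets */
    total += plane_bytes(1024, 2048, bpp);     /* post-process target */
    return total;
}
```

Under these assumptions the 1024x2048 target alone accounts for 4 MiB of a total of about 5.4 MiB of color data, before counting any depth attachments.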

    In some extreme tests I replaced 90% of the pixel shaders with "only draw a color" shaders and removed all FBOs except the post-process one (for which I used a simple "only copy the pixel" shader), and the problem persisted: I still lose 10 to 15 ms by using an FBO. Conversely, when the app is already very slow because there is too much on screen, the loss from using an FBO is much smaller, sometimes zero.

    Anyway ....

    I found a solution yesterday ... that seems very strange to me, but it seems to work ...

    I read in the Mali docs that up to 4 previous frames (or something like that) may still be "in process" while my app is sending draw commands for the current frame (by the way, I think this point is not described well enough in the docs I found on the website).

    So now here is my solution:
    => I just double-buffered the post-effect FBO
    => so I have two 1024x480 FBOs, A and B
    => at frame N I render into A, then blit it to the screen; at frame N+1 I render into B, then blit it to the screen
    => and bam ... instead of losing 10 to 15 ms, I now lose only around 5 to 10 ms
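The alternation between A and B described above can be sketched as follows. This is a minimal illustration, not the poster's engine code: `PostTargets` and the `post_*` functions are hypothetical names, and the GL/EGL calls appear only in a comment because they need a live context.

```c
#include <assert.h>

#define POST_FBO_COUNT 2

/* Hypothetical container for the two post-effect targets (A and B).
   The values would be object names from glGenFramebuffers / glGenTextures. */
typedef struct {
    unsigned int fbo[POST_FBO_COUNT];
    unsigned int color_tex[POST_FBO_COUNT];
} PostTargets;

/* Index of the target used for a given frame number: 0, 1, 0, 1, ... */
unsigned int post_target_index(unsigned long frame)
{
    return (unsigned int)(frame % POST_FBO_COUNT);
}

/* Framebuffer object to render the scene into on this frame. */
unsigned int post_fbo_for_frame(const PostTargets *t, unsigned long frame)
{
    assert(t != 0);
    return t->fbo[post_target_index(frame)];
}

/* Color texture to sample when drawing the full-screen quad on this frame. */
unsigned int post_texture_for_frame(const PostTargets *t, unsigned long frame)
{
    assert(t != 0);
    return t->color_tex[post_target_index(frame)];
}

/* Per-frame use, assuming a current OpenGL ES 2.0 context:
 *
 *   glBindFramebuffer(GL_FRAMEBUFFER, post_fbo_for_frame(&targets, n));
 *   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
 *   ... draw the 3D scene ...
 *   glBindFramebuffer(GL_FRAMEBUFFER, 0);
 *   glBindTexture(GL_TEXTURE_2D, post_texture_for_frame(&targets, n));
 *   ... draw the full-screen quad, then the UI ...
 *   eglSwapBuffers(display, surface);
 */
```

With this scheme, the target written on frame N is not touched again until frame N+2, so the application never renders into the same FBO on two consecutive frames.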

    If you have any idea why this works, I am interested!

    Thanks