Mali-400: losing 15 ms when rendering to an FBO

Note: This was originally posted on 29th July 2013 at http://forums.arm.com

Hello,

I am developing a game on a Galaxy S3, which has a Mali-400 GPU.

I noticed that as soon as I start rendering my scene to an FBO, frame time goes up by 10 to 15 ms compared with rendering directly to the screen.
I need an FBO for post-processing (I want to use the FBO's contents as a source texture for a shader).

Here is what I do:
=> I use an RGB565 FBO with a color attachment (I tried it both with and without a depth attachment; it makes no difference. Depth precision is 16 bits)
=> each frame, I bind the FBO, then immediately clear it (color and depth)
=> then I render my 3D scene into the FBO
=> after that I unbind the FBO and draw a quad on screen with an ultra-simple shader whose only task is to display the color attachment, nothing else
=> then I draw my game's UI
=> finally I call eglSwapBuffers
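For readers unfamiliar with the format: an RGB565 color attachment stores each pixel in 16 bits, with 5 bits of red, 6 of green and 5 of blue. The Python sketch below is an illustration only, not code from this thread. It packs an 8-bit-per-channel color into that layout (red in the high bits, as `GL_UNSIGNED_SHORT_5_6_5` does) and expands it back by replicating the high bits, which approximates the normal conversion; the round trip shows how much precision the post-process source texture keeps.

```python
def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into one 16-bit RGB565 value (red in the high bits)."""
    for c in (r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("each channel must be in 0..255")
    # Keep the top 5 bits of red, 6 of green and 5 of blue.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(p: int) -> tuple:
    """Expand RGB565 back to 8 bits per channel by replicating the high bits."""
    r5 = (p >> 11) & 0x1F
    g6 = (p >> 5) & 0x3F
    b5 = p & 0x1F
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


print(hex(pack_rgb565(255, 0, 0)))               # 0xf800
print(unpack_rgb565(pack_rgb565(200, 100, 50)))  # (206, 101, 49)
```

The low bits dropped by packing are lost for good, so any post-processing shader reading this FBO sees at most 32 or 64 levels per channel.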

So, comparing FBO vs. no FBO, there is a 10 to 15 ms performance hit, and most of the time this hit shows up in the duration of the eglSwapBuffers call.

Is that normal? Any idea how I could improve it?

PS: What is even stranger is that if I use more complex code in the post-processing shader, the performance hit barely increases...

Thanks

Peter Harris over 11 years ago

Note: This was originally posted on 29th July 2013 at http://forums.arm.com

Can you share an APK or at least a cut down code example which reproduces the problem?

Victor Bernot over 11 years ago

Note: This was originally posted on 1st August 2013 at http://forums.arm.com

Hi,

I cannot share the APK, nor any sample code, since this is a very complex, unreleased game...

I can give you some numbers; maybe that will help:

=> Mali 400MP on a Galaxy S3

=> complex vertex shaders (hardware skinning or per-vertex computations for normal maps) and pixel shaders (most of them sample 3 to 4 textures)

=> 180 to 240 draw calls each frame

=> 180k to 240k triangles per frame

=> up to 6 FBOs are used each frame:
=> one 256x256 for a shadow map
=> two 256x256 for blurring the shadow map
=> one 512x512 for a first reflection effect
=> another 512x512 for another reflection effect
=> one 1024x2048 for post-processing

=> the engine groups objects to minimize state changes and caches OpenGL state changes to minimize bandwidth

=> all FBOs and the screen use RGB565 color and 16-bit depth

=> VBOs are used for all vertices and indices; most are static, but some are updated every frame (particle systems, UI, etc.)

=> this engine works very well on other GPUs

=> the engine uses 2 extra threads for sound and physics (and the physics thread can sometimes be very busy)
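For a rough sense of scale, the FBO list above can be turned into a memory estimate. This Python sketch uses the sizes exactly as listed, 2 bytes per pixel for RGB565 color and, as an assumption, a 16-bit depth attachment on every FBO (the post does not say which FBOs have depth). It ignores any padding or extra storage the driver may allocate, so the result is only an estimate.

```python
# FBO sizes as listed above; the labels are illustrative.
FBOS = [
    ("shadow map", 256, 256),
    ("shadow blur 1", 256, 256),
    ("shadow blur 2", 256, 256),
    ("reflection 1", 512, 512),
    ("reflection 2", 512, 512),
    ("post process", 1024, 2048),
]

COLOR_BYTES = 2  # RGB565: 16 bits per pixel
DEPTH_BYTES = 2  # 16-bit depth; ASSUMED present on every FBO


def fbo_bytes(width: int, height: int, with_depth: bool = True) -> int:
    """Bytes for one FBO's color (and optionally depth) storage."""
    per_pixel = COLOR_BYTES + (DEPTH_BYTES if with_depth else 0)
    return width * height * per_pixel


total = sum(fbo_bytes(w, h) for _, w, h in FBOS)
for name, w, h in FBOS:
    print(f"{name:14s} {w}x{h}: {fbo_bytes(w, h) / 2**20:.2f} MiB")
print(f"total: {total / 2**20:.2f} MiB")  # total: 10.75 MiB
```

Under these assumptions the 1024x2048 post-process target alone accounts for 8 of the roughly 10.75 MiB.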

In some extreme tests I replaced 90% of the pixel shaders with shaders that only output a constant color, and removed all FBOs except the post-process one (for which I used a simple copy-the-pixel shader). The problem persisted: I still lose 10 to 15 ms by using an FBO. Conversely, when the app is very slow because there is too much on screen, the loss from using an FBO is much smaller, sometimes zero...

Anyway...

I found a solution yesterday... it seems very strange to me, but it appears to work...

I read in the Mali docs that up to 4 previous frames (or something like that) may still be in process while my app is sending draw commands for the current frame (by the way, I think this point is not described well enough in the docs I found on the website...)
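The reasoning about frames still in flight can be made concrete with a toy model. This is an assumption for illustration, not a description of how the Mali driver actually schedules work: if up to `frames_in_flight` earlier frames may still be queued, and the app rotates round-robin among `num_copies` FBOs, then the FBO a frame is about to render into is still referenced by a queued frame whenever `num_copies <= frames_in_flight`.

```python
def target_may_be_busy(frame: int, frames_in_flight: int, num_copies: int) -> bool:
    """Toy model: True if an earlier frame that may still be in flight
    rendered into the same FBO that `frame` is about to render into.

    FBOs are assumed to be used round-robin: frame N uses copy N % num_copies.
    """
    target = frame % num_copies
    # Only the previous `frames_in_flight` frames can still be queued.
    for back in range(1, min(frames_in_flight, frame) + 1):
        if (frame - back) % num_copies == target:
            return True
    return False


for copies in (1, 2, 3, 4, 5):
    print(copies, target_may_be_busy(10, frames_in_flight=4, num_copies=copies))
```

The model only shows when a frame depends on an FBO that an unfinished frame still uses. It says nothing about what the driver does in that case (for example, whether it waits), and the depth of 4 is only the figure recalled in the post.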

So here is my solution:
=> I simply double-buffered the post-effect FBO
=> so I have two 1024x480 FBOs, A and B
=> at frame N I render into A and then blit it to the screen; at frame N+1 I render into B and then blit it to the screen
=> and bam... instead of losing 10 to 15 ms, I now lose only around 5 to 10 ms
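The double-buffering workaround boils down to choosing the render target by frame parity. This Python sketch describes one frame of the loop from the first post as a list of steps, with the FBO picked as `fbos[frame % 2]` when two are given. The step names are hypothetical labels, not GL or EGL calls; in real code they would map to calls such as `glBindFramebuffer`, `glClear`, the draw calls and `eglSwapBuffers`.

```python
def frame_plan(frame: int, fbos: tuple) -> list:
    """Steps for one frame, rendering the scene into fbos[frame % len(fbos)].

    Step names are hypothetical labels; 0 stands for the default framebuffer.
    """
    fbo = fbos[frame % len(fbos)]  # A on even frames, B on odd frames
    return [
        ("bind", fbo),
        ("clear_color_depth", fbo),
        ("draw_scene", fbo),
        ("bind", 0),                    # back to the on-screen framebuffer
        ("draw_fullscreen_quad", fbo),  # samples this frame's FBO as a texture
        ("draw_ui", 0),
        ("swap_buffers", 0),
    ]


for n in range(3):
    print(n, frame_plan(n, ("A", "B"))[0])
```

Passing a one-element tuple gives the original single-FBO loop, so the same function describes both setups side by side.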

If you have any idea why this works, I am interested!

Thanks