Mali400, lose 15ms if using a FBO
Victor Bernot
over 11 years ago
Note: This was originally posted on 29th July 2013 at http://forums.arm.com
Hello,
I am developing a game on a Galaxy S3 with a Mali-400 GPU.
I noticed that as soon as I render my scene to an FBO, I lose 10 to 15 ms of frame time compared to rendering directly to the screen.
I need an FBO for post-processing, because I want to use the content of the FBO as a source texture for a shader.
Here is what I do:
=> I use a 565 FBO with a color attachment (I tried with and without a depth attachment and it changes nothing; depth precision is 16 bits)
=> every frame, I bind the FBO, then immediately clear it (color and depth)
=> then I render my 3D scene into this FBO
=> after that I unbind the FBO, then draw a quad on screen with an ultra-simple shader whose only task is to display the color attachment
=> then I draw my game's UI
=> finally I call eglSwapBuffers
So, comparing FBO against no FBO, there is a 10 to 15 ms performance hit, and most of the time this hit shows up in the duration of eglSwapBuffers.
Is that normal? Any idea how I could improve it?
PS: What is even stranger is that if I use more complex post-processing code in the post-processing shader, the performance hit does not grow much.
Thanks
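For readers who want to reproduce this kind of measurement, here is a minimal C sketch of timing a single call on the CPU side, such as a wrapper around eglSwapBuffers. The names `TimedFn`, `elapsed_ms` and `time_call_ms` are hypothetical and are not part of the poster's engine; the sketch assumes a POSIX system that provides `clock_gettime`. Because the GPU runs asynchronously, the time seen in such a call can include waiting for earlier GPU work rather than the cost of the call itself.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: wraps the call to be timed,
   for example a function that calls eglSwapBuffers(). */
typedef void (*TimedFn)(void *user_data);

/* Difference between two timestamps, in milliseconds. */
static double elapsed_ms(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Runs fn(user_data) once and returns the wall-clock time it took,
   measured with a monotonic clock so clock adjustments do not matter. */
static double time_call_ms(TimedFn fn, void *user_data) {
    struct timespec start, end;
    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(user_data);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ms(start, end);
}
```

Logging this value per frame for both the FBO and the direct-to-screen path gives the kind of comparison described in the post.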
Victor Bernot
over 11 years ago
Note: This was originally posted on 1st August 2013 at http://forums.arm.com
Can you share an APK or at least a cut down code example which reproduces the problem?
Hi,
I cannot share the APK or sample code, since this is a very complex and unreleased game.
I can give you some numbers; maybe they will help:
=> Mali 400MP on a Galaxy S3
=> complex vertex (hw skinning or per vertex computations for normal maps) and pixel shaders (most of them sample 3 to 4 textures)
=> 180 to 240 draw calls each frame
=> 180k to 240k triangles per frame
=> up to 6 FBOs are used each frame:
   => one 256x256 for a shadow map
   => two 256x256 for blurring the shadow map
   => one 512x512 for a first reflection effect
   => another 512x512 for another reflection effect
   => a 1024x2048 for post process
=> the engine takes care of grouping the objects to minimize state changes and caches the OpenGL state to minimize bandwidth
=> all FBOs and the screen are in 565, depth 16
=> VBOs for vertices / indices are used for everything; most of them are static, but some are updated every frame (particle systems, UI, etc.)
=> this engine works very well on other GPUs
=> engine uses 2 extra threads for sound and physics (and the physics thread may be "very active" sometimes ...)
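As a rough aid to reading the list above, the following C sketch adds up the color-attachment storage for the six render targets at 16 bits per pixel (565). The type and function names are hypothetical, and the sizes are taken as listed in the post. The total covers only color data; it says nothing about how much of it the GPU actually reads or writes in a frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one render target. */
typedef struct {
    uint32_t width;
    uint32_t height;
} TargetSize;

/* A 565 pixel occupies 16 bits, i.e. 2 bytes. */
enum { BYTES_PER_PIXEL_565 = 2 };

/* Color storage for a single 565 target, in bytes. */
static uint64_t color_bytes_565(TargetSize s) {
    return (uint64_t)s.width * s.height * BYTES_PER_PIXEL_565;
}

/* Sum of color storage over a list of 565 targets, in bytes. */
static uint64_t total_color_bytes_565(const TargetSize *targets, size_t count) {
    uint64_t total = 0;
    assert(targets != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        total += color_bytes_565(targets[i]);
    }
    return total;
}

/* Sizes as listed in the post: shadow map, two blur targets,
   two reflection targets, one post-process target. */
static const TargetSize kListedTargets[] = {
    {256, 256}, {256, 256}, {256, 256},
    {512, 512}, {512, 512},
    {1024, 2048},
};
/* total_color_bytes_565(kListedTargets, 6) yields 5636096 bytes (5.375 MiB). */
```

With these numbers the post-process target alone accounts for 4 MiB of the total, which is why it is the natural one to look at first.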
In some extreme tests I replaced 90% of the pixel shaders with "only draw a color" shaders and removed all FBOs except the post-process one (for which I used a simple "only copy the pixel" shader), and the problem persisted: I still lose 10 to 15 ms by using an FBO. Note that, conversely, when the app is already very slow because there are too many things on screen, the loss from using an FBO is much smaller, sometimes zero.
Anyway...
I found a solution yesterday. It seems very strange to me, but it seems to work.
I read in the Mali docs that up to 4 previous frames (or something like that) may still be "in process" while my app is sending draw commands for the current frame (by the way, I think this point is not described well enough in the docs I found on the website).
So here is my solution:
=> I just double-buffered the post-effect FBO
=> so I have two 1024x480 FBOs, A and B
=> at frame N I render into A, then blit it to the screen; at frame N+1 I render into B, then blit it to the screen
=> and bam: instead of losing 10 to 15 ms, I now lose only around 5 to 10 ms
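The alternation between A and B described above can be sketched in C as follows. This is a minimal illustration, not the poster's engine code: `PingPongTargets` and the helper functions are hypothetical names, and the FBO and texture handles are plain integers standing in for IDs that would normally come from glGenFramebuffers and glGenTextures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two post-process targets:
   slot 0 is "A" and slot 1 is "B" in the post. */
typedef struct {
    uint32_t fbo[2];        /* framebuffer object IDs */
    uint32_t color_tex[2];  /* matching color attachment texture IDs */
} PingPongTargets;

/* Even frames use slot 0 (A), odd frames use slot 1 (B). */
static unsigned pingpong_slot(uint64_t frame_index) {
    return (unsigned)(frame_index & 1u);
}

/* FBO to render the scene into for this frame. */
static uint32_t pingpong_fbo(const PingPongTargets *t, uint64_t frame_index) {
    assert(t != NULL);
    return t->fbo[pingpong_slot(frame_index)];
}

/* Texture to sample when drawing the full-screen quad for this frame. */
static uint32_t pingpong_texture(const PingPongTargets *t, uint64_t frame_index) {
    assert(t != NULL);
    return t->color_tex[pingpong_slot(frame_index)];
}
```

In a frame loop, the value from `pingpong_fbo` would be passed to glBindFramebuffer(GL_FRAMEBUFFER, ...) before rendering the scene, and the value from `pingpong_texture` would be bound as the source texture for the quad once the default framebuffer (0) is bound again. With this scheme, frame N+1 never renders into the target that frame N's quad samples from.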
If you have any idea why this works, I am interested!
Thanks