God Ray effect (light scattering) with Mali GPU

Note: This was originally posted on 10th May 2012 at http://forums.arm.com

Hello everyone,

I'm implementing the God Ray effect for an Android game on Mali-400-based devices (here, the Samsung I9100 and Samsung I9300).
I've followed this article: http://fabiensanglard.net/lightScattering/index.php
However, the effect does not look right (as you can see in the attached file).
It works on Win32 and on Adreno- and PowerVR-based devices.
I think there's a problem with the texture coordinates passed from the vertex shader to the fragment shader; the interpolation may be causing this issue.

I hope you can give us some ideas on this.

Thank you.




    Note: This was originally posted on 14th May 2012 at http://forums.arm.com

    Hi,

    Great, glad the modification works OK; thank you for trying it.

    For the performance issue, let's examine what's feasible on a typical embedded GPU.

    Taking the same example device as above:

    Mali-400 MP4 @ 266MHz
    800x480 WVGA screen @ 30FPS

    This means we have a maximum of 4 * 266*1.0e6 = 1.064e9 fragment cycles available per second (four fragment processors, each providing one fragment cycle per clock). That is a theoretical maximum; perhaps 85% efficiency is more realistic, giving 9.044e8 cycles per second. At a target of 30FPS, dividing by 30 leaves ~30M cycles per frame.

    Those 30M cycles have to cover both the main screen and the "light scattering" FBO. Both contain fairly standard-looking geometry, but any overdraw (where a fragment is shaded more than once) must be taken into account. Allowing a factor of 2x overdraw (the same for the FBO and the main surface, since they contain the same geometry) leaves perhaps 15M cycles.

    If the main scene uses a 3-cycle fragment shader, that's 800*480*3 ~= 1M cycles, which leaves 14M cycles for the FBO. If the FBO is scaled to 1/4 of the main surface's area (400x240), that gives 14M / (400*240) ~= 145 cycles per fragment. The example shader above takes 2 cycles per sample, so approximately 72 samples might fit within budget.

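    Things get harder with a larger display, such as 1280x800. There the main scene alone costs 1280*800*3 ~= 3M cycles, leaving ~12M for the FBO; even with a 1/4-area FBO (640x400) that is only ~47 cycles per fragment, so I think the maximum would be nearer 23 samples to still have a chance of hitting 30FPS.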

    Have you experimented with different FBO sizes and a reduced number of samples? Does the performance increase? Is the result visually acceptable? Don't forget to enable bilinear filtering (GL_TEXTURE_MAG_FILTER = GL_LINEAR) when sampling the "light scatter" FBO if it's been scaled down!

    If the performance or visual quality isn't high enough, another way of achieving a similar effect may need to be considered.

    HTH, Pete