I’m lucky enough to be back in soggy San Francisco for my third GDC. The week started off a little grey and drizzly and has continued that way, but with so many incredible demos in the expo hall, not to mention back-to-back sessions covering everything from music in games to localisation, who needs the sun anyway?
First up for me this morning is a talk on Post Processing Effects for Mobile from Arm’s very own Stephen Barton and Attilio Provenzano, as well as Srdja Stetic-Kozic from Serbian game studio Nordeus.
Attilio was first on stage to introduce us to Arm’s extensive history in enabling mobile gaming and the work we’ve done in collaboration with Nordeus to optimise lighting and bloom effects in their fantastic new fantasy game, Spellsouls: Duel of Legends.
Attilio then introduced us to Srdja, the software engineer responsible for many of the complex graphical effects used in the game. Srdja explained that the two key technical goals in developing this game were to achieve AAA quality in order to provide the best possible user experience, and to be able to support the content across a huge range of devices from a Galaxy S3 to the latest top of the line smartphone. If you’ve read my recent launch blogs, you’ll know this is something of a key focus for Arm in our latest generation Mali Multimedia Suite products.
PBR shading with bloom lighting was deemed the right approach to achieve realistic visual effects for specialised textures, and lighting of these textures was the first aspect to address. In a fantasy game, spells are key to delivering the desired experience, and these achieve their magical quality through the use of lighting. Specular point lights were initially shaded per pixel, with up to four lights supported at once. However, more lights were needed to achieve a truly realistic feel, so a lot of work was put into Forward+, which divides the screen into tiles and, for the pixels in each tile, shades only the lights that affect that tile. This allowed double the number of lights to be supported. Unfortunately, some of the less powerful devices struggle to maintain even 30FPS with lighting enabled, so a mainstream-specific solution was needed. The game also features a large number of metallic elements, which are very difficult to portray convincingly without bloom, because it is the bloom highlights that let the eye instantly identify a material as metallic.
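To make the Forward+ idea a little more concrete, here is a minimal CPU-side sketch of the tile culling step in Python. It is not Nordeus’ implementation, which wasn’t shown in detail: the `Light` class, the 16-pixel tile size and the function names are all assumptions for illustration, and a real renderer would do this on the GPU with lights projected into screen space.

```python
from dataclasses import dataclass


@dataclass
class Light:
    # Hypothetical screen-space light: centre and radius of influence in pixels.
    x: float
    y: float
    radius: float


def cull_lights(lights, width, height, tile_size=16):
    """Return a 2D grid of tiles, each holding the indices of lights touching it."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    tile_lights = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]

    for index, light in enumerate(lights):
        # Only visit tiles overlapped by the light's bounding box.
        min_tx = max(int((light.x - light.radius) // tile_size), 0)
        max_tx = min(int((light.x + light.radius) // tile_size), tiles_x - 1)
        min_ty = max(int((light.y - light.radius) // tile_size), 0)
        max_ty = min(int((light.y + light.radius) // tile_size), tiles_y - 1)

        for ty in range(min_ty, max_ty + 1):
            for tx in range(min_tx, max_tx + 1):
                # Closest point of the tile rectangle to the light centre.
                nearest_x = min(max(light.x, tx * tile_size), (tx + 1) * tile_size)
                nearest_y = min(max(light.y, ty * tile_size), (ty + 1) * tile_size)
                dx = light.x - nearest_x
                dy = light.y - nearest_y
                if dx * dx + dy * dy <= light.radius * light.radius:
                    tile_lights[ty][tx].append(index)
    return tile_lights


def lights_for_pixel(tile_lights, px, py, tile_size=16):
    """Look up the short light list a pixel needs to shade."""
    return tile_lights[py // tile_size][px // tile_size]


lights = [Light(8, 8, 4), Light(100, 100, 30)]
grid = cull_lights(lights, width=128, height=128)
print(lights_for_pixel(grid, 5, 5))      # pixel near the first light
print(lights_for_pixel(grid, 100, 100))  # pixel under the second light
```

The point of the sketch is that each pixel loops over the handful of lights in its own tile rather than every light in the scene, which is why the total light count can grow without the per-pixel cost growing with it.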
The first thing Nordeus tried was a standard four-pass post-process bloom. The problem with this was that it applies the bloom to the entire screen, rather than to the specific materials, compromising visual quality. The next approach was to try HDR, but doing this with tone mapping in post-processing was also found to compromise performance. From there, multiple render target bloom was the next approach. RGB24 produced great quality visuals, but effectively doubled the bandwidth used. By dropping the quality ever so slightly to an 8-bit texture they were able to achieve excellent artistic results without suffering the performance hit. However, on mainstream devices this post-processing still compromised FPS to a level the developers deemed unacceptable.
In order to address this issue, Arm focussed on the target of rendering the frame in 16ms, across as many devices as possible, to achieve 60FPS. The multiple render target approach took 3ms, so the first task was to reduce this time. To do this the team considered either optimising the post-processing pipeline or simulating the same bloom effect in other ways, without overheating or excessively stressing the device.
The optimisations started with the blur, which is the most expensive part of the bloom pipeline as the shader needs to sample multiple pixels. By using dual filtering, enormous efficiency gains can be achieved versus a Gaussian blur at full resolution. Compared to a downscaled Gaussian blur, dual filtering has similar performance but a better-looking effect. Another approach was to try a texture-based bloom, generating a bloom intensity map based on the PBR glossiness map and saving it in the alpha component of the glossiness map itself. This comes at minimal cost because the shader has already fetched the RGB components of the glossiness map. With the texture-based approach, performance was almost the same as with bloom excluded altogether. Whilst not quite as visually stunning as the much more expensive options, the quality achieved at almost no cost was impressive. The only difficulty with this approach was in applying it to characters, where it produced a less than ideal effect, which meant we needed to use a plane overlay. This applies a kind of billboard to each character, overlaying a mocked-up version of bloom that achieves almost the same visuals in under 1ms instead of 3ms.
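As a rough illustration of the texture-based bloom, the Python sketch below bakes a bloom intensity into the alpha channel of a glossiness texel offline, then uses it at shading time to brighten the lit colour. The talk didn’t spell out how the intensity map is derived or how it is composited, so the threshold, the `strength` parameter and both function names are my own assumptions rather than the shader used in Spellsouls.

```python
def bake_bloom_alpha(gloss_rgb, threshold=0.5):
    """Offline step: derive a bloom intensity from glossiness and store it in alpha.

    Assumption: only texels glossier than `threshold` should glow, with the
    intensity ramping linearly from 0 at the threshold to 1 at full gloss.
    """
    gloss = max(gloss_rgb)
    intensity = max(0.0, (gloss - threshold) / (1.0 - threshold))
    return (*gloss_rgb, intensity)


def apply_texture_bloom(lit_rgb, gloss_rgba, strength=1.0):
    """Shading step: reuse the already-fetched texel, so bloom costs no extra fetch.

    Assumption: bloom is approximated by scaling the lit colour, clamped to 1.0.
    """
    bloom = gloss_rgba[3] * strength
    return tuple(min(channel * (1.0 + bloom), 1.0) for channel in lit_rgb)


matte_texel = bake_bloom_alpha((0.5, 0.5, 0.5))
metal_texel = bake_bloom_alpha((1.0, 1.0, 1.0))

print(apply_texture_bloom((0.5, 0.4, 0.2), matte_texel))  # unchanged, no glow
print(apply_texture_bloom((0.5, 0.4, 0.2), metal_texel))  # brightened highlight
```

Because the glow is computed per texel with no blur pass, it can’t spread beyond the surface it belongs to, which is consistent with it looking less impressive than a true bloom and needing the overlay trick for characters.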
When the team tried to establish what the budget for bloom was, it very quickly became apparent that there simply wasn’t one, as the frame itself was already at 28ms before bloom was even factored in. It was therefore essential to optimise the entire game, not just the bloom effects, to make space for them in the pipeline.
To do this, the team looked at a whole host of factors such as algorithm suitability, avoiding redundant work, issuing only the draw calls that are strictly necessary, and issuing them in the right order to make sure you’re not wasting performance on rendering objects that are hidden. The best way to identify these bottlenecks is through the use of a range of fantastic, free Arm tools, and Stephen Barton is our in-house expert on this. He explained the process to go through in order to use these tools to identify and then address the bottlenecks in your game. You can access the Mali Offline Compiler, Mali Graphics Debugger and Arm DS-5 Streamline at our dedicated developer website, and read some of our more detailed blogs.
By using the DS-5 Streamline tool, the team were able to establish that Spellsouls is GPU bound, and specifically fragment shader bound. They also understood the importance of assessing every element of a scene to establish which was the heaviest on performance. In this instance it turned out to be the terrain, because it covers a large proportion of the scene and because of its visual complexity. The specific elements adding to this were tangent-space normal maps, reflections, and lighting. Realistic reflections are vital to a quality user experience so these were deemed necessary to keep. The lighting, however, was an area ripe for optimisation, and a significant saving was made simply by moving it to a lower resolution render texture. This means the time spent per light is much smaller and we can therefore add more lights at minimal cost, improving visuals. These adaptations brought us down to 21ms from the 28ms that the bloom-free scene started with, a pretty good start.
The next step was to render the terrain at 720p instead of 1080p. This reduced the number of pixels by about 55% and immediately reduced render time from 10ms to 5ms. This is an acceptable quality loss in the terrain because it’s purely a backdrop: it’s not the major focus of the scene and not where your eye is naturally drawn, and visual quality overall was still excellent. These comparatively simple optimisations enabled us to bring the whole frame down to the 16ms we were aiming for, enabling Nordeus to target their awesome content at a much wider range of devices and tiers than they would previously have done. This means more consumers at a greater range of price points can access the same quality gaming content, a plus for any game developer, not to mention the added value it brings to the device manufacturer in being able to promise such high quality content in the mainstream.
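If you want to sanity-check those numbers, the arithmetic is easy to reproduce. The short Python sketch below assumes the usual 1920x1080 and 1280x720 resolutions for "1080p" and "720p" (the talk didn’t give exact dimensions) and assumes fragment cost scales roughly with pixel count, which is a simplification.

```python
# Assumed resolutions: the talk only says "1080p" and "720p".
FULL_RES = (1920, 1080)
REDUCED_RES = (1280, 720)


def pixel_count(resolution):
    width, height = resolution
    return width * height


def pixel_reduction(full, reduced):
    """Fraction of pixels saved by rendering at the reduced resolution."""
    return 1.0 - pixel_count(reduced) / pixel_count(full)


def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / target_fps


reduction = pixel_reduction(FULL_RES, REDUCED_RES)

# Naive estimate: scale the 10ms terrain cost by the remaining pixel fraction.
scaled_terrain_ms = 10.0 * pixel_count(REDUCED_RES) / pixel_count(FULL_RES)

print(f"Pixels saved: {reduction:.1%}")
print(f"Estimated terrain time: {scaled_terrain_ms:.1f} ms")
print(f"Frame budget at 60FPS: {frame_budget_ms(60):.2f} ms")
```

Under those assumptions the pixel saving comes out at just over 55%, and the naive estimate for the terrain lands a little under the 5ms that was measured, which is about what you’d expect given that not all of the cost scales with resolution.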