
ARM Mali Graphics


In 2016 so far there seems to be a big focus on automation. The rise of the Internet of Things is part of the reason for this and it’s opening our eyes to how many aspects of our everyday lives can be streamlined. Simply by allowing machines, sensors and technologies to ‘talk’ to each other, share data and use it to make smart decisions, we can reduce the direct input we need to have to keep our world moving.


Home automation is one of the first things people think of but it soon leads to discussions on smart agriculture, automated office management and remote monitoring and maintenance of vehicles and assets. Not only that, but an area garnering a whole lot of interest is smart automotive. We know that many of these examples, in order to operate safely and effectively, need to be able to take in enormous amounts of data and analyse it efficiently for an immediate response. Before your home can decide to let you in through the front door without a key, for instance, it needs to know who you are. Before your autonomous car can be unleashed onto the streets, it needs to be able to spot a hazard, but how does it do it? One of the key drivers (see what I did there?) in this area is computer vision.


ARM®’s recent acquisition of Apical®, an innovative, Loughborough-based imaging tech company, helps us to answer these questions. With such a rich existing knowledge base and a number of established products, ARM, with Apical, is well placed to become a thought leader in computer vision technology. So what is computer vision? Computer vision has been described as graphics in reverse, in that rather than us viewing the computer’s world, the computer has turned around to look at ours. It is essentially exactly what it sounds like: your computer can ‘see’, understand and respond to visual stimuli around it. In order to do this there are camera and sensor requirements of course, but once this aspect has been established, we have to make it recognise what it’s seeing. We have to take what is essentially just a graphical array of pixels and teach the computer to understand what they mean in context. We are already using examples of computer vision every day, possibly without even realising it. Ever used one of Snapchat’s daily filters? It uses computer vision to figure out where your face is and, of course, to react when you respond to the instructions (like ‘open your mouth…’). Recent Samsung smartphones use computer vision too; a nifty little feature for a bookworm like me is that they detect when your phone is in front of your face and override the display timeout so it doesn’t go dark mid-page. These are of course comparatively minor examples, but the possibilities are expanding at breakneck speed and the fact that we already take these for granted speaks volumes about the potential next wave.

Computer vision is by no means a new idea; there were automatic number plate recognition systems as early as the 60s and 70s, but deep learning is one of the key technologies that has expanded its potential enormously. The early systems were algorithm based, removing the colour and texture of a viewed object in favour of spotting basic shapes and edges and narrowing down what they might represent. This stripped back the amount of data you had to deal with and allowed the processing power to focus on the basics in the clearest possible way. Deep learning flipped this process on its head and said, instead of algorithmically figuring out that a triangle of these dimensions is statistically likely to be a road sign, why don’t we look at a whole heap of road signs and learn to recognize them? Using deep learning techniques, the computer can look at hundreds of thousands of pictures of, say, an electric guitar, and start to learn what an electric guitar looks like in different configurations, contexts, levels of daylight, backgrounds and environments. Because it sees so many variations it also starts to learn to recognise an item even when part of it is obscured, because it knows enough about it to rule out the possibility that it’s something else entirely. Sitting behind all this cleverness are neural networks, computer models that are designed to mimic what we understand of how our brains work. The deep learning process builds up connections between the virtual neurons as it sees more and more guitars. With a neural net suitably trained, the computer can become uncannily good at recognising guitars, or indeed anything else it’s been trained to see.


The ImageNet competition tests how accurately computers can identify specific objects in a range of images


A key milestone for the adoption of deep learning was at the 2012 ImageNet competition. ImageNet is an online research database of over 14 million images and runs an annual competition to pit machines against each other to establish which of them produces the fewest errors when asked to identify the objects in a series of pictures. 2012 was the first year a team entered with a solution based on deep learning. Alex Krizhevsky’s system wiped the floor with the “shallow learning” competition that used more traditional methods and started a revolution in computer vision. The world would never be the same again. The following year there were, of course, multiple deep learning entries, and Microsoft recently broke records when their machine was actually able to beat the human control subject in the challenge!


A particularly exciting aspect of welcoming Apical to ARM is Spirit™, which takes data from video and a variety of sensors and produces a digital representation of the scene it’s viewing. This allows, for example, security staff to monitor the behaviour of a crowd at a large event and identify areas of unrest or potential issues based on posture, pose, mannerisms and numerous other important but oh so subtle factors. It also opens the doors for vehicles and machines to begin to be able to process their surroundings independently and apply this information to make smart decisions.


Spirit can simultaneously interpret different aspects of a scene into a digital representation


This shows us how quickly technology can move and gives some idea of the potential, particularly for autonomous vehicles, as we can now see how precisely they could quantify the hazard of, say, a child by the side of the road. What happens, though, when it has a choice to make? Sure, it can differentiate between children and adults and assess that the child statistically holds the greater risk of running into the road. However, if there’s an impending accident and the only way to avoid it is to cause a different one, how can it be expected to choose? How would we choose between running into that bus stop full of people or the other one? By instinct? Through some internal moral code? Where does the potential for these machines to effectively think for themselves become the potential for them to discriminate or produce prejudicial responses? There is, of course, a long way to go before we see this level of automation but the speed at which the industry is advancing suggests these issues, and their solutions, will appear sooner rather than later.


ARM’s acquisition of Apical comes at a time when having the opportunity to exploit the full potential of technology is becoming increasingly important. We intend to be on the front line of ensuring computer vision adds value, innovation and security to the future of technology and automation. Stay tuned for more detail on upcoming devices, technologies and the ARM approach to the future of computer vision and deep learning.

As a world leading IP company ARM is passionate about protecting and promoting the ideas, innovations and skills required to produce next generation tech. A large part of that involves supporting the teaching of STEM subjects in schools and encouraging more of the future generations to get involved in programming and development.


One of the ways we do this is to collaborate on educational events with local institutions and share the knowledge of our experts with local students. Future Experience Points (FXP) will be held at Cambridge Regional College from June 25th to 27th and will feature a series of presentations, workshops and mentoring sessions that tie in with the computer science curriculum for students at local schools and colleges. Focussing on game development and graphic design, the event is intended to bring theoretical subjects to life through practical application and hands-on experience.


FXP will also feature a 48 hour game jam where teams of youngsters will work together to create and develop a mobile game with hands-on, practical support and training from industry experts. All the games developed as part of the event will be available for the public to play at Cambridge’s annual Big Weekend event in July. Prizes will be awarded in two categories, Concept and Development, and we’ll be giving away five Kindle Fire tablets to the winning team in the concept category!


With the prevalence of mobile devices on the market most students are already very familiar with mobile platforms and mobile gaming but often with no background knowledge of the technology that powers these devices. Providing an insight into the innovations and advancements that bring them the latest content adds a new dimension to the understanding of mobile technology.


It’s hoped the event will encourage more young people to pursue careers in graphic design and engineering, game development and related technology industries. As a local Cambridge company ARM considers it a top priority to advance the career opportunities for local teens and it would be great to see how many of these students could end up working with us in the future!


Can’t wait? We’ve also worked closely with Michael Warburton of Cambridge Regional College to produce a series of tutorials to help you get started developing your game for mobile devices!

Students get one-to-one advice and tips to kick start their graphics experience

Virtual Reality (VR) has been a focus area for ARM® in recent years with significant investment made in ensuring the ARM Mali™ range of graphics and multimedia processors is a great fit for mobile VR devices now and in the future.


We’re pleased to have been working with Google to ensure our range of Mali GPUs, Video and Display processors are able to deliver the ultimate mobile VR experience on Daydream. In addition, ARM has been working closely with a number of our leading silicon partners, enabling them to ship their first wave of Daydream ready devices.


Google’s announcement of high performance mobile VR support through Daydream, combined with our broad ecosystem of partners using the no.1 shipping GPU, is making VR accessible to hundreds of millions of consumers across the globe.


Why ARM and Mali for VR?

We’ve released a series of blogs over the past few months on the various VR technologies and activities which make Mali products a great fit.


VR places increasing performance demands on the systems we’re seeing today. Not only are we rendering for two different eyes, but we are also required to render at higher framerates and screen resolutions to produce a quality experience. Mali GPUs with their performance scalability and continual emphasis on energy efficiency ensure we are well positioned to address these ever increasing requirements.
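To put those demands in rough numbers, here is a sketch of the fragment throughput stereo rendering asks for; the per-eye resolution and framerate below are illustrative assumptions, not figures for any particular device:

```java
public class VrThroughput {
    // Fragments the GPU must shade per second for stereo rendering:
    // two eyes, each at the given per-eye resolution and refresh rate.
    static long fragmentsPerSecond(int eyeWidth, int eyeHeight, int fps) {
        return 2L * eyeWidth * eyeHeight * fps;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 1080x1200 per eye at 60 fps versus a
        // single 1920x1080 view at 30 fps, ignoring overdraw and distortion.
        long vr = fragmentsPerSecond(1080, 1200, 60);  // 155,520,000
        long mono = 1920L * 1080 * 30;                 //  62,208,000
        System.out.println(vr + " vs " + mono);
    }
}
```

Even with these conservative example figures, the stereo case shades roughly 2.5x the fragments of a typical single-view mobile game, before any quality headroom is added.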


Mali GPUs also offer additional features that benefit VR use-cases. ARM Frame Buffer Compression (AFBC) is a system bandwidth reduction technology and is supported across all of our multimedia IP. AFBC is able to reduce memory bandwidth (and associated power) by up to 50% across a range of content. This and other system wide technologies further enable efficient use-cases such as VR video playback. A number of other features including tile based rendering and other bandwidth saving technologies such as ASTC ensure we’re able to meet the high resolution and framerate requirements of VR. Mali GPUs also support 16x MSAA for best quality anti-aliasing. This is essential for a high quality user experience in VR as the proximity of our eyes to the images and the fact that we are viewing them in stereo means that any artefacts are much more noticeable than in traditional applications.


On the software side, a large amount of driver and optimization work has gone into our Mali DDK in order to reduce latency and ensure fast context switching required for VR. In addition to optimizations, we’ve enabled a number of extensions to OpenGL ES to support efficient rendering to multiple views for both stereo and foveated rendering.


VR is an incredibly exciting use-case for ARM and is an area in which we intend to continually invest and innovate to make the VR experience on mobile even more awesome. We’re proud to be in close collaboration with Google on Daydream and look forward to the opportunities this opens up for the industry.

In the blog Using Mali Graphics Debugger on a Non-rooted device we discussed the idea that you could use Mali Graphics Debugger (MGD) with a non-rooted phone. This blog will take that idea further by showing you how to use MGD with a Unity application on a non-rooted device. Although this can be more complicated than using a standard application, the same principles are used as in the previous guide:


  1. Add the interceptor library to your build system.
  2. Edit your activity to load the interceptor library.
  3. Install the MGDDaemon application on your device.


Let's explore these steps in detail and how to execute them in Unity. For this guide it is assumed that you have an Android application already created in Unity.


The first thing you need to do is create an Assets\Plugins\Android folder in your project, then copy the interceptor library into it. The library can be found in the target\android-non-root\arm\[armeabi-v7a/arm64-v8a] folder in your MGD installation directory (pick the folder matching your target ABI). This will make sure that the interceptor library gets packaged into your application.


Now, the standard activity that Unity uses when making Android applications won't load the MGD interceptor library by default, so we need to make our own. This is done via Eclipse or the command line, outside of the Unity environment. Here is a template of the code you will need:


package test.application;

import com.unity3d.player.UnityPlayerActivity;
import android.os.Bundle;
import android.util.Log;

public class StandardActivity extends UnityPlayerActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        // Load the MGD interceptor library before the Unity player starts.
        // The library name is assumed; match the interceptor library you
        // copied into Assets\Plugins\Android earlier.
        try {
            System.loadLibrary("MGD");
        } catch (UnsatisfiedLinkError e) {
            Log.i("[ MGD ]", "libMGD not loaded.");
        }
        super.onCreate(savedInstanceState);
    }
}


Note that whatever your package is, you must make sure that your directory structure matches. So if you have a package of com.mycompany.myapplication, then your StandardActivity.java should be located in the directory structure com\mycompany\myapplication. In the case above you should store the file in test\application\.




As you need some functions that come directly from Android, you need to add your system's android.jar to the classpath. It is usually located in the platforms\android-<X>\ folder of your Android SDK, where X is the Android SDK version you are targeting. Also, as you are extending the UnityPlayerActivity class, you need to add Unity's classes.jar file, which is located in your Unity folder under the path Editor\Data\PlaybackEngines\AndroidPlayer\Variations\mono\Development\Classes. Finally, if you are using a JDK that is greater than 1.6 you need to add -source 1.6 and -target 1.6 to your compile line or Unity won't be able to use it correctly.


So your full line to compile your java file should resemble something like:


C:\scratch>javac -cp "C:\Program Files\Unity\Editor\Data\PlaybackEngines\AndroidPlayer\Variations\mono\Development\Classes\classes.jar;C:\android\sdk\platforms\android-21\android.jar" -source 1.6 -target 1.6 test\application\StandardActivity.java


or if you are using a Mac


javac -cp "/Users/exampleUser/android-sdk-macosx/platforms/android-23/android.jar:/Applications/Unity/PlaybackEngines/AndroidPlayer/Variations/mono/Release/Classes/classes.jar" -source 1.6 -target 1.6 test/application/StandardActivity.java


We then need to turn this class into a jar file so we can include it into our Unity project. To do that we need to write:


jar cvf myActivity.jar test\application\StandardActivity.class


Place the created jar file in your project's Assets\Plugins\Android folder you created in the first step.


Now, just because we have created a new activity class doesn't mean that Unity is going to use it. For this to happen we also need to override the Android Manifest file that Unity uses. If you create an AndroidManifest.xml file in your Assets\Plugins\Android folder, Unity will automatically use it instead of the default one that is provided. The minimum that is recommended to put in this file is:


<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <application android:icon="@drawable/app_icon" android:label="@string/app_name">
    <activity android:name="StandardActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
      </intent-filter>
    </activity>
  </application>
</manifest>

Where android:name on the activity element is the name of the activity you have created. Once this is done you should be able to build your Android application in the usual way. One final thing to note is that your bundle identifier in Unity must match the package that you gave to your activity. In our example this would be test.application (case sensitive).



Once your application has been built, install it onto the device, then install the MGDDaemon app and use MGD. If you need more information about using and installing the MGD application, consult the blog post Using Mali Graphics Debugger on a Non-rooted device.

Interested in Vulkan? Look out for the next Vulkan meet up which is being held on the ARM premises in Cambridge on May 26th.


This will be the 3rd Vulkan Developer event which will take a deeper than ever dive into programming 3D graphics using the Vulkan API.


Join the meetup to register, and while you're there you can join the Khronos UK Chapter to hear about future news and events.

The full agenda is also available at the link above, and the event coincides with the Cambridge Beer Festival, to which ARM is providing free transport for "networking" purposes.


Can't make it in person? Sign up for dial-in details.



Further information

In this full day of technical sessions we aim to provide 3D developers like yourself with everything you need to come up to speed on Vulkan and to forge ahead and explore how to use Vulkan in your engine or application.


Vulkan is a new generation graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs. Khronos launched the Vulkan 1.0 specification on February 16th, 2016 and Khronos members released Vulkan drivers and SDKs on the same day. More info:


Prior Knowledge

The sessions are aimed at 3D graphics developers who have hands-on experience of programming with APIs such as OpenGL, OpenGL ES, Direct3D and Metal.

At GDC 2016 ARM® and Nibiru, a key ecosystem partner, announced the exciting launch of the Joint Innovation Lab. Designed to give developers the best possible support when developing mobile games, the innovation lab promises to streamline and simplify the process of porting mobile games to Nibiru’s ARM-based platforms.

As VR is such a focal point for the mobile gaming industry it’s also a key focus for Nibiru. Currently offering over 40 different all-in-one VR devices designed to work with all levels of content, Nibiru are a thought leader in the standalone VR space.


One of the exciting upcoming VR releases from game studio Mad Rock, enabled by Nibiru, is X-Planet, a first-person shooter specially designed for their ARM Mali powered VR headsets. The game concept is familiar yet engaging: far, far away there is a planet called X-Planet, and you, the pilot, are charged with defending it against unknown adversaries. What's really cool about this is the use of eye-tracking software to interact with the game, using gaze-based targeting to pilot an armed cockpit through intense battles. Your enemies become more powerful as you progress, demanding you defeat progressively harder waves of robots attacking you from all sides!


The awesome soundtrack demands headphones for a fully immersive experience and the game can be fully enjoyed while seated to reduce the chance of over-excited users stepping on the cat mid-battle.


X-Planet from Mad Rock


X-Planet is due to launch across all of Nibiru’s high performance platforms including VR Launcher and VR AIO HMD, aiming to provide the ultimate VR gaming experience. It’s well known that VR places high demands on processors and Nibiru choose ARM Mali GPUs in order to get the best possible performance with the lowest possible power cost. The ARM & Nibiru Joint Innovation Lab can help take VR gaming to the next level.

Nibiru launching the Joint Innovation Lab at the ARM Lecture Theatre at GDC 2016

It's not often I get flown halfway round the world in order to explain common sense, but this March it happened as I was delivered to GDC in San Francisco to talk about best practices in mobile graphics. Unlike previous talks where I wax lyrical about the minutiae of a specific optimization technique, this time I had to cover a wide range of things in just twenty minutes. Paired as I was with a similarly compressed talk from Stephen Barton about using our DS-5 and MGD tools to analyse graphical applications for bottlenecks, it was a study in time management. One of the highlights of his talk was the latest MGD update, which you can read more about on his recent blog post. Pity our poor audience who, having had insufficient time to learn how to find their performance bottlenecks, were now going to be subject to my having insufficient time to tell them how to fix them.


We're making the slides available for this presentation (my section starts on slide 29) but unlike previous presentations there was no video taken, so some of the pages may need a little explanation here. Whereas usually I'd have time to look at an app and check for specific changes, obviously the people watching wanted to know what they could do on their own software. I therefore had to talk about the most common places where people leave room for improvement: Batching, Overdraw, Culling, Levels of Detail, Compression and Antialiasing.


Batching is a topic I have been outspoken about many times in the past and I really just gave some simple solutions here, such as combining static geometry into a single mesh. Though lip service was paid to dynamic batching and instancing, that topic is explained far better in my older post Game Set & Batch.

Although I've spoken about overdraw in the context of batching before, not much has been said about scene-related overdraw beyond a somewhat flippant "front to back, y'all" before moving on to a batching solution. People often think of overdraw in terms of sorting objects in a scene by their distance from the camera. One case lots of people complain about, however, is what to do when objects overlap or surround each other in some way, because then those distances can't be trusted. In that situation there's an even easier solution. If you know one thing will always cover another, you can make a special ordering case for it in the code. There are also a number of very common savings to be made at full-screen scale. If the camera is inside a room, anything else inside the room can be rendered before the room itself, as it will always occlude the walls and the floor. The same goes for rendering things before the ground in outdoor scenes.

It's mostly about efficient scene management, but even when you don't know beforehand what order something will be drawn in you can make changes to reduce the impact of overdraw. If you have two pieces of geometry in the scene which use different shaders and for whatever reason it's hard to tell which to draw first, draw the least expensive shader first. At least that way the cheaper overdrawn pixels will be wasting less, and the occluded pixels from the expensive shader are saving more.
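Those ordering rules can be sketched as a simple sort. The DrawCall structure and its cost numbers here are hypothetical, not from any real engine:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DrawOrder {
    // Hypothetical draw call record: distance from the camera and an
    // estimated per-fragment shader cost (e.g. a cycle count).
    static class DrawCall {
        final String name;
        final float cameraDistance;
        final int shaderCost;
        DrawCall(String name, float cameraDistance, int shaderCost) {
            this.name = name;
            this.cameraDistance = cameraDistance;
            this.shaderCost = shaderCost;
        }
    }

    // Opaque geometry: front to back, so the depth test kills occluded
    // fragments. When the order is genuinely unclear (equal distances),
    // draw the cheapest shader first so overdrawn pixels waste less work.
    static void sortOpaque(List<DrawCall> calls) {
        calls.sort(Comparator
            .comparingDouble((DrawCall d) -> d.cameraDistance)
            .thenComparingInt(d -> d.shaderCost));
    }

    public static void main(String[] args) {
        List<DrawCall> calls = new ArrayList<>();
        calls.add(new DrawCall("skybox", 1000f, 4));
        calls.add(new DrawCall("hero",   10f,  32));
        calls.add(new DrawCall("wall",   10f,  8));
        sortOpaque(calls);
        for (DrawCall d : calls) System.out.println(d.name);
        // wall and hero (both at distance 10) come before the skybox;
        // wall first because its shader is cheaper.
    }
}
```

In a real engine the "always covers" special cases (room contents before walls, everything before the ground plane) would be hard-coded rather than derived from distances, exactly as the paragraph above describes.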


On a similar topic of not calculating that which is unseen, I then spoke of culling, and the kind of large-scale culling which is possible on the CPU side to reduce vertex calculations. This is achieved by reducing an object down to a bounding box, defined by eight points, which can then be transformed to become a bounding rectangle on the screen. This rectangle can then be very quickly checked to see if it is on or off screen, or even whether it's inside the bounds of a window or doorway through which we are seeing its scene. For most scenes this is the only kind of high-level, large-scale occlusion culling that makes sense, because the next step would be to consider whether objects in a scene occlude each other. For that you need to think about an internal bounding volume which is guaranteed to occlude everything behind it regardless of its orientation, and which must be generated to fit the geometry. That is far more complicated than computing the bounding box.
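The bounding-rectangle check can be sketched as follows. The world-to-screen projection of the eight corners is assumed to have happened already; only the shrink-wrap and overlap test are shown:

```java
public class BoundsCull {
    // Screen-space axis-aligned rectangle.
    static class Rect {
        final float minX, minY, maxX, maxY;
        Rect(float minX, float minY, float maxX, float maxY) {
            this.minX = minX; this.minY = minY;
            this.maxX = maxX; this.maxY = maxY;
        }
    }

    // Shrink-wrap the eight projected box corners ({x, y} pairs) into a
    // single bounding rectangle.
    static Rect boundingRect(float[][] corners) {
        float minX = Float.POSITIVE_INFINITY, minY = Float.POSITIVE_INFINITY;
        float maxX = Float.NEGATIVE_INFINITY, maxY = Float.NEGATIVE_INFINITY;
        for (float[] c : corners) {
            minX = Math.min(minX, c[0]); maxX = Math.max(maxX, c[0]);
            minY = Math.min(minY, c[1]); maxY = Math.max(maxY, c[1]);
        }
        return new Rect(minX, minY, maxX, maxY);
    }

    // Cheap visibility test: does the object's rectangle overlap the
    // viewport (or a window/doorway rectangle the scene is seen through)?
    static boolean overlaps(Rect a, Rect b) {
        return a.minX < b.maxX && a.maxX > b.minX
            && a.minY < b.maxY && a.maxY > b.minY;
    }

    public static void main(String[] args) {
        Rect viewport = new Rect(0, 0, 1920, 1080);
        Rect box = boundingRect(new float[][] {
            {-10, -10}, {-10, -5}, {-5, -10}, {-5, -5},
            {-10, -10}, {-10, -5}, {-5, -10}, {-5, -5}
        });
        System.out.println(overlaps(box, viewport)); // false: cull the object
    }
}
```

The same overlaps test against a doorway's screen rectangle, rather than the full viewport, gives the portal-style culling mentioned above.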


Culling things in the distance is considered somewhat old hat in modern applications. We associate the sudden appearance of distant objects, or emergence from a dense, opaque fog, with an undesirably retro aesthetic. In their place we have two new techniques. The fog is replaced by clever environmental design, limiting view distance by means of occluding walls and available eye-lines. For large, open spaces, having objects pop into reality has been replaced by dynamic levels of detail. The funny thing about levels of detail is that they don't have to be dynamic to be relevant. Levels of detail go beyond reducing unnecessary vertex processing: there's also a small amount of overhead to process each triangle. This triangle setup cost is very small, so ordinarily you never notice it, as it happens while the previous triangle is being turned into fragments; but if the fragment coverage of your triangles is too low, you can actually notice this cost bumping up your render times. Before you even worry about implementing dynamic levels of detail you ought to ask yourself if you've picked the right level of detail to begin with. If the average triangle coverage (which can be calculated in Streamline) is in single digits, you're probably doing something wrong. We actually see this all the time in projects where the artist has designed beautiful tree models which are then lined up in the distance where none of that detail can be seen. If they're approachable then maybe a high-detail model would be useful, switched in based on proximity, but if you just want a bunch of things in the background you may be better off with batched billboard sprites.
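The average-coverage sanity check is just one division; the counter values in this sketch are invented for illustration, but the same two quantities can be read from a profiler such as Streamline:

```java
public class TriangleCoverage {
    // Average fragments per triangle: fragments shaded divided by
    // triangles submitted.
    static double averageCoverage(long fragmentsShaded, long trianglesSubmitted) {
        if (trianglesSubmitted == 0) return 0.0;
        return (double) fragmentsShaded / trianglesSubmitted;
    }

    public static void main(String[] args) {
        // A hypothetical 100k-triangle tree line producing only 600k
        // fragments: 6 fragments per triangle, i.e. single-digit coverage,
        // where per-triangle setup cost starts to show in render times.
        double coverage = averageCoverage(600_000, 100_000);
        if (coverage < 10.0) {
            System.out.println("Coverage " + coverage
                + ": consider a lower level of detail or billboard impostors.");
        }
    }
}
```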


Having already talked about texture compression many times in the past, there's a lot of material on the specifics available from my previous presentations. This time, to take it in a different direction, I talked about how uncompressed textures have their pixels re-ordered in memory to give them better caching behaviour. This is similar to that seen in compressed textures, but without the bandwidth saved when the cache misses and a block needs to be pulled from memory. This explains the block layout I've advocated many times in the past, and I went on to talk more about the other rules of what makes texture compression a special case among image compression algorithms: mainly the ability to immediately look up a block in the image (random access), decode it without any data from surrounding blocks (deterministic), and with no need for a dictionary of symbols (immediate).
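The random-access property boils down to simple arithmetic. This sketch assumes a hypothetical fixed-rate format with 4x4 texel blocks at 8 bytes per block (ETC1-style numbers, not a statement about any specific format):

```java
public class BlockLookup {
    // For a compressed format with fixed-size 4x4 texel blocks, the block
    // containing texel (x, y) can be located with pure arithmetic: no
    // neighbouring blocks and no symbol dictionary are needed, which is
    // exactly what makes the format random-access and deterministic.
    static long blockByteOffset(int x, int y, int textureWidth, int bytesPerBlock) {
        int blocksPerRow = (textureWidth + 3) / 4;   // round up partial blocks
        int blockX = x / 4;
        int blockY = y / 4;
        return (long) (blockY * blocksPerRow + blockX) * bytesPerBlock;
    }

    public static void main(String[] args) {
        // 8-byte 4x4 blocks in a 256-texel-wide texture: texel (9, 5)
        // lives in block (2, 1) -> block index 66 -> byte offset 528.
        System.out.println(blockByteOffset(9, 5, 256, 8));
    }
}
```

A general-purpose image codec like JPEG cannot offer this: entropy coding means you must decode everything up to a block before you can read it.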


One topic I was surprised to realize I'd never mentioned before was how texture compression works with mipmapping. Mipmapping is the technique of storing images at half, quarter, eighth (and so on) resolutions to reduce interference patterns and speed up texture loads. It's like automatic level-of-detail selection for textures. What people might not realise, however, is that whereas uncompressed texture mipmaps can be generated at load time with a single line of OpenGL ES code, mipmaps for compressed textures have to be generated at compile time and themselves compressed and stored within the application's assets. It's a small price to pay for all that tasty, tasty efficiency however.
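For uncompressed textures that single line is typically glGenerateMipmap(GL_TEXTURE_2D). What the chain itself costs can be sketched independently of any graphics API:

```java
public class MipChain {
    // Number of mip levels for a texture: halve each dimension until 1x1.
    static int levelCount(int width, int height) {
        int levels = 1;
        while (width > 1 || height > 1) {
            width = Math.max(1, width / 2);
            height = Math.max(1, height / 2);
            levels++;
        }
        return levels;
    }

    // Total texels across the whole chain. For large textures this
    // converges to about 4/3 of the base level, i.e. mipmapping costs
    // roughly 33% extra storage.
    static long totalTexels(int width, int height) {
        long total = 0;
        while (true) {
            total += (long) width * height;
            if (width == 1 && height == 1) break;
            width = Math.max(1, width / 2);
            height = Math.max(1, height / 2);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(levelCount(1024, 1024));   // 11 levels
        System.out.println(totalTexels(1024, 1024));  // 1398101 texels
    }
}
```

For compressed textures, each of those 11 levels would be compressed offline and shipped in the application's assets, as described above.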


Finally I brought up antialiasing, because I figured room for improvement needn't necessarily be in terms of overhead reduction. Though I failed to bring it together due to time constraints on the day, the real message I wanted to impart in this talk was that optimization has become a dirty word in many ways. To suggest an application needs optimizing implies you've used everything the GPU's got and to make it run at a decent frame rate you'll have to make it look worse. That's not what optimization is. The metaphor I used was that if you don't optimize your application, it's like taking a single bite of an apple, throwing the rest away and complaining that it didn't have enough flesh. Well optimized code can munch away at that apple getting the absolute most out of it and done right, optimization doesn't make your application look worse, it gives you headroom to make it look better. Batching and culling let you put more stuff in your application, with level of detail and billboard impostors you can even have dense arrays of objects in the backgrounds. Compressed textures let you have more textures at a higher resolution, and full screen antialiasing is almost zero cost on Mali based systems.


That's the real message here.


That's not where the presentation ends, though. Part of my job involves taking apart people's graphics at the API level and listing all the things they've done wrong, or at least things they could do better. When they read the laundry list of sins however they very much interpret it as me telling them what they did wrong. So imagine my elation when given a chance to redress the balance and pick apart one of our own demos, in public, and discuss our own mistakes and faults. We're human too, you know.


Though difficult to describe in blog format, the screenshot slides at the end of the presentation show me stepping through the demo render process, explaining times when bad decisions were made regarding render order, when batchable objects were drawn individually, how practically nothing was culled, and even a few glitches, such as the sky box covering the particle effects and the UI being rendered on screen even when its opacity is zero. It's almost a shame that after identifying them we had to fix all these things; it would have been nice for the audience to know they were there.


If you're interested in using MGD and DS-5 to profile your applications, there's a two-part in-depth case study by Lorenzo Dal Col with far more detail than I could fit in my presentation:

Mali GPU Tools: A Case Study, Part 1 — Profiling Epic Citadel

Mali GPU Tools: A Case Study, Part 2 — Frame Analysis with Mali Graphics Debugger

Traditionally, Mali Graphics Debugger (MGD) works on a rooted device. In this mode an interceptor layer sits between your application and the driver: your application calls into the interceptor, which sends a copy of the data back to the MGD application and passes the call on to the driver.



However, this isn't the only way that MGD can be used. The second option has all of the functionality of the first option with the added benefit that it will also work on a standard Android device with no modification. The trade-off is that you need access to the full source code of the application that you want to profile. This blog explores the second option so you can debug your applications on non-rooted devices.




Prerequisites

  1. Your computer should be set up for Android development; in particular:
    • The Android SDK and NDK should be installed.
    • Your system path should include the adb binary.
  2. You should have access to the full source code of your application.
  3. Your device must be running at least Android 4.2.




Setting up your application

  1. Copy the folder called android-non-root from the target directory in your MGD installation to your application's root folder.

  2. In your target application's Android.mk add the following line:

 include $(LOCAL_PATH)/../android-non-root/

  3. In your project's main activity class, add the following code to load the MGD interceptor library:

                try
                {
                    System.loadLibrary("MGD");
                }
                catch( UnsatisfiedLinkError e )
                {
                    // Feel free to remove this log message.
                    Log.i("[ MGD ]", " not loaded.");
                }

  4. Recompile the application and install it on your Android device.


Running your application


The first thing we need to do is install the MGDDaemon application on the target device. The MGDDaemon application is responsible for sending the information from the interceptor library to the host; without it, the host won't receive any data.


  • cd into the android-non-root directory and run adb install -r MGDDaemon.apk.
  • Set up port forwarding from a command prompt: adb forward tcp:5002 tcp:5002.
  • Launch the MGD daemon application on the device. It shows a list of applications it has detected with the MGD interceptor library correctly inserted. Before you tap one of them, set the Mali Graphics Debugger daemon switch to on.



  • Once the switch is on you should be able to connect to the process from the MGD host, and a new tab for your current trace will be created. At this point just tap your application in the MGD daemon application and the trace should work.


Following these steps you should be able to use MGD on any Mali-based platform. If you have any issues please raise them on the community and someone will be more than happy to assist you through the process.



Last month the game developer community celebrated its main event in San Francisco: the Game Developers Conference (GDC). The longest-running event devoted to the game industry set a new record in its 30th edition with more than 27,000 attendees. The expo hall was crowded until the very last minute and many talks were moved to bigger rooms to accommodate the demand.


In this blog I would like to provide a round-up of one of the ARM sponsored talks at GDC 2016: Achieving High Quality Mobile VR Games. I had the privilege of sharing the talk with two great colleagues; Carl Callewaert (Unity Technologies Americas Director & Global Leader of Evangelism) and Patrick O'Luanaigh (nDreams CEO).


Figure 1. Delivering the presentation at GDC 2016.

The talk was devoted to mobile VR, but each of the speakers presented a different aspect of the topic. I spoke from the perspective of developers and shared our experience of porting the Ice Cave demo to Samsung Gear VR, covering some highly optimized rendering techniques, based on local cubemaps, that we used in the demo to achieve high quality VR content. I also discussed the importance of rendering stereo reflections and showed how to implement them in Unity.


Carl talked from the perspective of the game platform which is used by more than half of developers all over the world. He shared with the audience the latest news about the VR integration into Unity and discussed very interesting ideas about how to use well-established architectural design principles to build VR gaming environments that create the sense of presence and immersion. To the delight of the attendees Carl showed the first part of the real-time rendered short film Adam, an impressive photorealistic demo that highlights Unity’s rendering capabilities.


Finally, Patrick presented from the perspective of the game studio that has already successfully released several VR games. As part of the development process nDreams has extensively researched movement in VR. In his talk Patrick shared some of their most interesting findings as part of their commitment to delivering the best user experience in their VR game catalogue.


The concept of local cubemaps

The content I delivered during the first part of the session was devoted mainly to describing several rendering techniques, based on local cubemaps, that we used in the Ice Cave demo. For those who are not very familiar with the concept of local cubemaps I explain it briefly below.


Let’s assume we have a local environment delimited by an arbitrary boundary, and that we have baked the surrounding environment into a cubemap from a given position inside the local environment. We are looking at some star on the boundary in the direction defined by vector V and we want to answer the question: what vector do we need to use to retrieve the star from the cubemap texture?


Figure 2. The concept of local cubemap.


If we use the same vector V to fetch from the cubemap we will retrieve the happy face instead of the star, as shown in the left picture of Figure 2. What then is the vector we need to use? As we can see from the middle picture, we need a vector from the cubemap position to the intersection point of the view vector with the boundary. We can only solve this type of problem if we accept some simplifications.


We introduce a proxy geometry to simplify the problem of finding the intersection point P, as shown in the picture on the right. The simplest proxy geometry is a box: the bounding box of the scene. We find the intersection point P, build a new vector from the position where the cubemap was baked to the intersection point, and use this new “locally corrected” vector to fetch the texel from the cubemap. The lesson here is that for every vector we use to retrieve whatever we baked into the local cubemap, we need to apply the local correction.
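Outside the shader, the local correction reduces to a ray/box intersection followed by re-basing the vector at the cubemap position. Below is a minimal Python sketch of that math; the function names are mine, and the proxy geometry is the bounding box described above:

```python
def intersect_aabb(origin, direction, box_min, box_max):
    """Distance along the ray at which it exits an axis-aligned box
    (slab method, assuming the origin is inside the box)."""
    t_exit = float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d != 0.0:
            t1, t2 = (lo - o) / d, (hi - o) / d
            t_exit = min(t_exit, max(t1, t2))
    return t_exit

def local_correct(view_pos, view_dir, cubemap_pos, box_min, box_max):
    """Vector to use when fetching from a local cubemap baked at cubemap_pos."""
    t = intersect_aabb(view_pos, view_dir, box_min, box_max)
    p = [o + t * d for o, d in zip(view_pos, view_dir)]   # intersection point P
    return [pi - ci for pi, ci in zip(p, cubemap_pos)]    # P - cubemap position
```

For example, a view ray from the centre of a unit box along +x, with the cubemap baked at (0.5, 0, 0), gives the corrected vector (0.5, 0, 0) rather than the uncorrected (1, 0, 0).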


Improving VR quality & performance


Developing games for mobile devices is challenging as we need to balance runtime resources very carefully. Mobile VR is even more challenging as we have to deal with the added complexity of stereo rendering and the strict frame-rate requirements needed to achieve a successful user experience.


Several highly efficient rendering techniques based on local cubemaps used in the Ice Cave demo have proved very suitable for VR as well.

Dynamic Soft Shadows based on local cubemaps


As we know, runtime shadows on mobile devices are expensive; in mobile VR they are a performance killer. The new shadow rendering technique based on local cubemaps developed at ARM contributes to saving runtime resources in mobile VR while providing high quality shadows. The implementation details of this technique can be found in several publications 1, 2, 3.


Figure 3. Dynamic soft shadows based on local cubemaps.


The main idea of this technique is to render the transparency of the local environment boundaries to the alpha channel of a static cubemap off-line. Then at runtime in the shader we use the fragment-to-light vector to fetch the texel from the cubemap and determine if the fragment is lit or shadowed. As we are dealing with a local cubemap, the local correction has to be applied to the fragment-to-light vector before the fetch. The fact that we use the same texture every frame guarantees high quality shadows, with none of the pixel shimmering or instability present in other shadow rendering techniques.


Dynamic soft shadows based on local cubemaps can be used effectively with other runtime shadows techniques to combine shadows from static and dynamic geometry. Another important feature of this technique is the fact that it can efficiently reproduce the softness of the shadows, i.e. the fact that shadows are softer the further away they are from the object that creates them.



Figure 4. Combined shadows in the Ice Cave demo.
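The combination step can be sketched very simply. Assuming both techniques produce a shadow factor in [0, 1] where 1.0 means fully lit (the exact blend used in the demo may differ), taking the darker of the two lets static and dynamic occluders both cast shadow. Illustrative Python:

```python
def combine_shadows(static_factor, dynamic_factor):
    """Combine a baked-cubemap shadow factor with a runtime shadow factor.

    Both factors are assumed to be in [0, 1], where 1.0 means fully lit.
    Taking the minimum means whichever occluder darkens the fragment wins.
    """
    return min(static_factor, dynamic_factor)
```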


Reflections based on local cubemaps


The local cubemap technique can also be used to render very efficient and high quality reflections. When using this technique the local environment is rendered off-line into the RGB channels of the cubemap. Then at runtime in the fragment shader we fetch the texel from the cubemap in the direction of the reflection vector. Again though, as we are dealing with a local cubemap, we first need to apply the local correction to the reflection vector, i.e. build a new vector from the position where the cubemap was generated to the intersection point P (Figure 5). We finally use the new vector R’ to fetch the texel from the cubemap.


Figure 5. Reflections based on local cubemaps.


The implementation details of this technique can be found in previous blogs 3, 4, 5. This technique can also be combined with other runtime reflection techniques to integrate reflections from static and dynamic geometry 3, 6.



Figure 6. Combined reflections in the Ice Cave demo.


Stereo reflections in VR


Stereo reflections are important in VR because if reflections are not stereo, i.e. we use the same texture for both eyes, then the user will easily notice that something is wrong in the virtual world. This will break the sense of full immersion, negatively impacting the VR user experience.


For planar reflections rendered at runtime that use the mirrored camera technique 6, we need to apply a mirror transformation to the main camera view matrix. We also need a half-eye-separation shift on the x axis to find the left/right positions from which reflections must be rendered. The mirrored camera(s) alternately render left/right reflections to a single texture that is used in the shader by the left/right eye of the main camera to apply the reflections to the reflective object.


At this point we must achieve a complete synchronization between the rendering of left/right reflection camera(s) and the left/right main camera. The picture below, taken from the device, shows how the left and right eyes of the main camera are using different colors in the reflection texture applied to the platform in the Ice Cave demo.


Figure 7. Left/right stereo reflection synchronization.

If we are dealing with reflections based on local cubemaps then we need to use two slightly different reflection vectors to fetch the texel from the cubemap. For this we need to find (if it is not provided) the left/right main camera position and build the left/right view vector used to find the reflection vector in the shader. Both vectors must be “locally corrected” before fetching the reflection texture from the cubemap.
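To make the two per-eye vectors concrete, here is a small numeric sketch in plain Python. The helper names are mine; half_separation stands for half the inter-pupillary distance, and the surface normal is assumed to be unit length:

```python
def reflect(d, n):
    """Reflect direction d about unit plane normal n: r = d - 2*(d.n)*n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return [di - 2.0 * dot * ni for di, ni in zip(d, n)]

def eye_positions(cam_pos, right_axis, half_separation):
    """Left/right eye positions, offset along the camera's right vector."""
    left = [c - half_separation * r for c, r in zip(cam_pos, right_axis)]
    right = [c + half_separation * r for c, r in zip(cam_pos, right_axis)]
    return left, right

def per_eye_reflections(cam_pos, right_axis, half_separation, surface_point, normal):
    """Build one reflection vector per eye from slightly different view vectors."""
    result = []
    for eye in eye_positions(cam_pos, right_axis, half_separation):
        view = [p - e for p, e in zip(surface_point, eye)]  # eye -> surface
        result.append(reflect(view, normal))
    return result
```

The two resulting vectors differ only by the eye offset; in the demo each would additionally be locally corrected before the cubemap fetch.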


A detailed implementation of stereo reflections in Unity can be found in a blog published recently 6.


Latest Unity improvements


During his presentation Carl pointed to the latest Unity effort in VR – the new VR editor that allows the building of VR environments directly from within an HMD. At GDC 2016 we saw a live demo that showed the progress of this tool in a presentation from Timoni West (Unity Principal Designer).


The Adam demo Carl displayed to attendees was also a nice proof-point for how much Unity has advanced in terms of real-time rendering capabilities. The picture below gives some idea of this.


Figure 8. A still from the first part of the Unity real-time rendered short film “Adam”.


Carl also went through some highlights of a presentation he had delivered the day before about how to create a sense of presence in VR. I found his ideas about the importance of creating depth perception when designing VR environments really interesting. The Greeks and Romans knew very well how important it is to correctly manage perspective, light, shadows and shapes to create the right sense of presence that invites you to walk around and understand the space.


Movement in VR


The last part of the talk was devoted to movement in VR. Patrick’s talk attracted much attention from attendees and prompted a lot of questions at the end. Movement in VR is an important topic as it directly influences the quality of the VR experience. The nDreams development team performed extensive research into different types of movement in VR and their impact on several groups of users. The figures Patrick presented about the results of this research were a valuable takeaway for attendees.


According to Patrick, mobile VR control will move towards controllers, tracked controllers and hand tracking, allowing more detailed input.

Initial nDreams tests confirmed some basic facts:


  • Movement needs to be as realistic as possible. When moving, aim to keep the speed to around 1.5 m/s as opposed to, for example, Call of Duty where the player often moves at 7 m/s. Keep any strafing to a minimum, and keep the strafe speed as low as possible.
  • Don’t take control of the camera away from the player (e.g. camera shakes, cutscenes etc.).
  • Ensure there is no perceived acceleration. A tiny, negligible acceleration in movement can take the edge off starting and stopping, but acceleration over any period of time is incredibly uncomfortable.



Figure 9. Some nDreams basic findings.


In terms of translational movement nDreams researched two main modalities: instant teleport and blink. Blink is a kind of fast teleport where your move completes within 120 ms. This movement time is so short that there is no time to experience any sickness, but the user still has a sense of motion and a tunnel effect. Teleport is seen as more precise due to the additional directional reticule, whereas blink feels more immersive.


The rotation study included trigger and snap modalities. Trigger rotations use the shoulder buttons of the controller to rotate in 45-degree steps to the left or right. Snap rotations use the joystick buttons instead. Rotation-wise, participants mostly preferred triggers; however, the consumers who understood snap preferred its flexibility.


Some figures from the movement and rotation research are shown below.



Figure 10. Some figures from the nDreams’ movement and rotation research.


The table below summarizes some of the most important findings delivered by Patrick O'Luanaigh.


  • Movement needs to be as realistic as possible; ideally keep your speed to around 1.5 m/s.
  • Do not take control of the camera away from the player.
  • Ensure there is no perceived acceleration.
  • Lower movement and strafing speeds are much more comfortable than faster ones. A high rotation speed is seen as more comfortable, since the rotation normally finishes before you start to feel motion sick.
  • The best solution for rotation is to turn with your body; alternative controls, such as snap rotations, encourage players to move their body to look around.
  • Rotation-wise, participants mostly preferred triggers; however, the consumers who understood snap preferred its flexibility.
  • Fast teleport (blink) at 100 m/s is sickness-free and more immersive than simple teleport; instant teleport is seen as more precise due to the additional directional reticule.
  • Where possible, remove movement and rotation altogether.

Figure 11. Summary of findings from nDreams’ research about movement in VR.


VR is just taking its first real steps and there is a lot still to explore and learn. This is the reason Patrick concluded his presentation with a recommendation I really liked: Test everything! What works for your game may be different from someone else’s.




The talk Achieving High Quality Mobile VR Games at GDC 2016 had a great turnout and lots of questions were discussed at the end. Afterwards many people came to the ARM booth to find out more about the Ice Cave demo and the rendering techniques based on local cubemaps discussed in the talk. What GDC 2016 showed above all was the great uptake VR is experiencing and the increasing interest of the development community and game studios in this exciting technology.


Finally, I would like to thank Carl Callewaert and Patrick O'Luanaigh for their great contributions to the presentation.




  1. Efficient Soft Shadows Based on Static Local Cubemap. Sylwester Bala and Roberto Lopez Mendez, GPU Pro 7, 2016.
  2. Dynamic Soft Shadows Based on Local Cubemap. Sylwester Bala, ARM Connected Community.
  3. ARM Guide for Unity Developers, Mali Developer Center.
  4. Reflections Based on Local Cubemaps in Unity. Roberto Lopez Mendez, ARM Connected Community.
  5. The Power of Local Cubemaps at Unite APAC and the Taoyuan Effect. Roberto Lopez Mendez, ARM Connected Community.
  6. Combined Reflections: Stereo Reflections in VR. Roberto Lopez Mendez, ARM Connected Community.
  7. Travelling Without Moving - Controlling Movement in Virtual Reality. Patrick O'Luanaigh, Presentation delivered at VRTGO, Newcastle, 2015.

The Ice Cave demo is a Unity demo released by ARM® Ecosystem. With this demo we wanted to show that it is possible to achieve high quality visual content on current mobile devices powered by ARM Cortex® CPUs and ARM Mali™ GPUs. A number of highly optimized rendering effects were developed for this demo.


After the demo was released we decided to port it to Samsung Gear VR using the Unity native VR implementation. During the porting work we made several changes as not all of the features of the original demo were VR friendly. We also added a couple of new features, one of which was the ability to mirror the content from the Samsung Gear VR headset to a second device. We thought it would be interesting to show people at events what the user of the Samsung Gear VR headset was seeing in real time. The results exceeded even our expectations.



Figure 1. Ice Cave VR mirroring from Samsung Gear VR at Unity AR/VR Vision Summit 2016.


At every event where we have shown the mirroring from the Ice Cave VR demo running on the Samsung Gear VR we have been asked how we achieved it. This short blog is the answer to that question.


I think the reason this technique raises so much interest is because we like to socialize our personal VR experience and at the same time, other people are simply curious about what the VR user is experiencing. The desire to share the experience works both ways. For developers it is also important to know, and therefore helpful to see, how users test and experience the game.


Available options


In 2014 Samsung publicly announced their AllShare Cast Dongle to mirror content from the Samsung Gear VR. The dongle connects to any HDMI display and mirrors the smartphone's screen onto the secondary display, in a similar way to Google Chromecast. Nevertheless, we wanted to use our own device, and decided to test an idea we had heard worked for Triangular Pixels when Katie Goode (Creative Director) delivered a talk at ARM.


The idea was very simple: run a non-VR version of the application on a second device and send the required info via Wi-Fi to keep both applications synchronized. In our case we just needed to send the camera position and orientation.



Figure 2. The basic idea of mirroring.


The implementation


A single script described below manages all the networking for both client and server. The server is the VR application running on the Samsung Gear VR headset, while the client is the non-VR version of the same application running on a second device.


The script is attached to the camera Game Object (GO) and a public variable isServer defines whether the script works on the server or the client side when building your Unity project. A configuration file stores the network IP of the server. When the client application starts it reads the server’s IP address and waits for the server to establish a connection.


The code snippet below performs the basic operations to set up a network connection and reads the server’s IP network address and port in the function getInfoFromSettingsFile. Note that the client starts in a paused state (Time.timeScale = 0) as it will wait for the server to start before establishing a connection.


void Start(){
    // Read the server's IP address and port from the settings file.
    getInfoFromSettingsFile();

    ConnectionConfig config = new ConnectionConfig();
    commChannel = config.AddChannel(QosType.Reliable);
    started = true;

    // Maximum default connections = 2
    HostTopology topology = new HostTopology(config, 2);

    if (isServer){
        hostId = NetworkTransport.AddHost(topology, port, null);
    }
    else {
        // The client starts paused until the connection is established.
        Time.timeScale = 0;
        hostId = NetworkTransport.AddHost(topology, 0);
    }
}




When the server application starts running it sends the camera position and orientation data for every frame through the network connection to be read by the client. This process takes place in the Update function as implemented below.


void Update ()
{
    if (!started){
        return;
    }

    int recHostId;
    int recConnectionId;
    int recChannelId;
    byte[] recBuffer = new byte[messageSize];
    int recBufferSize = messageSize;
    int recDataSize;
    byte error;
    NetworkEventType networkEvent;

    do {
        networkEvent = NetworkTransport.Receive(out recHostId, out recConnectionId,
                               out recChannelId, recBuffer, recBufferSize, out recDataSize, out error);

        switch (networkEvent){
            case NetworkEventType.Nothing:
                break;

            case NetworkEventType.ConnectEvent:
                connected = true;
                connectionId = recConnectionId;
                Time.timeScale = 1; // Client connected; unpause the app.
                if (isServer){
                    clientId = recHostId;
                }
                break;

            case NetworkEventType.DataEvent:
                rcvMssg(recBuffer); // Camera position/orientation from the server.
                break;

            case NetworkEventType.DisconnectEvent:
                connected = false;
                if (!isServer){
                    Time.timeScale = 0; // Pause the client until the server is back.
                }
                break;
        }
    } while(networkEvent!=NetworkEventType.Nothing);

    if (connected && isServer){ // Server: send the camera transform every frame.
        send();
    }

    if (!connected && !isServer){ // Client: (re)try connecting to the server,
        // using the address and port read from the settings file.
        connectionId = NetworkTransport.Connect(hostId, serverAddress, port, 0, out error);
    }
}





In the Update function different types of network events are processed. As soon as the connection is established the client application changes its state from paused to running (Time.timeScale = 1). If a disconnection event takes place the client is paused again. This will occur, for example, when the device is removed from the Samsung Gear VR headset, or when the user simply removes the headset and the device detects this and goes into pause mode.


The client application receives the data sent from the server in the NetworkEventType.DataEvent case. The function that reads the data is shown below:


void rcvMssg(byte[] data)
{
    var coordinates = new float[data.Length / 4];
    Buffer.BlockCopy(data, 0, coordinates, 0, data.Length);
    transform.position = new Vector3 (coordinates[0], coordinates[1], coordinates[2]);

    // To provide a smooth experience on the client, average the change
    // in rotation across the current and last frame.
    Quaternion rotation = avgRotationOverFrames (new Quaternion(coordinates [3], coordinates [4],
                                                                coordinates [5], coordinates [6]));
    transform.rotation = rotation;
    lastFrame = rotation;
}



The interesting point here is that the client doesn’t use the received camera position and orientation data directly. Instead, the quaternion describing the camera rotation in the current frame is interpolated with the previous frame's quaternion, smoothing camera rotations and avoiding sudden changes if a frame is skipped. The function avgRotationOverFrames performs the quaternion interpolation.


Quaternion avgRotationOverFrames(Quaternion currentFrame)
{
    return Quaternion.Lerp(lastFrame, currentFrame, 0.5f);
}

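Unity's Quaternion.Lerp performs a component-wise interpolation followed by normalization, so the smoothing done by avgRotationOverFrames can be mimicked outside Unity like this (illustrative Python, helper names mine):

```python
def quat_lerp(a, b, t):
    """Component-wise lerp of two quaternions (x, y, z, w), then re-normalize,
    which is what Unity's Quaternion.Lerp does."""
    raw = [ai + (bi - ai) * t for ai, bi in zip(a, b)]
    norm = sum(c * c for c in raw) ** 0.5
    return [c / norm for c in raw]

def smooth_rotation(last_frame, current_frame):
    """Average the rotation across the current and last frame (t = 0.5)."""
    return quat_lerp(last_frame, current_frame, 0.5)
```

Note that a plain lerp is only a good approximation for small per-frame rotation deltas, which is exactly the situation here.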


As can be seen in the Update function, the server sends camera data over the network every frame. The implementation of the function send is shown below:


public void send()
{
    byte error;
    byte[] buffer = new byte[messageSize];
    buffer = createMssg();
    if (isServer){
        try{
            NetworkTransport.Send (hostId, connectionId, commChannel, buffer, buffer.Length, out error);
        }
        catch (Exception e){
            Debug.Log("I'm Server error: +++ see below +++");
            Debug.Log(e.ToString());
        }
    }
}







The function createMssg prepares an array of seven floats; three floats from the camera position coordinates and four floats from the camera quaternion that describes the camera orientation.


byte[] createMssg()
{
    var coordinates = new float[] { transform.position.x, transform.position.y, transform.position.z,
                                    transform.rotation.x, transform.rotation.y, transform.rotation.z,
                                    transform.rotation.w };
    var data = new byte[coordinates.Length * 4];
    Buffer.BlockCopy(coordinates, 0, data, 0, data.Length);
    return data;
}



This script is attached to the camera for both server and client applications; for the server build the public variable isServer must be ticked. Additionally, when building the client application the option Build Settings -> Player Settings -> Other Settings -> “Virtual Reality Supported” must be unchecked, as the client is a non-VR version of the application running on the Samsung Gear VR.


To keep the implementation as simple as possible the server IP address and port are stored in a config file on the client device. When setting up the mirroring system, the first step is to launch the client non-VR application. The client application reads the network data from the config file and enters into a paused state, waiting for the server to start to establish a connection.


Due to time constraints we didn’t devote much time to improving the mirroring implementation described in this blog. We would love to hear any feedback or suggestions for improvement that we can share with other developers.


The picture below shows the mirroring system we use to display what the actual user of the Samsung Gear VR is seeing. Using an HDMI adapter the video signal is output to a big flat panel display in order to share the Samsung Gear VR user experience with others.


Figure 3. Mirroring the Ice Cave VR running on the Samsung Gear VR. The VR server application runs on a Samsung Galaxy S6 based on the Exynos 7 Octa 7420 SoC (4x ARM Cortex-A57 + 4x Cortex-A53 and ARM Mali-T760 MP8 GPU). The non-VR client application runs on a Samsung Galaxy Note 4 based on the Exynos 7 Octa 5433 SoC (4x ARM Cortex-A57 + 4x Cortex-A53 and ARM Mali-T760 MP6 GPU).




The Unity networking API allows an easy and straightforward implementation of mirroring a VR application running on the Samsung Gear VR to a second device running a non-VR version of the application. Because only the camera position and orientation data are sent each frame, no significant overhead is imposed on either device.


Depending on the application there could be more data to send/receive to synchronize both server and client worlds but the principle to follow will be the same: for every object that needs to sync, send transform data and interpolate them.
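As a sketch of that principle, the seven-float message layout used here (three position floats plus a four-float quaternion, 28 bytes) can be reproduced outside Unity in a few lines of Python using the struct module (helper names are mine):

```python
import struct

def pack_transform(position, rotation):
    """Pack 3 position floats + a 4-float quaternion into 28 little-endian bytes,
    mirroring what createMssg does with Buffer.BlockCopy."""
    return struct.pack("<7f", *position, *rotation)

def unpack_transform(data):
    """Inverse of pack_transform: returns (position, rotation) tuples."""
    values = struct.unpack("<7f", data)
    return values[:3], values[3:]
```

A round trip through these two helpers returns the original values, which is a handy sanity check when debugging the wire format.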


The mirroring technique described in this blog will also work in a multiplayer game environment. The server/client roles could be swapped depending on the type of mirroring setup: we could have several VR headsets and a single screen, several screens for a single VR headset, or even several of each. Again, every device running on the Samsung Gear VR sends the sync data to one or more devices that share a view on a big screen panel. Each mirroring application has to instantiate every player connected to it, update the transforms of all synced objects following the same recipe, and display a single camera view; this could be the view from any of the players or any other suitable view. Sending the additional data needed to keep the mirrored worlds synced shouldn’t have a significant impact on performance, as the amount of info that needs updating per object is minimal.

Virtual reality is a hot topic for mobile devices. 2015 was the year of the rise of mobile VR, with Samsung and Oculus releasing the Samsung Gear VR headset for the Galaxy Note 4 and Note 5, as well as Galaxy S6 smartphones. Around the same time, Google launched their Google Cardboard VR headsets and the trend grew with a myriad of other players releasing Head Mounted Displays (HMDs) for VR, from Asian companies such as Deepoon with their all-in-one headsets and VIRGLASS with their smartphone-tray headsets, to Western companies like Carl Zeiss and their VR One HMD.


Also during 2015 Unity, the games and graphics engine most used by developers, added native support for Samsung Gear VR as well as for other VR/AR hardware using third party plug-ins.


At ARM®, in order to help our partner ecosystem to flourish in that area, we released our first VR SDK v0.1 alpha release during the summer of 2015. This March at GDC we launched our v1.0 publicly to developers and our OEM and SiP partners.




The ARM Mali™ VR SDK is based on Android and OpenGL ES at this stage and includes sample code and libraries for VR developers. It’s applicable to everyone from VR application developers to HMD designers who would like to achieve the lowest latency, highest performance and minimal battery consumption on any ARM Mali based mobile platform.


To get your environment set up you need the Android Studio SDK and the Android NDK. Developers can either render the samples on any Android device or calibrate them for the specific HMD they are designing. The samples cover everything from VR basics and the fundamentals of stereoscopy to how best to use the Multiview extension and implement multisampling.
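For context, the Multiview extension (GL_OVR_multiview) lets a single draw call render both eye views. A minimal vertex shader sketch, illustrative only and not taken from the SDK samples, might look like:

```glsl
#version 300 es
#extension GL_OVR_multiview : enable

// Two views, one per eye; gl_ViewID_OVR selects the current one.
layout(num_views = 2) in;

uniform mat4 u_mvp[2];   // one model-view-projection matrix per eye
in vec4 a_position;

void main()
{
    gl_Position = u_mvp[gl_ViewID_OVR] * a_position;
}
```

This halves the number of draw calls for stereo rendering, which is one of the main CPU savings the SDK samples demonstrate.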


You can download the VR SDK from our ARM Mali Developer Center, and stand by for future updates with all the new VR extensions for Android on their way!

In my previous blog post I explained some of the key concepts of Vulkan and how we implemented them in our internal graphics engine. In this post I will go into a bit more detail about how we implemented multi-threading and some of the caveats to watch out for.


Quick background

Vulkan was created from the ground up to be thread-friendly and there's a huge amount of detail in the spec relating to thread-safety and the consequences of function calls. In OpenGL, for instance, the driver might have a number of background threads working while waiting for API calls from the application. In Vulkan, this responsibility has moved up to the application level, so it's now up to you to ensure correct and efficient multi-threading behavior. This is a good thing, since the application often has better visibility of what it wants to achieve.


Command pools

In Vulkan, command buffers are allocated from command pools. Typically you pin a command pool to a thread and only use that thread when writing to command buffers allocated from its pool. Otherwise you need to externally synchronize access between the command buffer and the command pool, which adds overhead.


For graphics use-cases you also typically pin a command pool per frame. This has the nice side-effect that you can simply reset the entire command pool once the work for the frame is completed. You can also reset individual command buffers, but it's often more efficient to just reset the entire command pool.
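In pseudocode, the per-thread, per-frame pool scheme described above looks roughly like this (structure only, not complete Vulkan code):

```
// One command pool per (recording thread, frame in flight) pair.
for t in threads:
    for f in frames_in_flight:
        pool[t][f] = vkCreateCommandPool(device, queue_family)

// Thread t records frame f's command buffers only from pool[t][f],
// so no external synchronization is needed while recording.

// When the fence for frame f signals (GPU finished that frame),
// recycle all of its command buffers with one call per pool:
for t in threads:
    vkResetCommandPool(device, pool[t][f], 0)
```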


Coordinating work

In OpenGL, work is executed implicitly behind the scenes. In Vulkan it is explicit: the application submits command buffers to queues for execution.



Vulkan has the following synchronization primitives:

  • Semaphores - used to synchronize work across queues or across coarse-grained submissions to a single queue
  • Events and barriers - used to synchronize work within a command buffer or a sequence of command buffers submitted to a single queue
  • Fences - used to synchronize work between the device and the host


Queues have simple sync primitives for ordering the execution of command buffers. You can basically tell the driver to wait for a specific event before processing the submitted work and you can also get a signal for when the submitted work is completed. This synchronization is really important when it comes to submitting and synchronizing work to the swap chain. The following diagram shows how work can be recorded and submitted to the device queue for execution before we finally tell the device to present our frame to the display.


In the above sequence there is no overlap of work between different frames. Therefore, even though we're recording work to command buffers in multiple threads, we still have a certain amount of time where the CPU threads sit idle waiting for a signal in order to start work on the next frame.




This is much better. Here we start recording work for the next frame immediately after submitting the current frame to the device queue. All synchronization here is done using semaphores. vkAcquireNextImageKHR will signal a semaphore once the swap chain image is ready, vkQueueSubmit will wait for this semaphore before processing any of the commands and will signal another semaphore once the submitted commands have completed. Finally, vkQueuePresentKHR will present the image to the display, but it will wait for the semaphore signaled by vkQueueSubmit before doing so.



In this blog post I have given a brief overview of how to get overlap between CPU threads that record commands into command buffers over multiple frames. For our own internal implementation we found this really useful as it allowed us to start preparing work for the next frame very early on, ensuring the GPU is kept busy.

Unity is a multi-platform game development engine and one of the most widely used among game developers. It enables you to create and distribute 2D and 3D games and other graphics applications.



At ARM, we care about game developers. We know we can now achieve console-quality games on mobile platforms, and we therefore compiled the “ARM Guide for Unity Developers”, a compilation of best practices and optimized techniques to get the most from an ARM mobile platform. Whether you are a beginner or an advanced Unity user, you will find the advice you need to increase the FPS in your graphics app.


Optimization Process

The guide starts by covering the optimization process, teaching developers the optimal quality settings and the fundamentals of optimizing. It showcases how to use the Unity Profiler and Debugger as well as the ARM developer tools (Mali™ Graphics Debugger and Streamline).


The profiler is used as a first step to take measurements of the graphics application and analyze the data to locate any code bottlenecks. The next step is to determine the relevant optimization to apply, and finally to verify that the optimization works.


The guide dedicates a whole sub-chapter to another very useful ARM tool for Unity developers: the Mali Offline Shader Compiler, which enables developers to compile vertex, fragment and compute shaders into binary form. It also reports the number of cycles the shaders require in each pipeline of the Mali GPU, so that developers can analyze and optimize for ARM Mali GPUs.



The optimizations chapter includes everything from ARM Cortex application processor optimizations with code snippets and settings examples, to ARM Mali GPU optimizations as well as asset optimizations.


The ARM Mali GPU optimization techniques include:


  • The use of static batching, a common optimization technique that reduces the number of draw calls, thereby reducing application processor utilization.
  • The use of 4x MSAA: ARM Mali GPUs implement 4x multi-sample anti-aliasing (MSAA) with very low computational overhead.


LOD group settings


  • Level of Detail (LOD), a technique where the Unity engine renders different meshes for the same object depending on the distance from the camera.
  • The use of lightmaps and light probes. Lightmaps pre-compute the lighting calculations and bake them into a texture called a lightmap. This means developers lose the flexibility of a fully dynamically lit environment, but they do get very high quality images without impacting performance. On the other hand, the use of light probes enables developers to add some dynamic lighting to light-mapped scenes. The more probes there are, the more accurate the lighting is.
  • ASTC Texture Compression is the most efficient and flexible texture compression format available for Unity developers. It provides high quality, low bitrate and many control options, which are explained in detail.
  • The mipmapping technique, which enhances both the visual quality and the performance of the graphics application. Mipmaps are pre-calculated versions of a texture at different sizes. Each texture generated is called a level, and each level is half as wide and half as high as the preceding one. Unity can automatically generate the complete set of levels from the first level at the original size down to a 1x1 pixel version.
  • Skyboxes as a means to draw the background of the camera using a single cubemap, requiring only a single cubemap texture and one draw call.
  • How to implement efficient real-time shadows in Unity. Unity supports transform feedback for calculating real-time shadows and for advanced developers the guide shows how to implement custom shadows based on a very efficient technique using local cubemaps in the “Advanced Graphics Techniques” chapter.
  • Occlusion Culling, which consists of not rendering objects when they are not in the camera's line of view, thereby saving GPU processing power.
  • How to efficiently use the OnBecameVisible() and OnBecameInvisible() callbacks
  • Rendering Order is very important for performance. The optimal approach is to render opaque objects front-to-back, which helps reduce overdraw. Developers can also learn which recent hardware techniques are available to reduce overdraw, such as early-Z and Pixel Forward Kill (PFK), as well as the options provided by the Unity engine.
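
As a side note on the mipmapping bullet above, the size of the mip chain is easy to reason about. A quick sketch of the level count (the helper name is ours, not Unity's):

```cpp
#include <algorithm>
#include <cstdint>

// Number of mip levels from the original size down to 1x1,
// halving width and height at each level (minimum 1 pixel).
uint32_t mipLevelCount(uint32_t width, uint32_t height)
{
    uint32_t levels = 1;
    while (width > 1 || height > 1)
    {
        width = std::max<uint32_t>(width / 2, 1);
        height = std::max<uint32_t>(height / 2, 1);
        ++levels;
    }
    return levels;
}
```

Since each level holds a quarter of the pixels of the previous one, the complete chain costs only about a third more memory than the base texture.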


Developers can optimize their application further by using asset optimizations and a whole sub-chapter addresses this, covering how to most effectively prepare textures and texture atlases, meshes and animations.



The Unity engine supports Global Illumination (GI) using Enlighten from v5 onwards. Enlighten is the real-time GI solution from ARM Geomerics.


Enlighten in Unity can be used for baking light maps, light probes and for real-time, indirect lighting. The Enlighten components are not explicitly exposed in Unity, but they are referenced in the user interface and the guide therefore also explains what they are and how they work together.


The Enlighten section also explains how to configure Enlighten in custom shaders, the code flow and what developers need to do to set up Enlighten in the vertex and fragment shader code. It showcases a version of the Unity Standard Shader that is modified to include directional global illumination.



Enlighten Lightmap Images: above left – Ice Cave demo, above right – its UV Chart lightmap, below left – its Irradiance lightmap, below right – its directionality lightmap



Advanced Graphics Techniques

Chapter 6, the longest chapter of the guide, explains Advanced Graphics Techniques. These techniques are mainly implemented using “Custom Shaders” as the Unity source code of built-in shaders does not include the majority of advanced effects. The chapter starts by describing how to write and debug custom shaders and then goes on to explain how to implement advanced graphics techniques used in the Ice Cave and Chess Room demos. It also shows source code snippets:


  • Reflections with a local cubemap: this technique is implemented in Unity v5 and higher with reflection probes. You can combine these reflection probes with other types of reflections, such as reflections rendered at runtime with your own custom shader.
  • Combining static reflections based on local cubemaps with dynamically generated reflections


Combining Different Types of Reflections


  • Dynamic soft shadows: in a game scene there are moving objects and static environments such as rooms. Using dynamic soft shadows based on the local cubemap technique, developers can use a texture to represent the shadows and its alpha channel to represent the amount of light entering the room.
  • Refraction based on local cubemaps – another lighting effect using the highly optimized local cubemap technique. Developers can combine the refractions with reflections at runtime.
  • Specular effects using the very efficient Blinn technique. In the example provided from the Ice Cave demo, the alpha channel is used to determine the specular intensity, thus ensuring that the specular effect is applied only to surfaces that are lit.
  • Using Early-z to improve performance by removing overdrawn fragments.
  • Dirty lens effect – this effect invokes a sense of drama and is often used together with a lens flare effect. This can be implemented in a very light and simple way which is suitable for mobile devices.
  • Light shafts - they simulate the effect of crepuscular rays, atmospheric scattering or shadowing. They add depth and realism to a scene. This effect is based on truncated cone geometry and a script that uses the position of the sun to calculate the magnitude of the lower cross-section cone expansion, and the direction and magnitude of the cross-section shift.
  • Fog effects – they add atmosphere to a scene. There are two versions of the fog effect: procedural linear fog and particle-based fog.
  • Bloom – bloom reproduces the effects that occur in real cameras when taking pictures in a bright environment. This effect is noticeable under intense lighting and the guide demonstrates this effect implemented in a very efficient way by using a simple plane approach.
  • Icy wall effect - ice is a difficult material to replicate because light scatters off it in different ways depending on the small details of its surface. The reflection can be completely clear, completely distorted, or anywhere in between. In the Ice Cave demo, this effect includes a parallax effect for greater realism.
  • Procedural skybox -  to achieve a dynamic time of day effect, the following elements were combined in the Ice Cave demo skybox: a procedurally generated sun, a series of fading skybox background cubemaps that represent the day to night cycle, and a skybox clouds cubemap.
  • Fireflies – they are bright flying insects that are used in the Ice Cave demo to add more dynamism and show the benefits of using Enlighten for real-time global illumination.


Mobile Virtual Reality

Last but not least, the final chapter of the guide covers best coding practices when developing graphics applications for Mobile Virtual Reality.


Unity natively supports some VR devices, like the Samsung Gear VR, and plug-ins can enable support for other devices, like Google Cardboard. The guide describes how to port a graphics application to native Unity VR.


Screenshot from a VR application running in Samsung Gear VR developer mode


VR creates a more immersive user experience compared to running the graphics application on your smartphone or tablet, and therefore camera animations might not feel comfortable for the user in VR. Also, VR can benefit from controllers that connect to the VR device using Bluetooth. Tips and methods to create the ultimate user experience are described in the guide.


A whole sub-chapter is dedicated to how to implement reflections in VR. They can use the same local cubemap technique explained earlier in the Advanced Graphics Techniques chapter. However, the technique must be modified to work with the stereo visual output that a user sees. This chapter therefore explains how to implement stereo reflections as well as combining different types of reflections.


We welcome feedback on our ARM Guide for Unity Developers, which we keep updating on a regular basis; the document history is on our Mali Developer Center.

Arguably the most high-profile Mali-powered device to hit the hands of customers this year is the Samsung Galaxy S7. Launched at Mobile World Congress 2016 in Barcelona, the Galaxy S7 represents Samsung's latest offering to the premium mobile market. Similar in design to the S6, the S7 and S7 Edge strike a balance of sleek elegance and super sturdiness. One of the key features Samsung are talking about is that it is an effectively water-proofed smartphone! For those of us with a tendency to drop our phones in sinks, puddles and who knows what else, this is pretty big news in itself!


For the graphics geeks among us though, the incredible graphics and the clarity and depth of colour are a real attraction of Samsung's latest offering. The chipset is the Exynos 8 Octa (8890), featuring eight CPU cores based on the 64-bit ARMv8 architecture. ARM's big.LITTLE technology is utilized to its full advantage to strike the perfect balance between super high performance and premium power efficiency. The complex user interface and incredible graphics capability are powered by a Mali-T880MP12 GPU configuration, the most powerful Mali GPU on the market.


So why is Mali the GPU of choice to power high-end devices? Simple: Mali is the number 1 shipping GPU in the world! Great leaps in energy efficiency come from built-in bandwidth-saving technologies like ARM Frame Buffer Compression (AFBC), Smart Composition and Transaction Elimination, making it the perfect choice for the latest high-end devices. One of the reasons there's such focus on performance and efficiency is the rise of the VR industry, which is powering ahead at an unforeseen rate. Obviously VR requires fantastic graphics; when your eyes are just centimeters from a mobile screen the image needs to be spectacular. Another smart choice Samsung have made in this area is their display.


The AMOLED display works differently from a traditional LCD display in that each and every pixel is individually lit, adjusted by the amount of power travelling through the film behind it. Unlike LCD displays, where there is a permanent backlight, Samsung's AMOLED display allows you to completely turn off sections of the screen. Not only does this achieve a deeper, truer black than an LCD display, but it also means that in VR applications you can light only the part of the screen that is showing the correct view based on the user's head position. This allows faster adjustment to the updated head position, lowering latency and providing a sharper, more immersive VR experience than is available on LCD displays. And as Samsung are already ahead of the VR game with the Gear VR headset, developed in collaboration with Oculus, this is an important factor in keeping that lead.


With incredible Mali based visuals, superior battery life and a fantastic user interface, the Samsung S7 represents another step up for Android devices and we look forward to seeing what comes next.


If you have been following Vulkan lately, you will have heard about SPIR-V, the new shading language format used in Vulkan.

We decided early on to standardize our internal engine on SPIR-V,

as we needed a way to cleanly support both our OpenGL ES backend as well as Vulkan without modifying shaders.


From our experience, having #ifdefs in a shader which depend on the graphics API you are targeting is not maintainable.

Our old system was based on text replacement which became more and more unmaintainable as new API features emerged.

We wanted something more robust and having a standard IR format in SPIR-V was the final piece of the puzzle to enable this.


Using our engine, we developed a demo showcasing Vulkan at GDC; please see my colleague's blog post for more information on the topic:

Porting a Graphics Engine to the Vulkan API


Compiling down to SPIR-V


SPIR-V is a binary, intermediate shading language format. This is great because it means that you no longer have to deal with vendor-specific issues in the GLSL frontend of the driver.

The flip side of this, however, is that you now have to consider how to compile a high-level shading language down to SPIR-V.

The best alternatives for this are currently Khronos' glslang library and Google's shaderc.

These tools can compile GLSL down to SPIR-V, which you can then use in Vulkan.

The optimal place to do this compilation is in your build system, so that you don't have to ship shader source in your app.


We're close then, at least in theory, to having a unified system for GLES and Vulkan.


One language, many dialects


While our intention to use GLSL as our high-level language makes sense when targeting GL/Vulkan, there are problems. One of them is that there are at least five main dialects of GLSL:


  • Modern mobile (GLES3+)
  • Legacy mobile (GLES2)
  • Modern desktop (GL3+)
  • Legacy desktop (GL2)
  • Vulkan GLSL


Vulkan GLSL is a new entry to the list of GLSL variants. The reason for this is that we need a way to map GLSL to newer features found in Vulkan:


Vulkan GLSL adds some incompatibilities with all the other GLSL variants, for example:

  • Descriptor sets, no such concept in OpenGL
  • Push constants, no such concept in OpenGL, but very similar to "uniform vec4 UsuallyRegisterMappedUniform;"
  • Subpass input attachments, which map well to Pixel Local Storage (see Pixel Local Storage on ARM® Mali™ GPUs)
  • gl_InstanceIndex vs gl_InstanceID. Nearly the same, but Vulkan GLSL's gl_InstanceIndex (finally!) adds the base instance offset for you


This makes it problematic to write GLSL that can work in both GL and Vulkan at the same time,

and whilst it is always possible to use #ifdef VULKAN, this is a road we don't want to go down.

As you might expect from the blog title, we solved this with SPIRV-Cross, but more on that later in the post.


Shader reflection


If you're starting out with simple applications in Vulkan, you don't have to deal with this topic quite yet, but once your engine starts to improve, you will very soon run into a fundamental difference between OpenGL ES/GLSL and Vulkan/SPIR-V. It is generally very useful to be able to query meta-data about the shader file you are working with. This is especially important in Vulkan, since your pipeline creation very much depends on information found inside your shaders. Vulkan, being a very explicit API, expects you as the API user to know this information up front and to provide a VkPipelineLayout that describes which types of resources are used inside your shader.

Kind of like a function prototype for your shader.


VkGraphicsPipelineCreateInfo pipeline = {
     ...
     .layout = pipelineLayout,
     ...
};

The pipeline layout describes which descriptor sets you are using as well as push constants. This serves as the "function prototype" for your shader.


VkPipelineLayoutCreateInfo layout = {
     ...
     .setLayoutCount = NELEMS(setLayouts),
     .pSetLayouts = setLayouts,
     ...
};

Inside the set layouts is where you describe which resources you are using, for example:


// For first descriptor set
VkDescriptorSetLayoutBinding bindings[] = {
     {
          .binding = 0,
          // e.g. a combined image sampler; the original fragment elides the type
          .descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
          .descriptorCount = 1,
          .stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT,
     },
     {
          .binding = 2,
          .descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
          .descriptorCount = 1,
          .stageFlags = VK_SHADER_STAGE_VERTEX_BIT,
     },
};

You might ask how you would know this information before you have compiled your pipeline. Vulkan deliberately does not provide an API for this, because reflection is vendor-neutral functionality that shouldn't need to be implemented the same way N times over by all vendors.


In your simple applications you probably know this information up front. After all, you wrote the shader, so you can fill in this layout information by hand, and

you probably just have one or two resources anyway, so it's not that big a deal.

However, start to consider more realistic shaders in a complex application and you soon realize that you need a better solution.


In GLES, the driver provides us with reflection to some extent, for example:


GLint location = glGetUniformLocation(program, "myUniform");
glUniform4fv(location, 1, uniformData);

GLint attrib = glGetAttribLocation(program, "TexCoords");





To solve the problems of reflection and the Vulkan GLSL/GLSL differences, we developed a tool and library, SPIRV-Cross, which will be hosted by Khronos and which you will be able to find there shortly.

This tool was originally published on our GitHub as spir2cross, but we have donated it to the Khronos Group and future development will happen there.


The primary focus of the library is to provide a comprehensive reflection API as well as support for translating SPIR-V back to high-level shader languages.

This allowed us to design our entire engine around Vulkan GLSL and SPIR-V while still supporting GLES and desktop GL.


Shader toolchain


We wanted to design our pipeline so that the Vulkan path was as straightforward as we could make it, while dealing with how to get back to GL/GLES in a robust and sensible way.

We found that it is much simpler to deal with GL specifics when decompiling from SPIR-V, since it is not trivial to meaningfully modify SPIR-V in its raw binary format.

It therefore made sense to write all our shader code targeting Vulkan, and deal with GL semantics later.



In Vulkan, as we write in Vulkan GLSL, we simply use glslang to compile our sources down to SPIR-V. The resulting binary can then be given directly to the Vulkan driver.

We still need reflection however, as we need to build pipeline layouts.


#include "spirv_cross.hpp"

// Read SPIR-V from disk or similar.
std::vector<uint32_t> spirv_binary = load_spirv_file();
spirv_cross::Compiler comp(std::move(spirv_binary));
// The SPIR-V is now parsed, and we can perform reflection on it.
spirv_cross::ShaderResources resources = comp.get_shader_resources();

// Get all sampled images in the shader.
for (auto &resource : resources.sampled_images)
{
     unsigned set = comp.get_decoration(resource.id, spv::DecorationDescriptorSet);
     unsigned binding = comp.get_decoration(resource.id, spv::DecorationBinding);
     add_sampled_image_to_layout(set, binding);
}
// And so on for other resource types.

We also need to figure out if we are using push constants. We can get reflection information about all push constant variables which are actually in use per stage, and hence compute the range which should be part of the pipeline layout.


spirv_cross::BufferRanges ranges = compiler.get_active_buffer_ranges(resources.push_constant_buffers.front().id);


From this, we can easily build our push constant ranges.
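
For illustration, folding those active ranges into a single offset/size pair might look like this. The BufferRange struct below mirrors spirv_cross::BufferRange, but mergeRanges and PushRange are our own hypothetical helpers, not part of the library:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors spirv_cross::BufferRange: a member index plus its byte offset and size.
struct BufferRange
{
    unsigned index;
    size_t offset;
    size_t range;
};

// A single [offset, offset + size) span, as used by VkPushConstantRange.
struct PushRange
{
    size_t offset;
    size_t size;
};

// Fold all active members into one contiguous span covering them all.
PushRange mergeRanges(const std::vector<BufferRange> &ranges)
{
    if (ranges.empty())
        return {0, 0};

    size_t lo = SIZE_MAX, hi = 0;
    for (const auto &r : ranges)
    {
        lo = std::min(lo, r.offset);
        hi = std::max(hi, r.offset + r.range);
    }
    return {lo, hi - lo};
}
```

The resulting offset and size can be dropped straight into the pipeline layout's push constant range for that stage.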



In GLES, things are slightly more involved. Since our shader sources are in Vulkan GLSL, we need to make some transformations before converting back to GLSL.

The GLES backend consumes SPIR-V, so we still do not have to compile shader sources at runtime. From there, we perform the same kind of reflection as in Vulkan.


Push Constants

In SPIRV-Cross, push constants are emulated as plain uniform structs in the GLSL output, which map very closely to real push constants:


layout(push_constant, std430) uniform VulkanPushConstant
{
     mat4 MVP;
     vec4 MaterialData;
} registerMapped;



struct VulkanPushConstant
{
     mat4 MVP;
     vec4 MaterialData;
};
uniform VulkanPushConstant registerMapped;

in GLSL.

Using the reflection API for push constants we can then build a list of glUniform calls which will implement push constants for us.


Resource binding

We also need to remap descriptor sets and bindings. OpenGL has a linear binding space which is separate per resource type, e.g. binding = 0 for uniform buffers and binding = 0 for sampled images are two different binding points; this is not the case in Vulkan. We chose a simple scheme which allocates linear binding space from the descriptor set layouts.


Let's say we have a vertex and fragment shader which collectively use these bindings.


  • uniform (set = 0, binding = 1)
  • texture (set = 0, binding = 2)
  • texture (set = 0, binding = 3)
  • uniform (set  = 1, binding = 0)
  • texture (set = 1, binding = 1)


In set 0, the range of uniform buffer bindings used is [1, 1] and the range of texture bindings is [2, 3]. We allocate the first binding in the linear uniform buffer space to set 0, giving it a binding offset of -1 (so binding 1 maps to linear binding 0).

To remap set/binding to linear binding space, it's a simple lookup.


linearBinding = SetsLayout[set].uniformBufferOffset + binding

textureBinding = SetsLayout[set].texturesOffset + binding



For set 1, for example, texture binding = 1 would be mapped to binding = 2, since set 0 consumed the first two texture bindings. Similarly, the uniform buffer in set 1 would be mapped to binding = 1.
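
That lookup is trivial to implement. Here is a sketch using the example bindings listed above (the SetLayout struct and function names are ours, for illustration only):

```cpp
#include <vector>

// Per-set offsets into the linear GL binding space, one per resource type.
struct SetLayout
{
    int uniformBufferOffset;
    int texturesOffset;
};

// Build offsets for the example above: set 0 uses uniform binding 1 and
// texture bindings 2..3; set 1 uses uniform binding 0 and texture binding 1.
// Linear space is allocated set by set, separately per resource type.
std::vector<SetLayout> buildExampleLayout()
{
    std::vector<SetLayout> layout(2);
    // Set 0: first used uniform binding is 1, mapped to linear binding 0.
    layout[0].uniformBufferOffset = -1;
    // Set 0: first used texture binding is 2, mapped to linear binding 0.
    layout[0].texturesOffset = -2;
    // Set 1: set 0 consumed one uniform binding, so start at linear binding 1.
    layout[1].uniformBufferOffset = 1;
    // Set 1: set 0 consumed two texture bindings, so start at linear binding 2.
    layout[1].texturesOffset = 1;
    return layout;
}

int remapUniform(const std::vector<SetLayout> &l, int set, int binding)
{
    return l[set].uniformBufferOffset + binding;
}

int remapTexture(const std::vector<SetLayout> &l, int set, int binding)
{
    return l[set].texturesOffset + binding;
}
```

The offsets are computed once per pipeline layout, so the per-resource remap at bind time is a single addition.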

Before compilation back to GLSL we strip off all binding information using the SPIRV-Cross reflection API. After we have linked our GL program we can bind the resources to their new correct binding points.


Using this scheme we managed to use Vulkan GLSL and a Vulkan-style API in GLES without too many complications.


Pixel Local Storage

Our old system supported Pixel Local Storage, and we did not want to lose that by moving to Vulkan GLSL, so we use the SPIRV-Cross PLS API to convert subpass inputs to PLS inputs and output blocks.


vector<PlsRemap> inputs;
vector<PlsRemap> outputs;
// Using reflection API here, can have some magic variable names that will be used for PLS in GLES and subpass inputs in Vulkan.
compiler.remap_pixel_local_storage(move(inputs), move(outputs));


Hopefully this blog entry gave you some insights into how you can integrate better with SPIR-V in your Vulkan apps.
