
ARM Mali Graphics


Hello all,


My name is Dale Whinham, and I’m an intern within the Media Processing Group at ARM. I have been working with ARM over the summer to produce some additional sample code for the Mali SDK, which is freely downloadable for your platform of choice over at our SDKs section here: http://malideveloper.arm.com/resources/sdks/


In this blog post, I wanted to talk a little bit about my experiences with the Mali OpenGL ES Emulator, which saw some significant updates recently, as detailed by Lorenzo Dal Col in this previous blog post: http://community.arm.com/groups/arm-mali-graphics/blog/2015/04/10/whats-new-in-mali-graphics-debugger-21-and-opengl-es-emulator-21


I am very new to OpenGL (ES) development, so currently I rely fairly heavily on good debugging tools to help me understand where I might be going wrong. As a newcomer to graphics development, I learned fairly quickly that staring at a black screen for several hours is normal and okay when you are just starting out, as there are quite a lot of things you need to do to prepare your OpenGL context for rendering, such as allocating vertex buffer objects, filling them with data, compiling shaders, getting handles to their attributes and uniforms, and so on. This in itself can be quite overwhelming at first, especially when it's difficult to see what's going on inside OpenGL because you're cross-compiling native code for an Android device and debugging options are more limited.


One of the challenges I was faced with was that I struggled to get debugging to work properly for native code in Eclipse with Android Development Tools (ADT).


So, a bit of context: at the time of writing, Google have now deprecated support for Eclipse with ADT, in favour of their new Android Studio IDE – which is great news for Android Java developers, as the IntelliJ platform makes for a very rich and stable IDE, but not-so-great news for developers writing C/C++ code, as Android Studio’s native code support is still in its early stages at the time of writing. Until the tools mature, Eclipse with ADT is still relied on by many developers to write and build native code for Android.


As such, I just couldn’t get the Eclipse debugger to co-operate and set breakpoints on my native Android code. I found myself spending more time on StackOverflow trying to find solutions to problems with Eclipse than I did actually writing the code, and so I started looking for another strategy!


I was made aware of the Mali OpenGL ES Emulator, which is a comprehensive wrapper library for OpenGL that allows you to write code for the OpenGL ES API, but have it run on a desktop computer with desktop OpenGL. This would allow me to work on my project on the desktop, get it working the way I wanted, and then move it back into Eclipse and rebuild for Android later. The Mali Linux SDK actually comes with Microsoft Visual Studio project files, and you can build and run the samples for Windows if you have the Mali OpenGL ES Emulator installed. I decided to migrate my project to Visual Studio 2015 so that I could design and debug it on the desktop more easily, though I could have also chosen to use Linux-based tools, as the Mali OpenGL ES Emulator provides a Linux version too.


The installation procedure is quite straightforward. There are two flavours of the Mali OpenGL ES Emulator to download – 32bit or 64bit – and you'll need to install the version corresponding to your build target, i.e. whether you're compiling for 32bit or 64bit. You can, of course, install both if you're building for both architectures, but beware of mixing up the "bitness" – if your app is compiled for 64bit but tries to load the 32bit emulator DLLs, it may crash.


Once installed, configure your project to search for headers within the “include” directory inside the Mali OpenGL ES Emulator’s installation folder – e.g. for the 64bit version on Windows, for me it was C:\Program Files\ARM\Mali Developer Tools\Mali OpenGL ES Emulator 2.2.1\include (see Figure 1). This folder contains header files for EGL and OpenGL ES 2 and 3 as well as their extensions.

Figure 1: Setting additional include directories in Visual Studio. Note the semicolon is used to add multiple directories.


Additionally, configure your project to add the installation folder to your list of linker search directories, so it can link against the wrapper libraries (see Figure 2):


Figure 2: Setting additional library directories in Visual Studio.

Once you’ve done this, you’re pretty much ready to go. On Windows, the Mali OpenGL ES Emulator installer sets your system’s PATH environment variables so that your compiled application will find the OpenGL ES libraries correctly at runtime. You can now begin writing code as if it were for a mobile GPU by including the OpenGL ES headers in your source code, and calling OpenGL ES functions as normal.

Figure 3 shows a screenshot of the Mali OpenGL ES emulator in action, showing a simple 3D scene from one of my work-in-progress code samples. The code sample has some glue code to give me a Windows GUI window, but the rendering context and draw calls are all EGL and OpenGL ES – wrapped seamlessly to desktop OpenGL by the Mali OpenGL ES Emulator:


Figure 3: A simple 3D scene being rendered in OpenGL ES using the Mali OpenGL ES Emulator

In addition to being able to use the powerful Visual Studio debugger for my C++ code, a major benefit of the OpenGL ES Emulator is that I can stack desktop OpenGL debuggers on top of it.

For instance, what if I wanted to check the geometry of my 3D models with a wireframe view? Well, in desktop OpenGL I could just use glPolygonMode() with GL_LINE as the mode parameter, but in OpenGL ES we don’t have this function and so we would have to write a shader.

Alternatively I could use the force wireframe feature of an OpenGL debugger. Enter GLIntercept (https://github.com/dtrebilco/glintercept), a powerful open-source OpenGL debugger for Windows that comes with a multitude of features, including (but not limited to) run-time shader editing, the ability to freely move the camera, texture/framebuffer dumping, and wireframe rendering. By placing its special OpenGL32.dll into the same directory as our application executable, along with a configuration file that lets us pick the debugging features we’d like to enable, it intercepts all calls to OpenGL, allowing us to tweak the behaviour of OpenGL before it gets forwarded to the GPU driver.

In Figure 4, we can see that same scene again, but with GLIntercept enabled, forcing wireframe on, and allowing me to see the geometry of my 3D objects without having to change the code of my project:


Figure 4: The same 3D scene using the wireframe debugging feature of GLIntercept

This is just the tip of the iceberg of what is possible with the Mali OpenGL ES emulator. It supports many OpenGL extensions such as ASTC texture compression, and extensions from the Android Extension Pack – a collection of extensions found in Android 5.0+ that gives you many advanced rendering capabilities, making it a powerful addition to your development tools. With a reasonably-specced graphics card in your PC, you can save a lot of time developing applications that use these features by eliminating the process of loading your code onto a development device from your workflow – at least in the earlier stages of development, when recompiling and testing may be quite frequent.

For more information about the Mali OpenGL ES Emulator, check out our product page over here: http://malideveloper.arm.com/resources/tools/opengl-es-emulator/


Make sure you grab the PDF User Guide from the download link too, for a comprehensive manual that gives you all the technical details about the emulator, including system requirements for the host PC, supported extensions, texture formats and much more!

Achieving the icy wall effect in the Ice Cave demo

Ice Cave is ARM's latest demo. It shows that great graphical quality can be obtained on mobile devices with the use of conventional, highly optimised rendering techniques. The demo is visually stunning and full of life: there are reflections on the board, refractions within the phoenix statue and there is a feeling of time passing as the light changes with the sun moving, coupled with the patrolling tiger and the fluttering butterfly. All these elements are immediately evident, but there are a few others that are more subtle yet add another layer of dynamism to the demo, such as the reflective icy walls of the cave. The purpose of this blog is to explain how we achieved this effect within the Ice Cave in a way that is easy to understand for everyone, so that developers can replicate the technique themselves, as it is a very performance-efficient technique that works well on mobile.


Fig. 1: The Ice Cave [the effect can be seen in the video at 3:00 or by clicking here: https://youtu.be/gsyBMHJVhXA?t=180]

Ice is a tricky material to replicate due to its reflective and refractive properties. The way it refracts light gives it a particular hue of blue that is a bit difficult to pinpoint and texture effectively without having the asset appear excessively blue or completely washed out. Light scatters off ice in a certain way depending on the surface of the ice itself, which means the reflections on the surface can be anything from relatively clean to completely distorted and unrecognisable. In an environment such as the Ice Cave, one would expect to get uneven, relatively unclean reflections on the walls due to their irregular nature, and if you look closely when panning around the demo, you can see they are there. This effect is the result of a long effort in investigating and trying to achieve a parallax effect that made the ice appear thick and reflective.

I originally wanted a parallax effect on the walls, but previous attempts had failed to produce the kind of effect that we were after. The idea for the technique that is currently found in the Ice Cave originated after an accidental switch between the world space normals and tangent space normals. I noticed that the switch resulted in a strange, distorted effect on the walls that was highly reflective. These sorts of reflections were the type that we were after, but we needed to have control over them. Using that thought as an inspiration, I started looking into how to localise that reflective effect only to certain areas of the cave walls in order to get the parallax effect we were after.



Fig. 2: Close up of the reflective icy walls [the effect can be seen in the video at 3:16 or by clicking here: https://youtu.be/gsyBMHJVhXA?t=196]

The initial map used to produce the reflections was an object space normal map with sections that were greyed out, and as such contained no normal information (Fig. 3). Even though it worked, it was a complicated process to tweak the normal map information as it had to be done in Photoshop by manually adding and removing sections of the texture as we needed. That was when I had the first thought of using two separate maps in order to interpolate between them to obtain the reflections.



Fig. 3: The first modified normal map

The two main elements of the polished technique are the object space normal maps and the greyscale normal maps (Fig. 4). The white areas of the grey map remain unaffected and as such take on the information provided by the object space normal maps. It is the combination of the two which produces the icy, parallax-like reflections on the walls of the cave.




Fig. 4: Object space normals on the left and final greyscale normals on the right

The greyscale normals are made by removing the colour from the tangent space normal maps (Fig. 5). This produces an image with little tonal variation in the grey, such that most of the values are in the range 0.3 - 0.8.




Fig. 5: Tangent space normals and resulting greyscale normals
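The desaturation step can be sketched numerically. This is a plain Python illustration, assuming the common Rec. 601 luminance weights; the exact desaturation method of the image editor may differ slightly:

```python
# Sketch of desaturating a tangent space normal map texel.
# The Rec. 601 luminance weights below are an assumption; the image
# editor's desaturation may use slightly different coefficients.

def to_greyscale(r, g, b):
    """Convert an RGB texel (components in [0, 1]) to a single grey value."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# A flat tangent space normal (0, 0, 1) is stored as the colour (0.5, 0.5, 1.0),
# so most texels of the map desaturate to mid-range greys:
flat = to_greyscale(0.5, 0.5, 1.0)  # roughly 0.56, inside the 0.3 - 0.8 range
```

Because nearly every texel in a tangent space map is close to that flat blue colour, the resulting greys cluster in the narrow 0.3 - 0.8 band mentioned above.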

It is important to use the tangent space normal maps to produce the greyscale map, as the colour variation in them is minimal, which means that, once the colour is removed, you will be left with a map that very clearly shows all the rough detail of the surface. On the other hand, if you use the object space normal maps you will get an image that shows where the light hits as well as the rough detail, due to the contrasting colours (Fig. 6).




Fig. 6: Object space normals to the left and the resulting greyscale normals to the right, on which the areas where the light hits are very evident

The grey normals should only cause reflections on the walls of the cave, not on the snow. Therefore the diffuse map and the greyscale normal map have to match, so that wherever there is white in the diffuse map the grey normal map is transparent, and wherever there is black in the diffuse map the grey normal map is opaque (Fig. 7).




Fig. 7: Diffuse texture on the left and final greyscale normal maps on the right. The transparent areas on the normal map match the black ice areas on the diffuse texture.

The grey normals are then combined with the true normals using a value proportional to the transparency value of the greyscale normals:

half4 bumpNormalGrey = lerp(bumpNorm, bumpGrey, amountGreyNormalMap);

As a result, in the dark, rocky parts of the cave, the main contribution will come from the greyscale normals and in the snowy part from the object space normals, which produces the effect we are looking for.
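In plain Python, the blend behaves as follows. This is only an illustrative sketch of the shader's lerp, with made-up texel values, not the actual shader code:

```python
# Sketch of the per-texel blend between the object space normal (bumpNorm)
# and the grey normal (bumpGrey). The texel values below are assumed,
# purely for illustration.

def lerp(a, b, t):
    """Component-wise linear interpolation, like the shader's lerp()."""
    return tuple(ax + (bx - ax) * t for ax, bx in zip(a, b))

bump_norm = (0.1, 0.9, 0.4)  # object space normal texel (assumed values)
bump_grey = (0.6, 0.6, 0.6)  # greyscale normal texel (assumed values)

# Where the grey map is opaque (amount = 1) the grey normal wins (rocky walls);
# where it is transparent (amount = 0) the object space normal wins (snow).
rocky = lerp(bump_norm, bump_grey, 1.0)
snowy = lerp(bump_norm, bump_grey, 0.0)
```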

At this point the normals all have identical components with values between 0.3 and 0.8. This means the normals point in the direction of the bisector of the first octant, as each normal has equal component values: (0.3, 0.3, 0.3), … , (0.8, 0.8, 0.8)

The shader then applies the transformation normally used to map values from the interval [0, 1] to the interval [-1, 1]: 2 * value – 1. After applying this transformation, some of the resulting normals point towards the bisector of the first octant and the rest point in the opposite direction.

If the original normal has the components (0.3, 0.3, 0.3), the resulting normal will be (-0.4, -0.4, -0.4). If the original normal has the components (0.8, 0.8, 0.8), the resulting normal will be (0.6, 0.6, 0.6). So the normals now point in two main, opposite directions. Additionally, when the reflection vector is calculated, the built-in reflect function is used. This function expects the normal vector to be normalized, but what we are passing is a non-normalized normal.
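The remap and the resulting direction flip can be sketched in plain Python; this illustrates the shader arithmetic only, not the shader itself:

```python
# Sketch of the [0, 1] -> [-1, 1] remap applied to the grey normals.
# A grey value g becomes the normal (2g - 1, 2g - 1, 2g - 1), so values
# below 0.5 end up pointing opposite to values above 0.5.

def remap(g):
    """Expand a grey value in [0, 1] to a component in [-1, 1]."""
    return 2.0 * g - 1.0

def grey_to_normal(g):
    c = remap(g)
    return (c, c, c)

dark = grey_to_normal(0.3)   # about (-0.4, -0.4, -0.4): opposite the first octant
light = grey_to_normal(0.8)  # about (0.6, 0.6, 0.6): towards the first octant
```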

As the normals are not normalized, their length is less than 1. The reflect function is defined as:

R = reflect(I, N) = I – 2 * dot(I, N) * N

When you use the built-in reflect function with a non-normalized normal of length less than 1, the resulting reflection vector makes an angle with the provided normal that is greater than the incident angle.
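This can be checked numerically with the formula above. The sketch below is plain Python, and the incident vector is an assumed value chosen purely for illustration:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def length(v):
    return math.sqrt(dot(v, v))

def reflect(i, n):
    """R = I - 2 * dot(I, N) * N, exactly as the built-in reflect is defined."""
    d = dot(i, n)
    return tuple(ix - 2.0 * d * nx for ix, nx in zip(i, n))

def angle_between(a, b):
    return math.degrees(math.acos(dot(a, b) / (length(a) * length(b))))

i = (1.0, 0.0, -1.0)       # assumed incident direction, for illustration only
n_unit = (0.0, 0.0, 1.0)   # unit-length surface normal
n_short = (0.0, 0.0, 0.6)  # non-normalized normal, length < 1

# With the unit normal the reflection makes the same 45 degree angle with the
# normal as the incident ray does; with the shortened normal the angle is much
# larger, so a completely different part of the cube map is sampled.
r_unit = reflect(i, n_unit)    # about (1.0, 0.0, 1.0), 45 degrees from the normal
r_short = reflect(i, n_short)  # about (1.0, 0.0, -0.28), over 100 degrees away
```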

The normal switching direction when the greyscale map has values around 0.5 means that totally different parts of the cube map will be read whenever the greyscale value crosses 0.5, creating the effect where uneven spots reflecting the white parts of the cube map sit right next to spots reflecting the rocky parts. Since the areas of the greyscale map that switch between positive and negative normals are also the areas that give the most distorted angles on the reflected vectors, the white reflected spots become very uneven and distorted, giving the desired "swirly" effect.

If the shader outputs only the static reflection using non-normalized and clamped grey normals, we get an effect like the one shown below in Figure 8:



Fig. 8: The swirl effect that results in the icy effect everybody likes

The clamping is relevant to the result of the icy effect, as it produces normals oriented mainly in two opposite directions, which is the main factor that defines the swirl-like pattern. However, if we remove the clamping of the greyscale normals, the normals point in one main direction and we get the different visual effect shown in Figure 9:



Fig. 9: The removal of the clamp results in a slanted band pattern, which is much more evident when the camera moves

The use of two normal maps is not the only thing that influences the reflections on the icy walls. Reflections in the Ice Cave are obtained through the use of local cubemaps, which is a very effective and low-cost way of implementing reflections in a scene. These cubemaps have local corrections applied to them, which ensure that reflections behave in a realistic way and change as expected as one looks around the cave. Local correction is needed because the cave is a limited environment, which means the reflections inside it should behave differently from those caused by an infinite skybox. The local correction makes the effect appear realistic; without it the reflections remain static and give the illusion that the geometry is sliding over the image instead of the reflections being within the geometry. There is no feeling of depth, or sense that the ice is thick, without it.

More information as to how local cubemaps work can be found in this blog: http://community.arm.com/groups/arm-mali-graphics/blog/2014/08/07/reflections-based-on-local-cubemaps

It was an interesting journey to try and understand the workings behind this effect, as it was achieved through relatively unconventional means. The in-depth study helped us understand how the two normal maps behave together, and what exactly causes the result displayed in the demo. The icy wall reflections are very effective, inexpensive and perform as desired, making the cave seem as if it were actually made of thick ice.

When creating an animation, it is paramount to have a very clear objective and vision of the asset and its function. How much is the object going to deform? How visible is it going to be in the scene? How complex is the overall animation? These are some questions to ask as you are making the asset. Outlining and understanding important aspects of the task can be the difference between smooth progress and a difficult, rocky workflow. Within this blog, I am hoping to address issues that people might encounter when animating and give advice as to how to address them. The main software used for the examples in the blog are Autodesk Maya and Unity, however the theories behind the workflow and habits are applicable to any 3D engine and game engine out there.

Asset Production:

It is important to understand the role of the asset within the scene, as this will determine many of the aspects of its geometry and design. You can get away with having extra polygons if your demo is going to be shown on a PC or laptop, but if you are targeting a mobile platform then every single vertex tends to be accounted for. Knowing how much budget is available for assets is always useful; this way you can ensure you use all available vertices wisely and effectively.

The time spent making sure an asset is finished, optimised and ready to be incorporated into the game is time well spent, as it means little to no modification will be needed in the later stages of the production process. There is nothing worse than finding out the model's poly count is too high and needs to be reduced after having spent time weighting and animating it. In a case like this, you might be able to reuse the animations, but the model will need new weights, as the vertex count will be different after a reduction. And even then, a reduction in vertices might cause the bones to influence the mesh differently, which could mean the animations have to be discarded too as the deforming mesh behaves strangely.

It is a good habit to spend a reasonable amount of time on a task and not rush it. Rushing through one part of the process because the task seems to drag on and you're itching to start something else is a very bad idea, as it tends to come back with a vengeance later on. Below is an example of a good workflow for making optimised assets. The diagram defines each stage and allows clear progression from one part of the process to the next.

It is worth emphasising that whilst it is tempting to keep working on an asset to achieve the perfect balance between optimised mesh and high quality, there is a point where you should just declare it finished. You could consider an asset complete when it has a low poly count, its mesh is optimised for its purpose within the scene, it has a good texture map, and it runs on the device whilst looking its best.



Fig. 1- example of workflow during asset production


Removing Transformations on a model:

Another point to emphasise is the cleanliness of the model. A clean model is one that has no transformations applied to it and sits at the origin of the scene. Any kind of transformation or residue (anything that will influence the model, such as an animation keyframe) that remains on the model will have an effect on the animation, so it is essential for the asset to be clean and free from anything that could potentially influence the bones.

Before starting anything, freeze all transformations, delete the history of the scene, and make sure the model is where it should be and faces the correct direction. The reason behind this is to establish a neutral point to which you can always return during the animation process. The controllers used to move objects around a scene store the transformations in values of X, Y and Z. If one wants to return to the initial position at whatever point in the animation, it would make sense for that point to be 0, 0, and 0 instead of some arbitrary values that differ from controller to controller and would be difficult to track.


It is also worth pointing out that if one does not remember to freeze the transformations of a controller before binding it to a bone, the transformations of that controller will influence the bone and will most definitely make it move in ways that are not desired.


Overall, zeroing out the transformations on the asset and on anything that is going to be applied to the asset is a good habit to keep, and one that most definitely pays off throughout the process.



Fig. 2- Mesh with transformations versus mesh without any transformations.
All the transformations applied to a mesh can be seen in the Channel Box menu.

Understanding Animation:

This is also a good point to introduce some terminology that might be used interchangeably throughout the text, in order to prevent any confusion:

  • When talking about the asset or model that is to be animated, one might refer to it as the ‘mesh’ or ‘skin’ as well as the terms used so far. 
  • ‘Rig’ and ‘skeleton’ are sister terms; both refer to the hierarchy of bones and joints set up inside or around the object in order to animate it.
  • The bones are ‘bound’ to the skin, and will influence the mesh and deform it when moved. Skin weights or the action of ‘paint weighting’ allows control over that influence and enables the user to fix any incorrect deformations or influence that might occur.
  • Controllers are curves, or maybe other polygons, parented to the object or joint in order to make the animation process easier.

Moving the Mesh:

I hope these terms are clear and make it easier to understand some of the elements mentioned so far. Turning back to the clean mesh: at this point one should start considering how to proceed with the animation. Looking at the mesh construction tends to be a good starting point, as this might be a deciding factor. Not all meshes need a skeleton in order to be animated: skeletons and skinning can get expensive, so if the asset can be animated through a parented hierarchy it is always better to do so. A character with detached limbs (think Rayman) or pieces of an engine moving in unison would be good examples of assets that would animate just fine with a parent hierarchy.


Here is an image of a very simple parent hierarchy set up in Maya:


Figure 3a- Parent hierarchy example scene


Fig. 3b- Parent hierarchy example set up


In the example shown in Figure 3a there are some simple shapes orbiting a cube. Each coloured controller controls a shape individually, the black controller allows control over the small shapes, and the white controller moves both the small and big shapes. It is a simple setup, but with it one can move the shapes, set the orbit, and even move the whole arrangement with ease.

The Rig:

On the other hand, humanoids, organic models or more complex assets do benefit from having a skeleton rig drive them. These rigs work in a similar enough way to how physical skeletons move a body. The bones are set up with IK handles, which create an effect close enough to muscles pulling on a joint to make it move. Rigs are easy to build and become familiar with, but can get complex very quickly, as shown in the example below:


Fig. 4- Top-down view of the rig on a lizard


This rig contains about 95 bones with their respective controls, constraints (specific influences controllers exert on the joints) and weights. It works very smoothly, deforms nicely, allows good control over the lizard mesh, and performs well on a mobile platform. This rig was designed with complex movement in mind; it goes as far as having controls to allow the digits to contract and relax (Fig. 5).


Fig. 5- Close up of finger control


Optimising a Rig:

This level of detail is fine if the camera is going to come quite close to the lizard and take note of the finer movements, but it might not be the ideal setup for something aimed at a mobile device, or for a scene where the camera does not get close enough to appreciate these movements. In this particular case, the asset happened to be the only animated one in the scene, so there was enough budget to accommodate the number of bones and influences, but what if that were not the case? Bones would need to be removed to make room for more animated characters. Using this example, removing the extra bones in the hands and feet and reducing the number of bones in the tail would remove around 65 bones, which is more than enough to animate another character and would reduce the bone count on the model by two thirds.


Fig. 6- simple rig on a phoenix


Whilst the lizard is not an ideal candidate for a rig to drive an animation aimed at a mobile device, the rig on the phoenix is a much better example. In this case, the rig originally featured 15 bones, but an extra three were added to spread the influence on the lower part of the mesh, bringing the total count up to 18 bones. This particular model also features in a scene with other animated elements and plenty of effects, and was not meant to perform any particularly complex animation, so 18 bones is all it needs.


Always think carefully and take care when building the rig and controls that will drive your model. Make sure you understand what the animation is meant to achieve, and aim to build the rig in such a way that it can bring the object to life with as few bones as possible. As shown in Fig. 7, a lot can be achieved with this particular rig.


Fig. 7- Progression of the animations of the phoenix, from complex flight to simpler, looping animation


The Animation Process:

So far, we have touched on the production of the assets, the rigging and skinning process and some optimisation practices, so it is time to address the actual animation process. Within computer animation, the process of animating tends to be carried out by positioning the model and keyframing the pose. The series of keyframes is then played one after another and blended together to form the resulting animation.


When animation movements are brought into the game engine, they can either be matched to triggers and played in response to them, or left to loop around on their own for as long as needed. Simple looping animations are a very easy way to bring a scene to life without a complex animation, and if done right they can give the illusion of being one long string of constant movement.


ARM's Ice Cave demo makes use of these types of animations to bring the cave to life. The butterfly and tiger both move around the cave in constant loops that were timed to fit with each other, and the phoenix constantly tries to come back to life but is always stopped by the loop of its animation taking it back to its sleeping starting state.


Fig. 8- The Ice Cave Demo by ARM


Throughout the production of Ice Cave, we found that this was the best way to bring dynamism to the scene, as it allows the demo to loop continuously without restarting when the animations stop.

I have repeated throughout this article that it is important to have a clear vision of what one is trying to achieve with a project, because this knowledge makes many aspects of the production much smoother. More often than not, the result is good, optimised models, a well-constructed scene and cleverly tailored visual effects that, when put together, create the illusion that the overall product is of much higher specification than it actually is.


A breakdown of the scene, its elements, and their purpose will always help. Also, consider how the viewers will interact with the finished product: is everything going to reset after a certain point, or is it going to be playing continuously? Using these as a basic guideline, it will soon become clear what the best way to animate the objects is and how to best go on about it.

Animations in a Game Engine:

I hope that by this point it is evident that the asset creation and animation process is quite complex, full of elements to remember and consider at every point in the pipeline. The last step in the process is to export the animation and place it within the scene in your game engine of choice.


There are a few formats you can export your animated model to, but the most widely used are .fbx and .dae. Unity can also handle Maya's .ma and .mb files, which can contain animations. The theory is simple enough, but in practice a few things can go wrong, resulting in the animation not exporting, or exporting incorrectly.


3D model viewers are extremely useful for previewing animations, as what looks fine in Maya might not match what you get in Unity or other game engines. Assimp, Open3mod and the Autodesk FBX Converter are some examples of 3D viewers, the FBX Converter being particularly useful as it allows converting files from one format to another (fig. 9). This became very useful in situations where animations would only export correctly in one file format, but not the one that was needed. Even so, after running the model through some 3D viewers it is always worth checking one last time within Unity or your game engine of choice. Unity allows the user to preview animations in the Inspector tab (fig. 10), which gives an idea of how the animated model will look in the scene. It is worth noting that sometimes the mesh will deform awkwardly; before panicking and thinking the animation exported incorrectly, it is worth checking how many bones are influencing each vertex, as this might be the root of the problem.


Fig. 9- Screenshot for the Autodesk FBX converter


Fig. 10- Unity inspector tab displaying the animation of the phoenix


Taking an asset from start to finish is a very long process full of situations where things can go wrong very easily, but understanding how each part of the process works, and how best to go about it, will make it easier to handle. Throughout this article, I have talked about 3D assets and how to progress from the initial sculpt to a fully animated asset integrated within the scene, with a focus on the animation part of the process. I hope this has provided insight into the more artistic side of the 3D art world and resolved any doubts, as well as provided useful advice for avoiding problems and keeping up good habits throughout the process.

Starship was formed to use our extensive experience developing software for games & simulations and apply it to market segments that hadn’t yet been exposed to the transformative power of digital technology.



One of the markets we quickly identified was cooking: people are obsessed with celebrity chefs, cooking shows and recipe books, but they haven’t really taken advantage of the latest software features when transferring across to the app space - most recipe apps are, at best, a glorified PDF, and cooking games are rendered in a cartoon style. We were sure we could do a lot better than that!


Our primary technical worry, though, was the steep “uncanny valley” drop-off. Just as with human faces, the brain has evolved to spot fake or bad-looking food a mile off. If we wanted realism, it wouldn't be computationally cheap. On the plus side, our initial UX experiments immediately found the fun: on tablet devices we can be tactile, and the size format closely matches the pans and plates we wanted to represent.


CyberCook's objective then was to achieve a realistic 3D simulation of how food looks and behaves, all while running on tablet (and mobile) hardware at 30fps.





In general, food has pretty similar requirements to human skin to look realistic, which meant we could use the plentiful skin shading research as a reference. As we found, translucency, subsurface scattering, energy conserving BRDFs, IBL reflections, linear lighting and depth of field are all required to render believable food, while being quite a tall order for the mobile GPUs at the time of development.


A physically based solution would have been the ideal choice, but we couldn't afford it on mobile. We opted instead for a physically inspired solution, carefully testing which features made the most difference to the end result and letting go of the energy-conservation requirement outside of the main BRDF.


The base intuition we took from our preliminary research was that depth of field and linear lighting are essential to the perception of skin and organic materials as realistic. The typical gamma falloff has been ingrained in our minds by a couple of decades of 3D games, and it screams "I'm fake".




Starship graphics programmer Claudia Doppioslash (doppioslash) had the tricky job of picking the right techniques that would enable the artists to create the assets they needed:


"Linear lighting is not available for mobile in Unity, so we had to implement it from scratch. While it's a simple matter of making sure all your texture and colour inputs are wrapped in a pow(<colour>, 2.2) and having a full-screen effect that sets it back to gamma at the end, it's also fiddly, takes up computing power and was confusing for the artists. At that time full-screen effects were not supported in Unity's scene view, so they had to edit the scene in gamma, while taking care to preview their work in a game camera with the full-screen effect on.


Depth of field, while being an expensive full-screen effect we paid for in performance, really helped the perception of the image as having been taken by a real camera belonging to a TV cooking show or a professional food photographer. Our artists researched the look of food photography extensively in order to apply it to CyberCook."
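The gamma-to-linear round trip described above can be sketched in a few lines of Python. This is a toy illustration using a plain 2.2 power curve rather than the exact piecewise sRGB transfer function, and it is not Unity's or CyberCook's actual implementation:

```python
def to_linear(c, gamma=2.2):
    """Decode a gamma-encoded channel value (0..1) into linear light."""
    return c ** gamma

def to_gamma(c, gamma=2.2):
    """Re-encode linear light for display, as the final full-screen pass does."""
    return c ** (1.0 / gamma)

# Lighting maths (here, averaging two colours) must happen in linear space:
a, b = 0.5, 1.0
naive = (a + b) / 2.0                                    # averaging encoded values
correct = to_gamma((to_linear(a) + to_linear(b)) / 2.0)  # decode, average, re-encode
# 'correct' comes out brighter than 'naive': doing the maths in gamma space
# darkens midtones, which is part of what makes gamma-space lighting look fake.
```

The same decode-compute-reencode discipline applies to every texture fetch and every blend in the frame, which is why doing it "from scratch" is fiddly and costs shader cycles.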




"The choice of BRDFs was at the heart of achieving realism. We know the Blinn-Phong look all too well and have associated it with video games. The moment you see it your brain is subconsciously reminded that you are looking at a game. It's especially bad for organic matter and it wasn't much good as a simulation of the food being coated in oil, either.


We relegated it to be used for non-organic, non-metallic materials in the kitchen environment. The main specular BRDF used for food, metal, and wood is an energy conserving Cook-Torrance with a GGX distribution. It can give the soft quality necessary to believe that something is organic and also the smooth one necessary for the metal objects, and is, all in all, a very flexible BRDF."
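To illustrate why GGX gives that soft quality, here is a minimal scalar sketch of the GGX normal distribution term at the heart of a Cook-Torrance specular. This is the textbook formulation written in Python for clarity, not CyberCook's shader code; by common convention alpha is roughness squared:

```python
import math

def d_ggx(n_dot_h, roughness):
    """GGX (Trowbridge-Reitz) normal distribution term, alpha = roughness^2.

    Peaks sharply around the half-vector for smooth (metal-like) surfaces
    and spreads out with a long, soft tail as roughness grows, which is
    the organic quality described above.
    """
    a = roughness * roughness
    a2 = a * a
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)
```

A quick sanity check of what "energy conserving" buys at the distribution level: integrating D(h)·(n·h) over the hemisphere yields 1 for any roughness, so making a material rougher redistributes energy rather than creating or losing it.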




"We also used the anisotropic Ashikhmin-Shirley BRDF for the oil coating specular on the food and for the most important metal objects, such as the pan and the hob. The food oil coating was a ripe ground for experiments, as it's not a problem many games have. Ashikhmin-Shirley is expensive but the results are miles away from the alternatives.


Having different BRDFs made it hard to achieve IBL coherent with the lighting of the scene. We used the technique in the Black Ops 2014 Siggraph presentation [1], but it was meant to be used with Blinn-Phong as a distribution. Nevertheless it worked well enough for our requirements.


Last but not least, we used a number of diffuse BRDFs: Phong, of course, then Oren-Nayar was used for some vegetables which didn't look good enough with Phong. Our implementation of Subsurface Scattering follows Penner's pre-integrated skin shading 2011 Siggraph talk [2].


We were forced by the complexity of how prawns look in real life to implement a very approximate translucency effect inspired by the DICE 2011 GDC talk [3]."
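For reference, the classic two-term Oren-Nayar approximation mentioned above can be sketched like this. This is the textbook formulation, not the CyberCook implementation; angles are in radians:

```python
import math

def oren_nayar(theta_i, theta_r, delta_phi, sigma, albedo=1.0):
    """Classic two-term Oren-Nayar approximation, multiplied by cos(theta_i).

    theta_i / theta_r are the incident / reflected polar angles,
    delta_phi the azimuthal difference, sigma the surface roughness.
    With sigma = 0 this reduces exactly to Lambert: albedo/pi * cos(theta_i).
    """
    s2 = sigma * sigma
    A = 1.0 - 0.5 * s2 / (s2 + 0.33)
    B = 0.45 * s2 / (s2 + 0.09)
    alpha = max(theta_i, theta_r)
    beta = min(theta_i, theta_r)
    return (albedo / math.pi) * math.cos(theta_i) * (
        A + B * max(0.0, math.cos(delta_phi)) * math.sin(alpha) * math.tan(beta))
```

The B term brightens retro-reflection (viewing from near the light's direction), which gives rough matte surfaces such as some vegetables a flatter, dustier look than plain Lambert.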
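The DICE-style approximated translucency is also compact enough to sketch. This is a loose Python paraphrase of the published idea (distort the light vector along the normal, dot it against the view vector, sharpen with a power, attenuate by thickness); the parameter names and the linear (1 - thickness) attenuation are simplifications, not CyberCook's actual code:

```python
def translucency(light_dir, view_dir, normal, thickness,
                 distortion=0.2, power=4.0, scale=1.0):
    """Cheap view-dependent translucency, loosely after the DICE GDC 2011 idea.

    thickness in [0, 1] attenuates the result linearly, a simplification of
    the per-texel local-thickness map used in the original talk.
    """
    # Distort the light vector along the surface normal.
    lt = [l + n * distortion for l, n in zip(light_dir, normal)]
    # How directly the viewer looks into the transmitted light.
    v_dot_lt = max(0.0, sum(-v * l for v, l in zip(view_dir, lt)))
    return (v_dot_lt ** power) * scale * (1.0 - thickness)

# Backlit setup: light behind the surface, viewer facing the light.
thin = translucency((0, 0, 1), (0, 0, -1), (0, 1, 0), thickness=0.1)
thick = translucency((0, 0, 1), (0, 0, -1), (0, 1, 0), thickness=0.9)
# Thin regions glow more than thick ones, which is the prawn-shell effect.
```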



[1] Getting More Physical in Call of Duty: Black Ops II

[2] Eric Penner's Pre-Integrated Skin Shading

[3] Approximating Translucency for a Fast, Cheap and Convincing Subsurface Scattering Look

Following the strategic partnership announcement at GDC 2015, held in San Francisco from 2nd to 6th March (“ARM and Tencent Games collaborate to advance mobile gaming”), we have worked with Tencent Games to provide Tencent's mobile game developers with:

  • Access to the latest developer boards based on high-performance ARM Cortex® CPUs and ARM Mali™ GPUs
  • Guidelines for developing mobile games on ARM-based solutions that address compatibility and performance challenges
  • Access to engineering resources from Tencent R&D Game studios and the ARM ecosystem

In order to better manage all the collaborations with Tencent Games, over the last five months we have worked with Tencent Games to build a joint innovation lab, located at Tencent Games' headquarters and ARM's Shanghai office respectively. The lab is now open to Tencent Games' roughly six thousand game developers.

“ARM + Tencent Games Innovation Lab” at Tencent’s Office



“ARM + Tencent Games Innovation Lab” at ARM Shanghai Office


At the lab, we provide many development devices based on high-performance ARM Cortex® CPUs and ARM Mali™ GPUs, such as:

  • Xiaomi TV II 55’ powered by MStar DTV Chipset (Quad-core A17 CPU and Mali-T760MP4 GPU)
  • Xiaomi 4K TV Box powered by Amlogic chipset (Quad-core A9 CPU and Mali-450MP4 GPU)
  • Nagrace HPH Android TV Gaming Box powered by Rockchip RK3288 chipset (Quad-core A17 CPU and ARM Mali-T760MP4 GPU)
  • HTC Desire 626w Android phone powered by MediaTek MT6752 chipset (Octa-core A53 CPU and Mali-T760 GPU)
  • Samsung Galaxy S6 powered by Samsung Exynos 7 Octa – 7420 (Octa-core big.LITTLE  A57 and A53 CPU and Mali-T760MP8 GPU)
  • Samsung Gear VR and Galaxy Note 4 powered by Samsung Exynos 7 Octa – 7410 (Octa-core big.LITTLE A57 and A53 CPU and Mali-T760MP6 GPU)
  • 64bit Android Smart TV powered by HiSilicon DTV Chipset (Quad-core A53 CPU and Mali-450MP6 GPU)
  • And other devices powered by ARM CPU and Mali GPU



In addition to providing these devices, we have also pre-installed great demos created by the Mali ecosystem engineering team and ecosystem partners, including:

  • Ice Cave – Built on Unity 5, using global illumination powered by the Enlighten engine along with lots of advanced features, such as soft shadows


  • Moon Temple – Built on Unreal Engine 4 with ARM 64-bit enabled and Mali-specific features, such as ASTC (Adaptive Scalable Texture Compression) and PLS (Pixel Local Storage)


  • Cyber Cook – A fun mobile VR game from Starship for Samsung Gear VR


  • 格斗江湖 TV Game from Tencent Games


  • And other demos to showcase how to leverage ARM technologies to optimize the games


By studying these demos, game developers can more easily understand how their games benefit from leveraging ARM technologies, and try those techniques in their own titles. Ultimately, we expect the game developers of Tencent Games to develop great games with better compatibility, higher performance, greater visual effects and better power efficiency.


Under this joint lab, we also work with Tencent Games to organize regular workshops that provide face-to-face communication between Tencent game developers and ARM ecosystem engineers. For example, we recently worked with Unity and Tencent Games to organize VR workshops at Tencent's Shanghai and Shenzhen offices, which were very successful! See the pictures below and you will know what I mean.


ARM VR DAY at Tencent Office

We are pleased to announce the release of Mali Graphics Debugger 3.0, which focuses on the user experience and makes the most out of all the work that has been done in the last two years. This release has required a great engineering effort, which started a year ago, during which time we have also added OpenGL ES 3.1 and Android Extension Pack support, ARMv8 64-bit targets, live shader editing, and support for all the new released versions of the Android system.

Version 3.0 adds support for multi-context applications and the capability of tracing multiple processes on an Android and Linux system. We have also changed our underlying GUI framework and added a few new UI features to most views.


Read the release announcement on malideveloper.arm.com



This coming Sunday I am excited to be chairing "Moving Mobile Graphics" at SIGGRAPH in sunny downtown Los Angeles. The half-day course will provide a technical introduction to mobile graphics, with the twist that the talk content spans the hardware-software spectrum and discusses the state of the art with practitioners at the forefront. I hope the range of perspectives will give attendees a good feel for how the whole mobile stack hangs together, and also illustrate the trends and future directions being explored.


SIGGRAPH page: Moving Mobile Graphics | SIGGRAPH 2015

Course home page: http://community.arm.com/moving-mobile-graphics-2015


In order to cover the spectrum, the speaker list is a cross-section of the industry: Andy Gruber, Graphics Architect of the Qualcomm Adreno GPU will be discussing some of the things mobile GPUs do in order to operate in such low power conditions, including discussing the Adreno geometry flow. Andrew Garrard, Samsung R&D UK will be discussing how mobile GPU architectures affect the software and software APIs we build on. Marius Bjorge will be presenting some recent API and algorithmic innovations that can dramatically reduce bandwidth on mobile, including on-chip rendering techniques such as Pixel Local Storage and ways to construct highly efficient blur chains for post-processing effects such as bloom.


To represent the state of the art in games development we have three speakers from different areas of the industry. Simon Benge from Exient, developers of Angry Birds Go! and Angry Birds Transformers, will discuss how they squeezed the limits of mobile while keeping a broad free-to-play install base. Niklas Nummelin of EA Frostbite will discuss how the AAA graphics engine Frostbite is being brought to mobile, and finally Renaldas Zioma from Unity will discuss how the physically based shading in Unity 5 was optimised for mobile.


More information on the course can be found on the event site above. As of Sunday this will include course slides and notes, so if you are unable to attend in person be sure to check back after Sunday!




ARM organised an HTML5 technology workshop last week, hosted by Liberty Global/Virgin Media at the Digital TV Group's offices in central London, next to the MI6 building! This was a very well attended event with more than 45 delegates plus speakers from ARM, LibertyGlobal, Linaro, Opera Software, PlayCanvas and YouView. The agenda covered HTML engine trends, UI implementation case studies, WebGL, developer support, Encrypted Media Extensions, GPU and browser integration, the Chromium Embedded Framework and the Chrome process.


My thanks go out to Richard Stamvik, Gemma Paris and Ryan Booth for helping with the logistics for the event, and to our co-hosts from Virgin Media for sponsoring the event.


Trends in HTML5 Engines [slides]



I opened the conference by talking about the current trends in HTML5 engines, with a focus on activities that are going to make HTML5 application development more relevant for embedded application developers.  The presentation introduces modern HTML5 techniques such as Web Components and Service Workers plus infrastructure changes such as HTTP/2.  All links are contained in the slides, so check them out for more information.

Implementing a Cutting Edge HTML5 UI [slides]



Nico Verhout, VP of applications development, and Anne Bakker, director of web development at LibertyGlobal (LGI), the biggest worldwide cable provider with 27M customers in Europe & Latin America, spoke about their cutting-edge HTML5 UI. The Horizon UI targets all screens, from in-house hardware to 3rd-party STBs and second-screen devices. It uses HTML5, WebGL, CSS3 animations and pixi.js animation physics. Besides openness, multiple vendors, a large ecosystem and portability, they talked about how HTML5 offers stability, performance, productivity and security.

Creating Performant HTML5 Apps for TV's [slides]



Thomas Kurowski, Opera, talked about performant HTML5 applications for TV and the importance of simple design and effective implementation to provide good performance. In some instances, this can be achieved by avoiding animations and using Blink-based Opera’s optimisations.

Developer Support and Challenges [slides]



Ian Renyard and Andrea Fassina presented their view of the HTML landscape from the viewpoint of supporting third-party developers for the YouView platform. They discussed the transition from Adobe AIR for TV in 2012 to HTML5 in 2014 and the challenges faced along the way. Currently only the BBC apps, iPlayer and red button, are in HTML5, but plans are in motion to move more apps over.

The main theme of this presentation was to understand your production environment, make sure you test and validate on target and understanding the footprint and characteristics of the libraries that you use in your apps.  They presented some great recommendations to developers and are willing to share these findings with the community.

Real World WebGL - The making of SeeMore WebGL [slides]



Will Eastcott of PlayCanvas presented their open source WebGL engine and how they implemented the SeeMore demo, providing solid, jank-free performance on a mid-range ARM-based tablet. The very impressive demo showcases physically based rendering supporting material textures, light mapping and shadows through WebGL, plus various optimisations, e.g. halos implemented using transparent camera-aligned sprites.

The Challenge of Integrating Media Playback on Embedded Devices [slides]



Alex Ashley, YouView, talked about challenges with HbbTV 2.0: supporting multiple playing/paused videos; combining live video and advertisements; HTML5 trick play; testing live stream seek; and the lack of data sharing, e.g. cookies, between browser and media player.


Check out his slides for some sound recommendations for media playback.

Chromium Embedded Framework Integration [slides]



Zoltan Kuscsuk, Linaro, talked about HTML5 in embedded Linux; how to embed a browser in a native application; hardware accelerated video playback; and media encryption extensions.

He presented GPU-accelerated rendering in Chromium running in an ARM and Linux-based STB environment.

Encrypted Media Extensions on OP TEE [slides]



A second session by Zoltan covered Trusted Execution Environment (TEE)-protected DRM and Encrypted Media Extensions (EME) in the open source OP-TEE, which uses HTML's MediaElement (there was some, but not much, TrustZone and TEE experience in the audience). The draft EME specification is also supported by Chromium.

Integrating the Mali GPU with Browsers [slides]


Wasim Abbas, technical lead for middleware graphics in ARM's Mali ecosystem team, presented ARM's GPU and browser integration work, focusing on embedded devices' low power budgets. This presentation covered low-level aspects of embedded GPUs and some great tips for developers integrating against an embedded system.

The Chrome Process [slides]



I closed the conference with a look at how Chrome development actually happens, exploring the intent process and how all development is discussed in open forums in the blink-dev community. This presentation is a call to arms for those who have a vested interest in the web platform to get involved in its development.

The presentation then goes on to look at some of the cool new platform enhancements that are in flight, and a look into future work that will have a positive impact on our lives as embedded HTML5 application engineers.


This conference pulled in attendees from across the value chain of the embedded world, including chipset vendors, STB OEMs, middleware providers and operators. Feedback from the event was very positive, and there were great networking opportunities from bringing like-minded people into one room to work on common problems across the segment.

We hope to replicate the format again, so watch this space and we hope to see you all at a future event soon.

At a recent press event in China, Huawei announced their latest flagship smartphone, the Honor 7. This device is special. It marks a major milestone in the use of innovative heterogeneous computing technologies.



A special device…


Initial sales figures indicate the device is doing pretty well: a record 200,000 units sold in two minutes in the first flash sale (that is 1,667 handsets a second!), followed by 9 million units pre-ordered in the first week. There are also plans to make it available in Europe; I look forward to getting my hands on one.


Powering the  Honor 7 is the in-house Kirin 935 SoC, featuring an octa-core 64-bit system with big.LITTLE technology (two quad-core ARM Cortex-A53 processor clusters) and a Mali-T628 MP4 GPU.


Here is the official launch video:


… with a special camera


Among the many leading features, the device includes a 20-megapixel camera with f/2.0 aperture and phase-detection auto-focus, which allows the camera to focus in just 0.1 seconds, and an 8-megapixel front camera with fixed focus and f/2.4 aperture. What is really special about this phone is that every time you take a photo, the Mali GPU processes it to improve its appearance.

The OpenCL standard API has been used by Huawei to offload key image processing steps onto the GPU. Inside the camera stack the processing is optimally balanced between the CPU and the GPU. Using the processor that is most suited to each step ensures increased efficiency.

This is a break away from the common approach of using dedicated hardware IP, and has enabled a market-leading OEM such as Huawei to keep advancing the algorithm with new techniques and optimizations all the way to the device launch. Further improvements can still be rolled out over the air while devices are in the field, whereas hardware updates are not possible.




ARM and Huawei engineers have collaborated very closely in this project. Key algorithms such as de-noising have been ported to the Mali architecture using OpenCL and optimized at micro and macro level to operate more efficiently. Now every photo the user takes with this device is processed through the Mali GPU, in real time.



Is it that simple?


De-noise may sound trivial; however, it can be a very complex nut to crack, particularly in challenging lighting conditions, which is where most noise occurs and where most photos are taken.

Any implementation of a de-noise pipeline normally includes a mix of common steps, for example: Haar feature detection, Gaussian blurring, Sobel operators, bilateral filtering, and down- and up-scaling applied to various channels, all interleaved in more or less complex pipelines depending on the implementation. These filter chains can easily exceed 20 stages, and need to operate on high-resolution images and, critically, in real time and within a sensible power overhead.
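To make the multi-stage filter chain idea concrete, here is a toy, pure-Python sketch of two such stages chained together, a 3x3 Gaussian blur feeding a Sobel operator. This shows only the structure of a pipeline of convolution stages; it bears no resemblance to the optimized OpenCL kernels shipped in the product:

```python
def convolve3x3(img, k):
    """Apply a 3x3 kernel to a 2D image (list of lists), zero-padded at edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * k[dy + 1][dx + 1]
            out[y][x] = acc
    return out

GAUSS = [[1/16, 2/16, 1/16], [2/16, 4/16, 2/16], [1/16, 2/16, 1/16]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

def blur_then_edges(img):
    """A two-stage chain: smooth noise first, then detect edges on the result."""
    return convolve3x3(convolve3x3(img, GAUSS), SOBEL_X)
```

A real pipeline repeats this pattern twenty-odd times across multiple scales and channels, which is why the per-stage bandwidth and the CPU/GPU hand-off points matter so much on a power budget.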


On paper each individual block would seem suitable for GPU acceleration. A lot of work went into getting this pipeline optimized, as well as fine tuning interoperation between CPU and GPU, and integrating this new functionality in the existing device camera framework.



But, wait, why not use the ISP?


Although using an ISP, or any dedicated hardware for that matter, can often provide advantages in power, performance and area for the specific use case it is designed for, it also often has an important shortcoming: limited flexibility. Fixed logic cannot be changed once it is committed to silicon. The algorithm in question was modified, improved and developed aggressively right to the wire. The choice of using OpenCL on the Mali GPU has essentially enabled the OEM, Huawei in this instance, to give the end user a better camera experience using existing hardware. In addition to superior image quality, in their public launch Huawei claimed a performance improvement of 2x through the use of the GPU, so faster pictures too!



It’s a journey, and it’s getting exciting.


The Honor 7 device marks a major adoption milestone for ARM's vision of heterogeneous computing and GPU Compute. But it is just the latest milestone in a fantastic journey in the adoption of this technology. Here is a refresher of how we got here.


GPU Compute in mobile and embedded systems was introduced in shipping devices in late 2012, when the Google Nexus 10 tablet was launched with RenderScript support on the Mali-T604 GPU.


Since then a large number of partners have endorsed this technology to improve a variety of end-user applications.


Since early 2013, many partners (I have personally been involved in more than 25 independent projects) have gradually enabled their technologies to use OpenCL on Mali. We have since seen:

  • RenderScript accelerated image filters by MulticoreWare and advanced camera and looks improvement by ThunderSoft
  • An entire interleaved HDR ISP pipeline using OpenCL by the sensor vendor Aptina (now ON Semiconductor)
  • Ittiam Systems showcasing at various events incremental improvements to their HEVC and then VP9 decoders using OpenCL. This included demonstration of how GPU Compute can save power in a real silicon implementation.
  • Google adding OpenCL support to VP9 and open-sourcing this in the WebM project for all to use
  • Gesture middleware and face tracking applications using OpenCL from our partner eyeSight Technologies of Israel.


This snowball was certainly rolling! And much continued to happen early this year:

At CES 2015 Omnivision announced the availability of an advanced imaging library aimed at complementing their camera module ISP using the Mali GPU. The library includes advanced imaging features such as 3D noise suppression, chroma noise reduction, de-fringe and de-haze and is targeted at smartphones and tablets.

At the Mobile World Congress in Barcelona earlier this year, ArcSoft demonstrated their latest middleware camera products running on a MediaTek MT6752-based chip. This included pre-ISP image stabilization, real-time dynamic video HDR and other camera middleware, with the objective of bringing features typically available only at the high end to the mass market, made possible by the use of GPU Compute.


ARM recently took part in the Embedded Vision Alliance Summit in Santa Clara, where we hosted a workshop on computer vision on ARM-based systems. Fotonation, Morpho, ArcSoft, Wikitude and many others discussed how they have been using NEON and Mali OpenCL to improve their latest products. Our own Gian Marco Iodice and Tim Hartley also detailed how Mali can help with problems such as stereo processing and how to use heterogeneous computing efficiently. You can access the proceedings of this event here.


OEMs have an important challenge on their hands: how to deliver the best user experiences whilst respecting the lowest possible energy budgets. Huawei's approach of using GPU Compute in their camera illustrates the merits of this technology. It is the latest exciting step in its adoption.


A bright future for GPU Compute


Beyond smartphone cameras, GPU Compute enables a new field of innovation and new customer experiences driven by real-time visual computing. Target applications include computational photography, computer vision, deep learning, and the enablement of new emerging multimedia codecs and algorithms. So far we have only scratched the surface of the potential that GPU Compute can deliver. ARM is always collaborating with new partners on exciting use cases, and I personally look forward to continuing this journey and seeing GPU Compute proliferate, delivering more and more benefits to users.

Chinese version中文版:百视通/ARM HTML5技术论坛顺利举行(附资料下载)

Dear all, good day. HTML5 is really one of the hottest words in the Internet technology area right now; I believe most of you have heard it before. In recent years, HTML5 commercial promotion has had a number of ups and downs, to the point that every year is said to be the "Year One" of HTML5. But with the HTML5 specification now fixed, 2015 may be the hottest year of HTML5 yet. According to statistics, there were over 50 HTML5-related meetings in China in 2015. News keeps coming of HTML5 companies launching IPOs and HTML5 startups winning billion-level valuations.



As the global leading IP technology company, ARM would never be absent from the HTML5 boom. Yesterday, the BesTV/ARM HTML5 technical forum was held at the BesTV New Media Institute (Shanghai). This forum was run by BesTV and ARM, inviting upstream and downstream guests from SoC vendors, ODM/OEMs, ISPs, solution vendors, channels and developers to discuss the status and future of HTML5 technology.


Dr. Wen Li, senior VP of BesTV and vice dean of the BesTV New Media Institute, gave the opening speech and a technical talk, "HTML5 will be the main technical choice of the smart TV and home entertainment area". He analyzed the advantages of HTML5 and introduced their excellent work in the China government-sponsored core technology project and the China SARFT-led TVOS project.




Matt Spencer, the UI and browser marketing manager from ARM UK HQ, gave a speech on the newest HTML5 trends, covering the latest progress of global HTML5 technologies and specifications, as well as ARM's positive role and strategy.



The top two HTML5 game engines in China came on stage one after the other. Jianyun Bao, senior engineer in the developer service department of the Cocos2d-x community, the giant mobile game company in China, gave the speech "Cocos HTML5 solution", introducing their complete support for HTML5 games: open source engine, tool chain, runtime, channel access, test/analysis and payment system. Next, technical evangelist Xinlei Zhang, from the fast-rising HTML5 game engine Egret, presented their equally complete HTML5 game solution: open source engine, tool chain, running platform, channel access, test/analysis and payment system. He also introduced their newly launched mobile application framework, Lark.




Overseas HTML5 game engines also came to our forum. Joel Liang, senior ecosystem engineer from ARM, gave the speech "PlayCanvas: a 3D game engine based on WebGL". He showed the advanced WebGL-based GPU algorithms in the PlayCanvas engine, which can give users a Hollywood-movie-like visual experience in HTML5 games. Most of this technology development was done in cooperation between PlayCanvas and ARM.



Finally, Xiao Shen, development director at Thunder Software Technology Co., Ltd, a leading solution vendor in China, gave the speech "Thundersoft experience in HTML5 development". They have many mature products, reference designs and accumulated HTML5 technology to help SoC vendors, ODM/OEMs and ISPs do quick prototype and production development.



At the forum the attendees raised many questions and the discussion was very lively; the forum was a full success. Since many people who could not attend are keen on the contents, we are sharing the slides in the enclosed files. You are welcome to share your comments.

Thanks a lot.

Machine Vision is perhaps one of the few remaining areas in technology that can still lead you to say “I didn’t know computers could do that”.  The recent pace of development has been relentless.  On the one hand you have the industry giants competing to out-do each other on the ImageNet challenge, surpassing human vision recognition capabilities along the way, and on the other there is significant, relentless progress in bringing this technology to smart, mobile devices.  May 11 and 12 saw the annual Santa Clara California gathering of industry experts and leaders to discuss latest developments at the Embedded Vision Alliance Summit.  Additionally, this year ARM was proud to host a special seminar linked to the main event to discuss developments in computer vision on ARM processor technologies.  In this blog I’m going to provide my perspective of some of the highlights from both events.



The Santa Clara Convention Centre, California.  Host to both the ARM workshop and the EVA Summit


Computer Vision on ARM Seminar, 11 May, Santa Clara, CA


It was my great pleasure to host this event and for those of you who were there I hope you enjoyed the afternoon's presentations and panel discussion. The proceedings from the seven partner presentations can all be downloaded from here. The idea of this event, the first of its kind ARM has held on computer vision, was to bring together leaders and experts in computer vision from across the ARM ecosystem. The brief was to explore the subjects of processor selection, optimisation, balancing workloads across processors, debugging and more, all in the context of developing computer vision applications. This covered both CPU and NEON™ optimisations, as well as working with Mali™ GPUs.


With a certain degree of cross-over, the seminar program was divided into three broad themes:

Optimising Computer Vision for NEON

  • Dr. Masaki Satoh, a research engineer from Morpho, talked about the benefits and main technical aspects of NEON acceleration, including a focus on specific algorithmic optimisations using NEON SIMD (Single Instruction, Multiple Data) instructions.
  • Wikitude are a leader in Augmented Reality applications on mobile devices and in his talk CTO Martin Lechner highlighted the use of NEON to accelerate this particular computer vision use case.

Real Computer Vision Use Cases on ARM

  • Ken Lee, founder and CEO of Van Gogh Imaging, showcased their work developing real-time 3D object recognition applications using 3D stereoscopic sensors, including optimisation via NEON and their early exploration of further acceleration via Mali GPUs.
  • Gian Marco Iodice, Compute Engineer at ARM, discussed his work on accelerating a real-time dense passive stereo vision algorithm using OpenCL™ on ARM Mali GPUs.
  • Real-time image stabilization running entirely in software was the subject of the presentation by Dr. Piotr Stec, Project Manager at FotoNation.  His analysis covered the complete processing pipeline for this challenging use case and discussed where optimisations were most effective.

Processor selection, Benchmarking and Optimising

  • Jeff Bier, president of BDTI and founder of the Embedded Vision Alliance, discussed the important area of processor selection and making intelligent choices when selecting benchmarking metrics for computer vision applications.
  • Tim Hartley (that’s me!) discussed the importance of whole-system measurements when developing computer vision applications and demonstrated profiling techniques that can be applied across heterogeneous CPU and GPU processor combinations.



Jeff Bier from BDTI gets things going with his presentation about processor selection and benchmark criteria

Panel Discussion

In addition to the above presentations, Roberto Mijat hosted a panel discussion looking at current and future trends in computer vision on mobile and embedded platforms.  The panel included the following industry experts:


Laszlo Kishonti, CEO of Kishonti and of new venture AdasWorks, a company creating software for the heavily computer-vision-dependent Advanced Driver Assistance Systems market.  Working with ARM, AdasWorks has explored accelerating some of their ADAS-related computer vision algorithms using a combination of ARM CPUs and a Mali GPU.  In this video, recorded at a previous event, Tim Hartley from ARM talks about some of the earlier optimisation work on AdasWorks using the DS-5 Streamline profiler.
Michael Tusch, CEO of Apical, a developer of computer vision and image processing IP, covering future algorithm development for imaging along with display control systems and video analytics.  Apical are a long-time collaborator on computational photography and have much experience using the GPU as well as the CPU for image processing acceleration.  In the previously recorded video here, Michael talks about Apical's work and their experience using GPU Compute to enable hardware-based acceleration.
Tim Droz, GM of SoftKinetic, a developer of 3D sensor and camera modules as well as 3D middleware, covering issues around 3D recognition, time-of-flight systems, camera reference designs for gesture sensing and shared software stacks.  This video recorded at GDC 2013 shows an example of some of SoftKinetic’s work with GPGPU on Mali for their gesture-based systems.


It was a fascinating and wide-ranging discussion with some great audience questions. Roberto asked the panellists what had stood out for them in computer vision developments to date.  Laszlo talked about the increasing importance of intelligence embedded in small chips within cameras themselves.  Michael Tusch echoed this, highlighting the problem of high-quality video from IP cameras saturating networks.  Having analysis embedded within the cameras and then only uploading selective portions, or even metadata describing the scene, would mitigate this significantly.  Tim Droz stressed the importance of the industry moving away from the pixel-count race and concentrating instead on sensor quality.


Roberto then asked about the panellists’ views on the most compelling future trends in the industry.  Michael Tusch discussed the importance, in the smart homes and businesses of the future, of being able to distinguish and identify multiple people within a scene, in different poses and sizes, and being able to determine trajectories of objects.  This will need flexible vision processing abstractions with the aim of understanding the target you are trying to identify: you cannot assume one size or algorithm will fit all cases.  Michael foresees, just as GPUs do for graphics, the advent of engines capable of enabling this flexible level of abstraction for computer vision applications.


Laszlo Kishonti talked about future health care automation, including sensor integration in hospitals and the home, how artificial intelligence in computer vision for security is going to become more important, and how vision is going to enable the future of autonomous driving.  Laszlo also described the need for what he sees as the third generation of computer vision algorithms.  These will require levels of sophistication that will reach, for example, the ability to differentiate between a small child walking safely hand-in-hand with an adult and one at risk of running out into the road.  This kind of complex mix of recognition and semantic scene analysis was, said Laszlo, vital before fully autonomous vehicles can be realized.  It brought home to me both the importance of ongoing research in this area and perhaps how much further computer vision has to develop as a technology.


Tim Droz talked about the development of new vector processors flexible enough for a variety of inputs, HDR (high dynamic range, combining multiple images from different exposures) becoming ubiquitous, and low-level OpenCL implementations in RTL.  He also talked about plenoptic (light-field) cameras, which allow re-focusing after an image is taken, becoming much smaller and more efficient in the future.


The panel ended with a lively set of questions from the audience, wrapping up a fascinating discussion.



Gian Marco Iodice talks about accelerating a real-time dense passive stereo vision algorithm

Overall it was a real pleasure to see so many attendees so engaged with the afternoon and we are grateful to all of you who joined us on the day.  Thanks also to all our partners and panellists whose efforts led to a fascinating set of presentations and discussions.

The presentations from the seminar can be downloaded here: EVA Summit 2015 and ARM’s Computer Vision Seminar - Mali Developer Center


Embedded Vision Summit, 12 May, Santa Clara, CA

The annual Embedded Vision Summit is the industry event hosted by the Embedded Vision Alliance, a collection of around 50 companies working in the computer vision field.  Compared to the 2014 event, this year saw the Summit grow by over 70%, a real reflection of the growing momentum and importance of embedded vision across all industries.  Over 700 attendees had access to 26 presentations on a wide range of computer vision subjects arranged into 6 conference tracks.  The exhibition area showcased the latest work from 34 companies.


See below for links to more information about the proceedings and for downloading the presentations.


Dr. Ren Wu, Distinguished Scientist at Baidu, delivered the first of two keynotes, exploring what is probably the hottest topic of the hour: visual intelligence through deep learning.  Dr. Wu has pioneered work in this area, from training supercomputers through to deployment on mobile and Internet of Things devices.  And for robot vacuum cleaner fans – and that’s all of you, surely – the afternoon keynote was from Dr. Mike Aldred of Dyson, who talked about the development of their 360° vision (and ARM!) enabled device, which had earlier entertained everyone as it trundled around the exhibition area, clearing crumbs thrown at it by grown men and women during lunch.



ARM showcased two new partner demos at the Summit exhibition: SLAMBench acceleration on Mali GPU by the PAMELA consortium and video image stabilization in software with Mali acceleration by FotoNation

The six conference tracks covered a wide range of subject areas.  Following on from Ren Wu’s keynote, Deep Learning and CNNs (Convolutional Neural Networks) made a notable mark with its own track this year.  And there were tracks covering vision libraries, vision algorithm development, 3D vision, business and markets, and processor selection.  In this final track, Roberto Mijat followed on from ARM’s previous day’s seminar with an examination of the role of GPUs in accelerating vision applications.



Roberto Mijat discusses the role of the integrated GPU in mobile computer vision applications

A list of all the speakers at this year's Summit can be found here: 2015 Embedded Vision Summit Speakers

All the papers from the event can be downloaded here (registration required): 2015 Embedded Vision Summit Replay

Hey everybody,


ARM have arrived and the talks have started! We're eagerly awaiting the Game Jam kicking off at 8pm CEST. See all the details on the Mali Developer Center



ARM at Shayla Games

We have with us a load of Samsung Gear VR kits for developers to work on and optimise for mobile, as well as our help and advice, available all weekend! The Gear VR kits work with the Samsung Galaxy Note 4, which has the latest Mali-T760 GPU inside. Full spec below:


Samsung Galaxy Note 4

Samsung Exynos 7 Octa

ARM Mali-T760 GPU (MP6)

ARM big.LITTLE™ processing technology

ARM Cortex®-A53 CPU (MP4)

ARM Cortex-A57 CPU (MP4)


We're excited to see all the new innovations in the VR space and what the developers can put together in just a weekend.


You can see the full agenda for the event at the Shayla Games website



Have you ever heard about the Taoyuan effect in reflections? Probably not, so I invite you to read this blog to learn about this effect and the application of local cubemaps in graphics.


Several blogs published in this community cover different uses of local cubemaps. The first was about reflections, followed by two more blogs that describe novel uses of local cubemaps to render shadows and refractions with great quality and performance.


These new techniques, developed entirely by the ecosystem demo team in ARM’s Media Processing Group, are especially relevant when developing for mobile devices, where runtime resources must be carefully balanced. They offer not only great performance but also high-quality rendering, which makes these techniques appealing to desktop and console developers as well. Fetching a texture from a static cubemap guarantees very precise and stable shadows, reflections and refractions, compared to the pixel instabilities we would get when rendering these effects at runtime with a moving camera using conventional techniques.


Thanks to these different uses of the local cubemap techniques, we can achieve very high quality graphics with existing hardware.


Unity Unite events in Asia


Recently Sylwester Bala and I attended Unity Unite events in Seoul, Beijing and Taipei, where we presented the talk “Enhancing Your Unity Mobile Games”. I delivered the talk in Seoul, Nathan Li in Beijing and Sylwester in Taipei. In Seoul and Beijing there were joint talks with Carl Callewaert from Unity (see here).


In the talk we introduced the concept of local cubemap and how it can be used to render reflections, and we expanded this technique for rendering shadows in an innovative way. During the talk we used several short videos to illustrate the concept and the advantages of these techniques, which helped to deliver a clear and understandable message to the audience. A couple of videos were particularly useful to show how our new shadows technique can render dynamic and soft shadows.


In the final part of the talk Carl gave a live demo of some of the most important improvements in Unity 5: Global Illumination, Physically-Based Shading and Reflection Probes. Unity’s implementation of Reflection Probes is based on the local cubemap rendering technique. Unity developers now have access to a simple but powerful technique to add reflections to their games in a further optimized way, which is particularly relevant for mobile platforms. Carl also mentioned that Unity might consider implementing the new shadows technique in the engine, as it is closely related to the reflection probe feature.

In all cities the talks were well received and well attended, reflecting the interest in these techniques. Carl showed a live Unity demo with a great example of how reflections based on local cubemaps work. His example clearly demonstrated the advantages of this technique and how easily it can be used in the Unity engine.



Top left: Roberto Lopez Mendez at Seoul. Top right: Nathan Li at Beijing.

Bottom Left: Carl Callewaert at Beijing. Bottom right: Sylwester Bala at Taipei.



Reflections and shadows in games


During a long night walk in Seoul, Sylwester Bala and I talked about how reflections and shadows can contribute to improving the visual experience of games. We walked along Yeongdong Avenue, where high buildings full of neon and glass meant reflections and shadows were on practically every surface. Modern cities are full of polished, highly reflective glass and metal surfaces, virtually everywhere.


Therefore, when rendering this kind of environment in games, we should consider that the local cubemap technique might offer a very efficient way of rendering reflections and shadows in combination with other rendering techniques.


Figure 1. Reflections on the facade of the Coex building at Seoul.



It is well known that reflections and shadows are key topics in any game. Without reflections and shadows any virtual world would look plain and unrealistic. Take reflections, for example: they change every time the camera updates its position and orientation. We could prebake the reflections from the static environment into a texture, but the final effect would be disappointing. An effective way of rendering high-quality, optimized reflections is to use local cubemaps. With this technique we prebake the environment into static cubemaps, and later at runtime we render the reflections by fetching the texture from the cubemaps using the local-corrected view vector, then combining the contributions of the local cubemaps for a given position.
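As a sketch of the local correction step, the snippet below (in C for illustration; the actual demos perform this in shader code, and all names here are hypothetical) intersects the reflection ray with the scene bounding box and rebuilds the lookup vector relative to the cubemap capture position:

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z; } vec3;

/* Hypothetical description of the local environment: the scene's
 * axis-aligned bounding box and the position the cubemap was baked from. */
typedef struct {
    vec3 bbox_min;
    vec3 bbox_max;
    vec3 cubemap_pos;
} local_env;

/* Exit distance along one axis of the slab test. A zero direction
 * component never produces the nearest exit, so return infinity. */
static float slab_exit(float o, float d, float mn, float mx)
{
    if (fabsf(d) < 1e-6f) return INFINITY;
    float t1 = (mn - o) / d;
    float t2 = (mx - o) / d;
    return t1 > t2 ? t1 : t2;
}

/* Distance from a point inside the AABB to where the ray exits it. */
static float intersect_aabb(vec3 o, vec3 d, vec3 mn, vec3 mx)
{
    float tx = slab_exit(o.x, d.x, mn.x, mx.x);
    float ty = slab_exit(o.y, d.y, mn.y, mx.y);
    float tz = slab_exit(o.z, d.z, mn.z, mx.z);
    float t = tx < ty ? tx : ty;
    return t < tz ? t : tz;
}

/* Walk the reflection ray from the fragment to the bounding volume, then
 * re-express the hit point relative to the cubemap capture position.
 * The result is the vector to pass to the cubemap fetch. */
vec3 local_corrected_lookup(vec3 frag_pos, vec3 refl_dir, const local_env *env)
{
    float t = intersect_aabb(frag_pos, refl_dir, env->bbox_min, env->bbox_max);
    vec3 hit = { frag_pos.x + t * refl_dir.x,
                 frag_pos.y + t * refl_dir.y,
                 frag_pos.z + t * refl_dir.z };
    vec3 lookup = { hit.x - env->cubemap_pos.x,
                    hit.y - env->cubemap_pos.y,
                    hit.z - env->cubemap_pos.z };
    return lookup;
}
```

Without this correction the cubemap behaves as if it were infinitely far away, which is exactly why naive cubemap reflections look wrong in enclosed local environments.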



And what do we do with dynamic geometry? We obviously can’t prebake the reflections from dynamic geometry. In this case, we can render reflections/shadows from dynamic geometry at runtime using traditional techniques and combine them with reflections/shadows rendered using the local cubemap technique.



Unfortunately, the use of local cubemaps for reflections is not yet widely implemented, despite the fact that the technique has been available for 15 years. Now, with the implementation of the Reflection Probe in Unity 5, reflections based on local cubemaps are becoming available to more than half of developers all over the world, which is pretty good. Providing support for our shadows technique based on local cubemaps in the Unity engine would be as simple as rendering the transparency of the environment into the alpha channel of the same cubemap used for reflections.



The Taoyuan effect



At the very end of our journey, when heading to the flight gate at Taiwan Taoyuan International Airport in Taipei, Sylwester and I were walking through a long corridor with a very polished and reflective floor. On one side, the wall was projecting a message in Chinese symbols, which were perfectly reflected on the floor. The picture drew my attention and I pointed out to Sylwester that it was a clear use case of reflections based on local cubemaps.



Nevertheless Sylwester, who always pays extra attention to details, pointed out that further away from the wall the reflections were more blurred. We looked at each other, because it was clear that this effect could be implemented for reflections in the same way we had implemented soft shadows.


Figure 2. Reflections on the corridor at the Taiwan Taoyuan International Airport.

In the shadows technique, we implemented the blurred effect by fetching the texture from an interpolated cubemap mipmap level. The magnitude passed to the fetching function is proportional to the distance from the fragment to the intersection point of the fragment-to-light vector with the scene-bounding box (a more detailed explanation of the shadows technique is in this blog).
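A minimal sketch of that distance-to-mipmap mapping is below (in C for illustration; the function and parameter names are hypothetical, and the tuning values are not taken from the demo):

```c
#include <assert.h>
#include <math.h>

/* Map the distance from the fragment to the ray/bounding-box intersection
 * point onto a fractional mipmap level for the cubemap fetch.
 * `max_dist` and `max_lod` are tuning parameters. The fractional result
 * relies on trilinear filtering to interpolate between the two nearest
 * mipmap levels, which is what produces the smooth blur gradient. */
float blur_lod_from_distance(float dist, float max_dist, float max_lod)
{
    if (dist <= 0.0f) return 0.0f;   /* at the contact point: sharp     */
    float t = dist / max_dist;
    if (t > 1.0f) t = 1.0f;          /* clamp to the smallest mip level */
    return t * max_lod;              /* magnitude for the LOD fetch     */
}
```

In the shader this value would be passed to the explicit-LOD cubemap fetch, so reflections (or shadows) get progressively blurrier the further they are from the occluding geometry.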


As with reflections based on local cubemaps, our technique offers very clear advantages in terms of quality and resource saving when rendering shadows. Regardless of its limitations (mainly derived from its static nature), I would like to encourage developers to try the technique and explore it beyond the use case we presented in our talk. You never know what a technique can do for you until you start to use it and push it to new extremes. If you are sceptical about that, then please continue reading.


The effect was caused by the fact that the Chinese symbols were carved into the wall, with the light source behind them, in such a way that the light scattered at the borders of the symbols, producing a soft pattern of light beams.



I decided to implement this effect in the same demo we used to show reflections and shadows: the chess room demo. The results are below. Fig. 3 shows the reflections on the chessboard using the standard technique of local cubemap. Fig. 4 shows a blurred reflection based on the distance from the fragment to the intersection point of the reflection vector with the scene-bounding box. The further the reflection is from the real object, the more blurred it is rendered.



  Figure 3. Standard reflections on the chess board based on local cubemap.



Figure 4. Reflections on the chessboard based on local cubemap using an interpolated mipmap level based on

the distance from the fragment to the intersection point of the fragment-to-light vector with the scene-bounding box.




This example shows how powerful and flexible the technique of local cubemaps is. It allows rendering not only of soft shadows but also of reflections coming from soft light beams.





What other new applications of local cubemaps will come? As local cubemaps become more popular, developers will find new applications for this technique - it is just a matter of time. But one thing is certain: this simple technique has proved to be a very effective and powerful way to render reflections, shadows and refractions.

My last blog looked at some of the critical areas which an application has to implement efficiently to get the best performance out of 3D content, such as broad-brush culling of large sections of a scene which are guaranteed not to be visible, so they are not sent to the GPU at all. In one of the follow-on comments to that blog, seanlumly01 asked: "Is there a performance penalty for an application modifying textures between draw calls?". It is a really good question, but the answer is non-trivial, so I deferred it to this blog post to answer fully.


Pipelined Rendering


The most important thing to remember when it comes to resource management is the fact that OpenGL ES implementations are nearly all heavily pipelined.  This is discussed in more detail in this earlier blog, but in summary ...


When you call glDraw...() to draw something, the draw does not happen instantly; instead, the command which tells the GPU how to perform that draw is added to a queue of operations to be performed at some point in the future. Similarly, eglSwapBuffers() does not actually swap the front and back buffers of the screen, but really just tells the graphics stack that the application has finished composing a frame of rendering, and queues that frame for rendering. In both cases the logical specification of the behaviour (the API calls) and the actual processing of the work on the GPU are decoupled by a buffering process which can be tens of milliseconds in length.


Resource Dependencies


For the most part, OpenGL ES defines a synchronous programming model. Apart from a few explicit exceptions, when you make a draw call, rendering must appear to have happened at the point that the draw call was made, with pixels on screen correctly reflecting the state of any command flags, textures, or buffers at that point in time (based either on API function calls or previously specified GPU commands). This appearance of synchronous rendering is an elaborate illusion maintained by the driver stack underneath the API, which works well but does place some constraints on application behavior if you want to achieve the best performance and lowest CPU overheads.


Due to the pipelining process outlined earlier, enforcing this illusion of synchronicity means that a pending draw call which reads a texture or buffer effectively places a modification lock on that resource until that draw operation has actually completed rendering on the GPU.



For example, if we had a code sequence:


glBindTexture(1)       // Bind texture 1, version 1
glDrawElements(...)    // Draw reading texture 1, version 1
glTexSubImage2D(...)   // Modify texture 1, so it becomes version 2
glDrawElements(...)    // Draw reading the texture 1, version 2


... then we cannot allow the glTexSubImage2D() to modify the texture memory until the first draw call has actually been processed by the GPU, otherwise the rendering of the first draw call will not correctly reflect the state of the GL at the point the API call was made (we need it to render the draw using the contents of the physical memory which reflect texture version 1, not version 2). A lot of what OpenGL ES drivers spend their time doing is tracking resource dependencies such as this one to make sure that the synchronous programming "illusion" is maintained, ensuring that operations do not happen too early (before the resources are available) or too late (after a later resource modification has been made).


Breaking Resource Dependencies


In scenarios where a resource dependency conflict occurs - for example, a buffer write is requested while that buffer still has a pending read lock - the Mali drivers cannot apply the resource modification immediately without some special handling; there are multiple possible routes open to the drivers to resolve the conflict automatically.


Pipeline Finish


We could drain the rendering pipeline to the point where all pending reads and writes from the GPU for the conflicted resource are resolved. After the finish has completed we can process the modification of the resource as normal. If this happens part way through the drawing of a framebuffer you will incur incremental rendering costs where we are forced to flush the intermediate render state to main memory; see this blog for more details.


Draining the pipeline completely means that the GPU will then go idle waiting for the CPU to build the next workload, which is a poor use of hardware cycles, so this tends to be a poor solution in practice.


Resource Ghosting


We can maintain both the illusion of the synchronous programming model and process the application update immediately, if we are willing to spend a bit more memory. Rather than modifying the physical contents of the current resource memory, we can simply create a new version of the logical texture resource, assembling the new version from both the application update and any of the data from the original buffer (if the modification is only a partial buffer or texture replacement). The latest version of the resource is used for any operations at the API level, older versions are only needed until their pending rendering operations are resolved, at which point their memory can be freed. This approach is known as resource ghosting, or copy-on-write.


This is the most common approach taken by drivers as it leaves the pipeline intact and ensures that the GPU hardware stays busy. The downsides of this approach are additional memory footprint while the ghost resources are alive, and some additional processing load to allocate and assemble the new resource versions in memory.


It should also be noted that resource ghosting isn't always possible; in particular when resources are imported from external sources using a memory sharing API such as UMP, Gralloc, dma_buf, etc. In these cases other drivers, such as cameras, video decoders, and image processors may be writing into these buffers and the Mali drivers have no way to know whether this is happening or not. In these cases we generally cannot apply copy-on-write mechanisms, so the driver tends to block and wait for pending dependencies to resolve. For most applications you don't have to worry about this, but if you are working with buffers sourced from other media accelerators this is one to watch out for.


Application Overrides


Given that resource dependencies are a problem on all hardware rendering systems due to pipeline depth, it should come as no surprise that more recent versions of OpenGL ES come with some features which allow application developers to override the purely synchronous rendering illusion to get more fine control if it is needed.


The glMapBufferRange() function in OpenGL ES 3.0 allows application developers to map a buffer into the application's CPU address space. Mapping buffers allows the application to specify an access flag of GL_MAP_UNSYNCHRONIZED_BIT, which loosely translates as the "don't worry about resource dependencies, I know what I am doing" bit. When a buffer mapping is unsynchronized, the driver does not attempt to enforce the synchronous rendering illusion, and the application can modify areas of the buffer which are still referenced by pending rendering operations, causing incorrect rendering for those operations if the buffer updates are made erroneously.
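As a sketch of the call pattern, the fragment below stubs out the GL entry points with stand-in definitions so it compiles outside a real GL context; in an application these names come from <GLES3/gl3.h> and an active context, and the buffer would already be bound:

```c
#include <assert.h>
#include <string.h>

/* Stand-in GL definitions so this sketch is self-contained; in a real
 * application these come from <GLES3/gl3.h>. */
typedef unsigned int  GLenum;
typedef unsigned int  GLbitfield;
typedef unsigned char GLboolean;
typedef long          GLintptr;
typedef long          GLsizeiptr;
#define GL_ARRAY_BUFFER           0x8892
#define GL_MAP_WRITE_BIT          0x0002
#define GL_MAP_UNSYNCHRONIZED_BIT 0x0020

static float backing_store[256];   /* pretend driver-side buffer storage */

static void *glMapBufferRange(GLenum target, GLintptr offset,
                              GLsizeiptr length, GLbitfield access)
{
    (void)target; (void)length; (void)access;
    return (char *)backing_store + offset;  /* stub: real driver maps GPU memory */
}

static GLboolean glUnmapBuffer(GLenum target)
{
    (void)target;
    return 1;
}

/* Overwrite a range of the bound vertex buffer without stalling on pending
 * draws. Only safe when the caller knows the GPU has finished reading
 * (or will never read) this region. */
void update_vertices_unsynchronized(const float *src, int first, int count)
{
    void *ptr = glMapBufferRange(GL_ARRAY_BUFFER,
                                 (GLintptr)(first * sizeof(float)),
                                 (GLsizeiptr)(count * sizeof(float)),
                                 GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    memcpy(ptr, src, count * sizeof(float));
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
```

The responsibility for correctness shifts entirely to the application here: if the range written is still referenced by a queued draw call, that draw will sample the new data.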


Working With Resource Dependencies


In addition to the direct use of features such as GL_MAP_UNSYNCHRONIZED_BIT, many applications work with the knowledge that resource usage is pipelined to create flexible rendering without causing excessive ghosting overheads.


Separate Out Volatile Resources


Ghosting can be made less expensive by ensuring that volatile resources are separated out from the static resources, making the memory regions which need to be allocated and copied as small as possible. For example, ensuring that animated glyphs which are updated using glTexSubImage2D() are not sharing a texture atlas with static images which are never changed, or ensuring that models which are animated in software on the CPU (either via attribute or index update) are not in the same buffer as static models.


Batch Updates


The overheads related to buffer updates can be reduced, and the number of ghosted copies minimized, by performing most of the resource updates in a single block (either one large update or multiple sequential sub-buffer/texture updates), ideally before any rendering to an FBO has occurred. Avoid interleaving resource updates with draw calls like this ...




... unless you are able to use GL_MAP_UNSYNCHRONIZED_BIT. It is usually much more efficient to make the same set of updates like this:
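The cost difference between the two orderings can be illustrated with a toy model of copy-on-write ghosting; this is an illustration only, not the Mali driver's actual bookkeeping:

```c
#include <assert.h>

/* Toy model: an update that lands while the buffer still has queued GPU
 * reads forces the driver to allocate one ghost copy. */
typedef struct {
    int pending_reads;   /* draws queued against the current version  */
    int ghosts;          /* copies the driver was forced to allocate  */
} toy_buffer;

static void toy_draw(toy_buffer *b)
{
    b->pending_reads++;                /* draw references current version */
}

static void toy_update(toy_buffer *b)
{
    if (b->pending_reads > 0) {        /* still referenced: copy-on-write */
        b->ghosts++;
        b->pending_reads = 0;          /* queued reads follow the old ghost */
    }
}

/* update/draw/update/draw/...: every update after the first one ghosts. */
int ghosts_when_interleaved(int n)
{
    toy_buffer b = {0, 0};
    for (int i = 0; i < n; i++) { toy_update(&b); toy_draw(&b); }
    return b.ghosts;
}

/* All updates batched before all draws: no ghosting at all. */
int ghosts_when_batched(int n)
{
    toy_buffer b = {0, 0};
    for (int i = 0; i < n; i++) toy_update(&b);
    for (int i = 0; i < n; i++) toy_draw(&b);
    return b.ghosts;
}
```

In this model, four sub-buffer updates per frame cost three ghost copies when interleaved with draws, and none when batched up front.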




Application Pipelined Resources


If the application wants to make performance more predictable and avoid the overheads of ghosting and reallocating memory in the driver, one technique it can apply is to explicitly create multiple copies of each volatile resource in the application, one for each frame of latency present in the rendering pipeline (typically 3 for a system such as Android). The resources are used in a round-robin sequence, so when the next modification of a resource occurs, the pending rendering using that resource should have completed. This means that the application's modifications can be committed directly to physical memory without needing special handling in the driver.
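A sketch of such a round-robin pool is below (the pool depth and the buffer IDs are hypothetical; in practice the IDs would come from glGenBuffers() and the depth from the fence probing described next):

```c
#include <assert.h>

#define PIPELINE_DEPTH 3   /* assumed pipeline latency in frames */

/* One buffer copy per in-flight frame, cycled round-robin so the copy
 * being written each frame is never one the GPU may still be reading. */
typedef struct {
    unsigned int ids[PIPELINE_DEPTH];  /* hypothetical GL buffer object IDs */
    int frame;                         /* monotonically increasing frame count */
} buffer_pool;

/* Return the buffer to write and draw with this frame. */
unsigned int pool_acquire(buffer_pool *p)
{
    unsigned int id = p->ids[p->frame % PIPELINE_DEPTH];
    p->frame++;
    return id;
}
```

Each frame the application acquires the next copy, writes its new vertex or texture data there, and binds it for drawing; by the time the sequence wraps around, the draws referencing that copy should have retired.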


There is no easy way to determine the pipeline length of an application, but it can be empirically tested on a device by inserting a fence object by calling glFenceSync() after a draw call using a texture, and then polling that fence object by calling glClientWaitSync() with a timeout of zero just before making the modifications N frames later. If this wait returns GL_TIMEOUT_EXPIRED then the rendering is still pending and you need to add an additional resource version to the resource pool you are using.


Thanks to Sean for the good question, and I hope this answers it!




Pete Harris is the lead performance engineer for the Mali OpenGL ES driver team at ARM. He enjoys spending his time working on a whiteboard and determining how to get the best out of combined hardware and software compute sub-systems. He spends his working days thinking about how to make the ARM Mali GPUs even better.
