
ARM Mali Graphics


For a long time, I've not really been interested in 3D programming.

(Well, I did some minor OpenGL programming many years ago, but I must admit that I'm more of a dev-tools programmer.)

 

After watching the video Alban linked to in ARM Processor - Sowing the Seeds of Success - Computerphile, I decided to watch some more from Computerphile.

 

The videos by John Chapman are spoken in very clear, easy-to-understand English, and they explain advanced technology in a way that's easy to follow.

If you're new to 3D programming, this might be a good starting point.

 

A Universe of Triangles

 

 

True Power of the Matrix

 

 

Triangles to Pixels

 

 

Visibility Problem

 

 

Lights and Shadows in Graphics

 

The Embedded Vision Summit 2015 is nearly upon us.  This annual gathering of experts in this highly dynamic area is a day of fascinating presentations and demonstrations of leading-edge developments in vision-enabled products.  The Summit is on 12 May.

 

This year ARM® is hosting a special half-day seminar connected with the Summit on the day before.  Titled “Enabling Computer Vision on ARM” the event will see a number of industry-leading developers presenting their experiences in computer vision working over a variety of ARM platforms and use cases.

 

Growth in Computer Vision


Computer vision is seeing phenomenal growth in adoption and deployment.  The increased power, efficiency and variety of processors is enabling many new use cases while revolutionizing existing vision applications previously confined to the desktop.  These are now increasingly possible on energy-efficient mobile devices and across market segments including automotive, retail, medical and industrial.

 

In this seminar, a selection of computer vision experts and leaders in their fields will present their experiences working with ARM-based systems across a variety of real use cases.  Attendees will learn how to:

 

  • Resolve common issues encountered when implementing complex vision algorithms on embedded processors in areas such as object recognition and augmented reality
  • Balance workloads across processors and processor types
  • Debug heterogeneous vision applications on ARM-based systems to remove design bottlenecks and improve performance and efficiency

 

Seminar program:

 

  • Jeff Bier, President, BDTI
    Title: Benchmarking Metrics and Processor Selection for Computer Vision
    Abstract:
    This presentation looks at the long-term trends in computer vision applications and processors and the challenges these pose to benchmarking vision applications.  As the complexity of applications, processors and the heterogeneous design of systems increases, so do the challenges in measuring their performance in meaningful ways.  Being able to assess the relative performance of processors and processor types under various combinations and configurations is a vital factor in matching systems to particular use cases.  For example, mobile use cases are becoming a key focus for software development and these systems increasingly rely on heterogeneous configurations to increase processing efficiency.  To use these processors efficiently, developers must determine the optimal mapping of their applications onto the SoC’s heterogeneous processing cores.


  • Dr. Masaki Satoh, Morpho Inc
    Title: Development of Image and Vision Processing Software and Optimizations for ARM
    Abstract:
    This presentation will give technical insight into the benefits of NEON™ acceleration, including details of actual performance improvements.  From the developer perspective, the presentation will examine specific algorithms and how they are optimized with NEON.  The future of imaging will also be examined, looking at the potential of GPU compute and other heterogeneous combinations, deep learning image recognition engines accelerated through NEON and OpenCL, and research and development into automotive products.

  • Dr. Piotr Stec, Project Manager in the Imaging Field, FotoNation
    Title: Video Image Stabilization for Mobile Devices
    Abstract:
    The presentation will show the processing steps needed to perform video stabilization on mobile devices. We will show the algorithm flow, indicating the steps that need to be taken to perform the algorithm and the data flow between its various components. Some parts of the algorithm proved particularly challenging in terms of achieving suitable performance, and the most complex parts did not always turn out to be the bottleneck. We will show how those difficulties were overcome on a device using an ARM chipset and what gains were achieved in terms of processing time.  The last part of the presentation will be a live demonstration of the working algorithm.

  • Gian Marco Iodice, Compute Engineer, ARM
    Title: Real-time Dense Passive Stereo Vision: A Case Study in Optimizing Computer Vision Applications Using OpenCL on ARM
    Abstract: Passive stereo vision is a powerful visual sensing technique aimed at inferring depth without using any structured light. Nowadays, as it offers low-cost and reliable solutions, it finds application in many real use cases, such as natural user interfaces, industrial automation, autonomous vehicles, and many more. Since stereo vision algorithms are extremely computationally expensive, resulting in very high CPU load, the aim of this presentation is to demonstrate the feasibility of this task on a low-power mobile ARM Mali GPU. In particular, the presentation will focus on a local stereo vision method based on a novel extension of the census transform, which exploits the highly parallel execution capability of mobile Graphics Processing Units with OpenCL.  The presentation will also show the approaches and strategies used to optimize the OpenCL code in order to reach significant performance benefits on the GPU.

  • Martin Lechner, CTO, Wikitude
    Title: Utilizing NEON for Accelerated Computer Vision Processing in Augmented Reality Scenarios
    Abstract:
    At the core of the Wikitude SDK runs an engine that relies heavily on different computer vision algorithms to get information about the user's current environment. As those algorithms can be very computationally intensive, a major part of our work is to optimize and specifically design the algorithms for mobile device architectures.  As most current mobile phones have either an armv7 or armv8 architecture, the ARM NEON SIMD instruction set offers huge potential for improving the performance of computer vision algorithms. This presentation will focus on how the NEON instruction set can be used to improve the performance of general image processing functions. It will also include a discussion of our experience with the NEON instruction set, specifically how to find the hotspots in the code and how those functions can be tested and debugged, as well as our experience porting the NEON functionality from armv7 to armv8.

  • Tim Hartley, Technical Marketing Manager, ARM
    Title: Measuring the Whole System: Holistic Profiling of CPU and GPU for Optimal Vision Applications on ARM Platforms
    Abstract:
    Developers of sophisticated vision applications need all the processing power they can lay their hands on, and using OpenCL on a GPU can be a vital additional compute resource.  But spreading the workload amongst processors and processor types brings its own problems and difficulties, and traditional application optimization techniques are not always effective in this brave new heterogeneous world.  The key to achieving performance is twofold: getting access to hardware counters for all the processors in your system, and then understanding what those numbers are telling you.  In this talk, I will examine the tools and techniques available to profile these sorts of applications, using real case studies from vision applications. Using tools like DS-5 Streamline, I will show how to extract meaningful performance numbers and how to interpret them.

  • Ken Lee, Founder and CEO, Van Gogh Imaging
    Title: Using ARM Processors to Implement Real-Time 3D Object Recognition on Mobile Devices
    Abstract: Diverse applications such as 3D printing, augmented reality, medical, parts inspection, and ecommerce can benefit significantly from the ability of 3D computer vision to separate a scene into discrete objects and then recognize and analyze them reliably. The 3D approach is much more robust and accurate than the traditional 2D approach and is now possible with embedded 3D sensors and powerful processors in mobile devices.  This discussion will focus on how real-time 3D computer vision can now be implemented on the ARM CPU.  Further, we will discuss how these algorithms can be further accelerated using an ARM Mali GPU with an OpenCL implementation.
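As a rough illustration of the census-transform matching at the heart of the stereo vision talk above, here is a minimal Python sketch (hypothetical helper names and a fixed 3x3 window; the real implementation is an extended census transform running as OpenCL on the GPU):

```python
def census_3x3(img, x, y):
    """Census transform of the pixel at (x, y): each of the 8 neighbours in
    a 3x3 window contributes one bit, set when that neighbour is darker than
    the centre pixel. The result is an 8-bit signature of local structure."""
    centre = img[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the centre pixel itself
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < centre else 0)
    return bits

def census_cost(sig_left, sig_right):
    """Matching cost between two census signatures: the Hamming distance,
    i.e. the number of differing bits."""
    return bin(sig_left ^ sig_right).count("1")
```

A stereo matcher would compute `census_cost` between a left-image pixel and right-image pixels at each candidate disparity, keeping the lowest-cost disparity; the signature comparison is cheap, branch-free work that maps well onto GPU work-items.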

 

 

The seminar will include an industry panel discussion. Experts from the worlds of vision IP, ADAS (Advanced Driver Assistance Systems), and image sensors and recognition software will discuss future trends in technology for ARM-based systems.

 

To register for free for the ARM Seminar:
http://www.eventbrite.com/e/seminar-enabling-computer-vision-on-arm-tickets-10231417445?aff=EVS

 

There’s more about the Embedded Vision Summit here:
http://www.embedded-vision.com/summit

At GDC 2015, ARM and PlayCanvas unveiled the Seemore WebGL demo. If you haven’t seen it yet, it takes WebGL graphics to a whole new level.

 


 

So why did we build this demo? We had two key goals:


Put amazing demo content in the hands of you, the developer

Seemore WebGL is the first conference demo developed specifically to run in the web browser. This is great, because you can run it for yourself, on any device. There's nothing to download and install: hit a link, and you’re immediately dropped into a stunning 3D experience. Better yet, you can learn from the demo and use that knowledge in your own creations.

Demonstrate console quality graphics on mobile

ARM Mali GPUs pack a serious graphical punch and Seemore is designed to fully demonstrate this power. We have taken advanced graphical features seen in the latest generation of console titles and optimized them to be completely mobile friendly. And best of all, all of this technology is open sourced on GitHub.

It's not practical to examine all of the engine updates we made to bring Seemore to life. So instead, let’s examine three of the more interesting engine features that were developed for the project.

 

Prefiltered Cubemaps

This is the generation and usage of prefiltered cubemaps. Each mipmap level stores the environment reflection at a different level of surface roughness - from mirror-like to diffuse.

 

prefilter

 

How did we do it?
First, we added a cubemap filtering utility to the engine (GPU-based importance sampling). The next step was to expose this functionality in the PlayCanvas Editor. This technique uses Phong lobes of different sizes to pre-blur each mip level. Runtime shaders use either the EXT_shader_texture_lod extension (where supported) or reference mip levels stored as individual textures that are interpolated manually.
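The roughness-to-mip mapping and the manual interpolation fallback can be sketched as follows (a simplified illustration with hypothetical names; the actual PlayCanvas shaders differ in detail):

```python
def roughness_to_mip(roughness, mip_count):
    """Map a roughness value in [0, 1] to a (fractional) mip level of the
    prefiltered cubemap: mip 0 is mirror-like, the last mip is diffuse."""
    r = min(max(roughness, 0.0), 1.0)
    return r * (mip_count - 1)

def sample_prefiltered(mips, roughness):
    """Emulate the manual fallback path: pick the two nearest mip levels
    and blend between them. Here each 'mip' is reduced to a single average
    value instead of a full cubemap face."""
    lod = roughness_to_mip(roughness, len(mips))
    lo = int(lod)
    hi = min(lo + 1, len(mips) - 1)
    frac = lod - lo
    return mips[lo] * (1.0 - frac) + mips[hi] * frac
```

With EXT_shader_texture_lod the shader can pass the fractional LOD straight to the texture lookup; without it, it samples the two nearest levels as separate textures and mixes them, as above.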


Show me the code!

https://github.com/playcanvas/engine/pull/202


Further reading:

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html
https://seblagarde.wordpress.com/2012/06/10/amd-cubemapgen-for-physically-based-rendering/

 

Box-projected cubemaps

This feature makes cubemaps work as if projected onto the insides of a box, instead of being infinitely far away (as with a regular skybox cubemap). This technique is widely used in games for interior reflection and refraction.

 

bpcem34


How did we do it?

This effect is implemented using a world-space AABB projection. Refraction uses the same code as reflection but with a different ray direction, so the projection automatically applies to it as well.
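The correction can be sketched like this (a minimal illustration assuming the shaded point lies inside the box; the real GLSL works on vectors rather than per-axis loops):

```python
def box_project(pos, direction, box_min, box_max, cube_center):
    """Box-projected cubemap lookup: intersect the reflection ray leaving
    `pos` with the world-space AABB, then re-aim the lookup vector from the
    cubemap centre towards that intersection point."""
    t = float("inf")
    for axis in range(3):
        d = direction[axis]
        if d == 0.0:
            continue  # ray parallel to this pair of faces
        # The ray exits through the max face when heading positive,
        # the min face when heading negative; keep the nearest exit.
        face = box_max[axis] if d > 0.0 else box_min[axis]
        t = min(t, (face - pos[axis]) / d)
    hit = [pos[i] + t * direction[i] for i in range(3)]
    return [hit[i] - cube_center[i] for i in range(3)]
```

Because only `direction` differs between reflection and refraction, the same projection serves both, just as described above.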


Show me the code!

https://github.com/playcanvas/engine/pull/183


Further reading:

http://www.gamedev.net/topic/568829-box-projected-cubemap-environment-mapping/

 

Custom shader chunks

Standard material shaders in PlayCanvas are assembled from multiple code 'chunks'. Often, you don't want to replace the whole shader, but you'd like to only change some parts of it, like adding some procedural ambient occlusion or changing the way a surface reflects light.

 

This feature was required in Seemore to achieve the following:

 

  • Dual baked ambient occlusion. The main plant uses 2 AO maps for open and closed mouth states which are interpolated dynamically.

    AO

  • Fake foliage translucency. This attenuates emission to make it appear as though light is scattered on the back-faces of leaves in a hemispherically lit room. The plant’s head uses a more complex version of the effect, calculating per-vertex procedural light occlusion.

    fol

  • Plant/tentacle animation. Procedural code that drives vertex positions/normals/tangents.


How did we do it?

Shader chunks are stored in the engine codebase as .vert and .frag files that contain snippets of GLSL. You can find all of these files here. Here’s an example chunk that applies exponential squared fog to a fragment:

uniform vec3 fog_color;
uniform float fog_density;

vec3 addFog(inout psInternalData data, vec3 color)
{
    float depth = gl_FragCoord.z / gl_FragCoord.w;
    float fogFactor = exp(-depth * depth * fog_density * fog_density);
    fogFactor = clamp(fogFactor, 0.0, 1.0);
    return mix(fog_color, color, fogFactor);
}

Each chunk file’s name becomes its name at runtime, with VS or PS appended, depending on whether the chunk forms part of a vertex or pixel shader. In the case above, the filename is fogExp2.frag. Replacing this chunk on a material is a simple matter:

  material.chunks.fogExp2PS = myCustomShaderString;
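As a side note, the falloff computed by the fogExp2 chunk can be checked numerically outside the shader; a small Python sketch mirroring the GLSL:

```python
import math

def fog_factor(depth, fog_density):
    """Exponential-squared fog, mirroring fogExp2.frag: 1.0 means no fog,
    0.0 means the fragment is fully fogged."""
    f = math.exp(-depth * depth * fog_density * fog_density)
    return min(max(f, 0.0), 1.0)  # same clamp as the shader
```

The final colour is then `mix(fog_color, color, fogFactor)`, so distant fragments fade smoothly towards the fog colour.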


Show me the code!

https://github.com/playcanvas/engine/pull/172

 

So there you have it. A brief insight into some of the latest technology in the PlayCanvas Engine. Want to find out more? Head over to GitHub, watch, star and fork the codebase - get involved today!

collabora-mali.png

 

Since our successful demonstration at SIGGRAPH, ARM and Collabora have continued to work together on providing the best possible platform for media playback and presentation. Numerous applications such as digital signage, IVI, tablets, remote monitoring, and more, all require accurate, high-quality and low-power video presentation, with as low a thermal envelope as possible.

 

Collabora has made significant contributions to, and maintains, both the standard open-source GStreamer media framework and the next-generation Wayland window system. Combining the two has allowed us to bring out the full extent of the capabilities of GStreamer, used for years in broadcast television with its exacting standards, and of Wayland's lightweight and flexible design, which above all else emphasises accuracy and perfect end results.

 

The result is a system providing perfectly synchronised networked video presentation across three displays, each powered by a separate ODROID-XU3 system using the Samsung Exynos 5422 SoC with an ARM Mali-T628 GPU. Each system displays one segment of the video, with one acting as the co-ordinator to keep timing consistent across all three segments. From the user's point of view, the video appears as one consistent whole.

 

Network synchronisation with GStreamer


GStreamer is the reference open-source media framework, used in everything from audio playback on embedded systems to huge farms powering broadcast TV. GStreamer's pipeline concept provides a flexible and lightweight transport to suit almost any use case. In this particular instance, we are using GStreamer to load H.264 content from disk, pass it to a hardware H.264 decoder, feed the resulting frames to Wayland, and then feed the timing information from Wayland back to the master device.


Core to GStreamer's flexibility and applicability is its excellent support for clock control, being able to synchronise multiple disparate sources. Its clock control supports both hardware and software sources and sinks, and allows the most precise matching possible between input audio and video clocks and the output device's actual capabilities.
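The underlying idea of slaving each device's clock to a master is the classic two-way offset estimate (a hypothetical sketch of the principle, not the actual GStreamer network-clock code):

```python
def estimate_clock_offset(t0, t1, t2, t3):
    """Estimate how far a remote (master) clock is ahead of the local one.
    t0: local send time, t1: master receive time,
    t2: master reply time, t3: local receive time.
    Assumes the network delay is roughly symmetric in both directions."""
    return ((t1 - t0) + (t2 - t3)) / 2.0
```

Each slave repeatedly takes such measurements and gently skews its playback clock towards the master, so all three video segments hit the same frame at the same moment.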


GStreamer's measurements are then supplemented by Aurena, an open-source distributed media control system, which uses these measurement reports and targets from GStreamer to synchronise playback across all three devices.


The work we did with GStreamer to enhance its Wayland and H.264 hardware decoding support has already been merged into the upstream open-source project and is fully hardware-independent.

 

 

Accurate display with Wayland


The next-generation Wayland window system allows us to make the most efficient possible use of the hardware IP blocks, not only maximising throughput (thus increasing the highest achievable resolution, or number of streams, without sacrificing quality), but also providing predictable presentation.


Wayland's design goal of 'every frame is perfect' means that the content shown to the user must always be complete, coherent, and well-timed. The frame-based model employed is a significant stride over legacy X11 and DirectFB systems, and the consideration given to timing concerns allows us to make sure that the media is always delivered as close to on time as possible, without unsightly visual artifacts such as tearing.


Building on this solid and well-tested core, Collabora developed multiple extensions to Wayland. The first ensures that no copies of the video data are made in the compositing process, preserving precious memory bandwidth, latency, and overall system responsiveness. This extension uses the latest Khronos Group EGL extensions, as supported by ARM's Mali GPU.


However, even without this copy stage, as video usage continues to push at the margins of hardware performance – one recent customer project involved 4K output of nine 1080p H.264 streams on an embedded system – we realise that it might not be physically possible to obtain full frame-rate at all times. To compensate for this, Collabora developed an additional Wayland extension, allowing not only real-time feedback of actual hardware presentation time, but ahead-of-time frame queueing.


This feedback mechanism allows GStreamer to dynamically adjust its clock to obtain perfect synchronisation both across devices and between audio/video, whilst the ahead-of-time queueing gives the hardware the best possible chance to make frame deadlines, as well as preserving power by allowing the hardware to enter sleep states for longer.
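The scheduling decision that queueing enables can be sketched like this (illustrative only; the real Wayland presentation extension works in terms of protocol events, and this helper name is hypothetical):

```python
def next_presentation_time(now_ns, last_vblank_ns, refresh_ns, lead_time_ns):
    """Pick the earliest future vblank a queued frame can still make, given
    that the display pipeline needs `lead_time_ns` of warning. All values
    are integer nanoseconds."""
    earliest = now_ns + lead_time_ns
    # Round up to the next vblank boundary at or after `earliest`.
    periods = (earliest - last_vblank_ns + refresh_ns - 1) // refresh_ns
    return last_vblank_ns + periods * refresh_ns
```

Queueing frames against such a target means a late decode simply slips one refresh rather than tearing, and the hardware can sleep until the deadline approaches.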


This work is all either included with current Wayland releases, or actively being discussed and developed as part of the upstream open-source community.

 

 

Hardware enablement

 

Neither GStreamer nor Wayland required any hardware-specific development or tweaking. However, in order to make this work possible, Collabora has worked extensively on the kernel drivers for the Exynos 5422 SoC found inside the ODROID-XU3. Bringing the Exynos hardware support up to speed with the latest developments in the Kernel Modesetting and Video4Linux 2 subsystems, as well as fixing bugs found in our automated stress-testing laboratory, allowed this work to proceed without a hitch.

 

Far from being throwaway, this work is being merged into the upstream Linux kernel and U-Boot projects, as part of our ongoing commitment to working closely with the open source community to raise the bar for quality and functionality. As this and other platforms rapidly adopt these improvements and become able to run this work, device manufacturers are able to select from the greatest possible choice of vendors. Our work with our partners, including ARM, the wider open source community, and membership of the Khronos Group, continues to deliver benefits for the entire ecosystem, not just one platform or device.

 

This open standards-based approach allows platform selection to be driven by the true capabilities of the hardware and cost/logistics concerns, rather than having to fret about software capability and vendor lock-in.

 

 

Further development

 

Of course, the power of GStreamer, Wayland, and a standards-based Linux system does not stop there: its extensibility includes support for OpenGL ES and EGL, as well as arbitrary client-defined content. Not only can applications like web browsers take advantage of this synchronisation to embed seamless media content in HTML5 displays, but the source data can be anything from a single OpenGL ES application to a web browser, or anything else imaginable. Whether providing for immersive gaming experiences or large-scale digital signage, the underlying technology is flexible and capable enough to deal with any needs.

 

 

ARM is a registered trademark of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved.

Marius Bjørge

Pixel Local Storage

Posted by Marius Bjørge Apr 17, 2015

It's now been a year since we announced the Shader Pixel Local Storage extension. Here I'll recap what we've done since its release.

 

What is Pixel Local Storage?

I recommend reading Jan-Harald Fredriksen's blog post Pixel Local Storage on ARM® Mali™ GPUs for background information about what Pixel Local Storage is and the advantages of exposing it.

 

Order Independent Transparency

At SIGGRAPH 2014 we presented "Efficient Rendering with Tile Local Storage", with detailed use cases mixing advanced techniques such as deferred shading and order independent transparency. The problem with transparency is that blending operations tend to be non-commutative, meaning that the end result is highly sensitive to the shading order of the blended fragments. Using Pixel Local Storage we implemented a full depth-peeling approach and compared it against approximate approaches such as Multi-Layer Alpha Blending and Adaptive Range blending. Not only that: we implemented all of this very efficiently on top of a fully deferred shading renderer. Please see Efficient Rendering with Tile Local Storage for more details.
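The non-commutativity is easy to see with the standard 'over' operator; a toy single-channel sketch:

```python
def blend_over(src, dst):
    """Standard (non-premultiplied) 'over' blend of (value, alpha) pairs,
    with `src` drawn on top of `dst`."""
    src_v, src_a = src
    dst_v, dst_a = dst
    out_a = src_a + dst_a * (1.0 - src_a)
    out_v = src_v * src_a + dst_v * (1.0 - src_a)
    return (out_v, out_a)

# A half-transparent bright fragment over an opaque dark one...
a_over_b = blend_over((1.0, 0.5), (0.2, 1.0))
# ...versus the same two fragments blended in the opposite order.
b_over_a = blend_over((0.2, 1.0), (1.0, 0.5))
```

The two orders give different results (roughly 0.6 versus 0.2 here), which is why transparent fragments must be resolved in depth order; Pixel Local Storage keeps the per-pixel data needed for that resolution on-chip.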

oit_pls.png

 

Collaboration with Epic

We integrated Pixel Local Storage into Epic's Unreal Engine 4. This enabled more efficient HDR rendering as well as features such as bloom and soft particles.

unrealengine_pls.png

 

Sample code

We've also released a couple of samples showing how to use Pixel Local Storage in your own code.

 

Shader Pixel Local Storage

ShaderPixelLocalStorage.png

This sample implements deferred shading using pixel local storage.

http://malideveloper.arm.com/develop-for-mali/sample-code/shader-pixel-local-storage-sample/

 

Translucency

translucency_00.png

This sample uses Pixel Local Storage to render translucent geometry.

http://malideveloper.arm.com/downloads/deved/tutorial/SDK/android/2.0/translucency.html

 

 

References

  1. Pixel Local Storage on ARM® Mali™ GPUs
  2. Supporting the development of mobile games at GDC 2015
  3. Efficient Rendering with Tile Local Storage

Most people use Mali Graphics Debugger (MGD) to help debug OpenGL® ES applications on Linux or Android. However, graphics is not the only API supported by MGD; in fact, MGD supports applications that use the OpenCL™ API as well. This means that if you run MGD with an application that uses OpenCL, you will get the same function-level tracing that you would get with OpenGL ES. You will also get access to the kernel source code, in much the same way you would get access to your OpenGL ES shader source code.

 

With release 2.1 of the Mali Graphics Debugger, the OpenCL feature set has been improved by the inclusion of GPUVerify support. GPUVerify is a tool for the formal analysis of kernels written in OpenCL and was partly funded by the EU FP7 CARP project (http://carpproject.eu/). The tool can prove that kernels don’t suffer from the following three issues:

  • Intra-group data races: This is when there is a data race between work items in the same work group.
  • Inter-group data races: This is when there is a data race between work items in different work groups.
  • Barrier divergence: This is when a kernel breaks the rules for barrier synchronization in conditional code defined in the OpenCL documentation.
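To see why the first of these matters, consider two work-items in the same work-group both writing to the same location with no synchronisation: the result depends entirely on the execution schedule. A toy Python model of that hazard (purely illustrative; GPUVerify itself performs a formal proof over the kernel source rather than running schedules):

```python
def racy_kernel(schedule):
    """Model a work-group in which every work-item writes its own id to
    out[0] with no barrier or atomic: the surviving value is whichever
    write the (unspecified) hardware `schedule` happens to run last."""
    out = [None]
    for work_item_id in schedule:
        out[0] = work_item_id
    return out[0]
```

Because `racy_kernel([0, 1])` and `racy_kernel([1, 0])` disagree, the kernel's result is schedule-dependent; this is exactly the kind of intra-group data race GPUVerify is designed to prove absent or report.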

 

The tool was created by Imperial College London; more information about it and the issues it can diagnose can be found at http://multicore.doc.ic.ac.uk/GPUVerify.

 

In essence, if you provide GPUVerify with the source code of an OpenCL kernel along with the local and global work group sizes, it can check your kernel for the issues highlighted above. One of the main reasons for including support for this tool in MGD is that the information required to run GPUVerify is already known from tracing your application. The steps needed to use this tool with MGD are outlined below:

 

Step 1: Download Mali Graphics Debugger v2.1 and GPUVerify

Mali Graphics Debugger can be downloaded from http://malideveloper.arm.com/develop-for-mali/tools/software-tools/mali-graphics-debugger/. If you are using an older version of MGD you will need to upgrade. Version 2.1 includes much more than just GPUVerify support; a few highlights are listed below:

  • Support for ARMv8 (Android 64-bit)
  • Support for tracing Android Extension Pack
  • Many improvements to the Frame Overrides feature.

 

GPUVerify can be downloaded in binary form for both Linux and Windows from the following location: http://multicore.doc.ic.ac.uk/tools/GPUVerify/download.php. GPUVerify supports Mac OS X as well, but you will have to build it from source.

 

Step 2: Make sure GPUVerify is working stand-alone

Using GPUVerify successfully inside MGD is easier once you have established that GPUVerify works stand-alone. Once it has been downloaded, getting it working is easy thanks to the documentation provided on the GPUVerify website, which includes a section of common troubleshooting advice. A good way to check that it is working correctly is to try GPUVerify with some of the examples provided with the Mali OpenCL SDK.

 

The following image shows the output of GPUVerify running the HelloWorld example from the Mali OpenCL SDK:

consoleOutput.png

 

Step 3: Making MGD Aware of the Location of GPUVerify

As GPUVerify is not shipped with MGD, you must tell MGD where you installed it. To do this, do the following:

 

  • Click Edit -> Preferences

editPreferencesMenu.png

  • Click on the Text box next to "Path to GPUVerify" and provide the location of the GPUVerify binary.

mgdPreferences.png

 

Step 4: Run a normal OpenCL trace in MGD

As mentioned previously, MGD will try to fill in the prerequisite information for GPUVerify from the trace. It does this by looking for several functions; the most important are clCreateProgramWithSource, clCreateKernel and clEnqueueMapBuffer.

trace2.png

If you don't use clCreateKernel to create your kernels, MGD can also obtain the information from clCreateKernelsInProgram. As shown in the image above, MGD also captures the build options used in clBuildProgram to pass to GPUVerify. The more detail that can be passed to GPUVerify, the more accurately it can analyze your kernels.

 

Step 5: Running GPUVerify from MGD

To run the tool, select Debug -> Launch GPU Verify. MGD will then present a new dialog box that summarizes all of the information MGD managed to pull out of the trace. You are free to fill in or change this information; for example, you may want to try new work group sizes or global work sizes, to see if there are any unforeseen issues with different kernel parameters.

 

Step 6: Analyzing the Results

The results are placed in the console view within MGD. Here are the results of the HelloWorld example from above running through MGD:

successfulConsoleOutput.png

 

Summary

Mali Graphics Debugger can be used to do much more than debug and trace graphics applications. It can be used to debug and trace OpenCL applications as well. With the inclusion of GPUVerify support it is now possible to debug possible data race conditions in your kernels as well as barrier divergence issues. MGD will send as much data as it can to GPUVerify by analyzing the trace of your OpenCL application.

OpenGL ES is the standard API for 2D and 3D graphics on embedded systems, which includes mobile phones, tablets, smart TVs, consoles, and other appliances and vehicles. The API is a well-defined subset of desktop OpenGL.

 

In our ARM Mali Developer Center, we have two OpenGL ES Software Development Kits, one for Android and one for Linux environments. These SDKs are aimed mainly at beginner to intermediate users, with a guide on how to get your system properly configured and set up, as well as how to build and run the OpenGL ES 2.0 and the latest OpenGL ES 3.x sample code.

 

Mali OpenGL ES SDK.png

 

 

 

The SDKs contain tutorials and sample code. The tutorials start from the basic concepts, such as an introduction to shaders and setting up the graphics pipeline to render to the display, followed by examples that render basic geometry and add textures and lighting. The more advanced tutorials cover our latest ARM Mali features, like implementing deferred shading using the tile buffer available through the Shader Pixel Local Storage OpenGL ES extension, as well as samples illustrating the use of Adaptive Scalable Texture Compression (ASTC) and the very latest OpenGL ES 3.1 APIs, released in the latest Android Lollipop.

 

The most important OpenGL ES 3.1 feature is Compute Shaders, which allow the GPU to be used for general-purpose computing; the SDK therefore includes a dedicated tutorial called Introduction to Compute Shaders.  The tutorials refer to sample code, and a detailed summary of the compute shader sample code available in the SDK is given in Hans-Kristian’s blog.

 

The ARM Mali-T6xx and ARM Mali-T7xx series support OpenGL ES 3.1, and the latest Android L OS is capable of running OpenGL ES 3.1 applications. At MWC, Samsung launched the Galaxy S6, based on Android L and supporting OpenGL ES 3.1. Other existing devices on the market with support for the latest API are the Nexus 10 tablet and the Galaxy Note 4 smartphone, both after upgrading the OS to Android L.

 

 

About me

Hi, I am Hans-Kristian Arntzen! This is my first post here. I work in the Mali use cases team where we explore the latest mobile APIs to find efficient ways of implementing modern graphics techniques on the ARM Mali architecture.

Sometimes we create small tech demos that result in Mali SDK samples: smaller code examples you can take inspiration from when developing your own applications.

Since August, I've been writing quite a lot of code for OpenGL ES 3.1, and here I will summarize what we have done with it over the last few months.

 

About OpenGL ES 3.1

OpenGL ES 3.1 is an update to OpenGL ES 3.0 which recognizes that OpenGL ES 3.0-capable hardware is already capable of much more, for example compute. OpenGL ES 3.1 brings GPU compute support directly to OpenGL ES, so there is no longer any need to interface with external APIs to expose the compute capabilities of the hardware. The interface for compute is very clean, powerful and easy to use.

 

Compute support in graphics APIs means there are many more opportunities than before for applications to offload parallel work to the GPU, and being able to do this on mobile hardware is very exciting.

See Here comes OpenGL® ES 3.1! for more details.

 

Mali driver support for OpenGL ES 3.1

We released the r5p0 driver in December with support for OpenGL ES 3.1. The driver for Linux and Android platforms can be found here: Drivers - Mali Developer Center.

 

Update to the Mali OpenGL ES SDK

The latest Linux and Android OpenGL ES SDK has new sample code for compute shaders.

Mali OpenGL ES SDK for Linux - Mali Developer Center

Mali OpenGL ES SDK for Android - Mali Developer Center

The samples can be built for Linux development platforms with fbdev.

 

There is also OpenGL ES emulator support included (OpenGL ES Emulator - Mali Developer Center) so you can run the Linux fbdev samples on your desktop on Linux and Windows.

If your desktop implementation supports X11/EGL on Linux, you should be able to run the samples without the emulator by leveraging the GL_ARB_ES3_1_compatibility extension, which went into core in OpenGL 4.5.

 

Introduction to Compute Shaders

Introduction to compute shaders - Mali Developer Center

Compute is a new subject for many graphics programmers. This document explains the different mindset you need to use GPU compute effectively and the new APIs found in OpenGL ES 3.1.

It goes through the major features of compute, and in depth into some of the more difficult subjects like synchronization, memory ordering and execution barriers.

It is recommended that you read this before studying the examples below unless you're already familiar with compute shaders.

 

Particle Flow Simulation with Compute Shaders

Particle Flow Simulation with Compute Shaders - Mali Developer Center

compute_particles01.jpg

This sample implements a modern particle system. It uses compute shaders to sort particles back-to-front, which is critical for correct alpha blending.

Since we can now sort on the GPU, we can offload the entire particle system to the GPU.
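
To make the ordering concrete, here is a hedged scalar sketch (names invented here, and using a plain sort rather than the compute-shader sorting network the SDK sample uses) of back-to-front ordering by distance from the camera:

```python
# Illustrative sketch only: order particles back-to-front by squared distance
# to the camera so that alpha blending composites correctly. The SDK sample
# does the equivalent on the GPU with a compute-shader sorting network.
def back_to_front(particles, camera_pos):
    def depth_sq(p):
        return sum((a - b) ** 2 for a, b in zip(p, camera_pos))
    return sorted(particles, key=depth_sq, reverse=True)  # farthest drawn first

pts = [(0, 0, 1), (0, 0, 5), (0, 0, 3)]
print(back_to_front(pts, (0, 0, 0)))  # [(0, 0, 5), (0, 0, 3), (0, 0, 1)]
```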

 

It also implements cool things like a 4-layer opacity shadow map for some sweet volumetric shadow effects, and simplex noise to add turbulence to the particles.

Combining all these techniques allows you to create a very nice particle system.

 

Occlusion Culling with Hierarchical-Z

Occlusion Culling with Hierarchical-Z - Mali Developer Center

hiznoculling-12.jpg

Culling is important in complex scenes to keep vertex work down, as mentioned in this blog post: Mali Performance 5: An Application's Performance Responsibilities

For game objects, there are many sophisticated CPU-based solutions which often rely on baking data structures based on how the scene is put together.

For example, in indoor scenes with separate rooms, it makes sense to only consider rendering the room you're in and objects from rooms which are visible from it. Doing this computation on the fly could be expensive, but once the information is baked, it is fairly simple.

 

However, when we add a large number of "chaotic" elements to a more dynamic scene, it becomes more difficult to bake anything and we need to compute visibility on the fly. We have to look for more general solutions for these scenarios.

The sample shows how you can use a low-resolution depth map and bounding spheres to efficiently cull entire instances in parallel before they are even rendered. It can also be combined with level-of-detail sorting to reduce geometry load even further.

Finally, the result is drawn with indirect draws, a new feature of OpenGL ES 3.1.

 

Using these kinds of techniques allows you to offload big "particle-like" systems to the GPU efficiently.

 

Game Developers Conference 2015

 

At GDC2015 we presented updated best practices for OpenGL ES 3.1 on Mali along with a newly developed tech demo. I manned our tech booth on the expo floor most of the time, where I got to show my demo to other people, which was quite exciting.

 

Best Practices for OpenGL ES 3.1 on Mali

There are certain things you should think about when developing for Mali. During our work with OpenGL ES 3.1, we have found some general performance tips you should take into account.

Compute exposes more low-level details about the architecture, and to get optimal performance on a particular architecture, you might need some specific optimizations.

If you are experienced with compute on desktop, you might find that many general truths about desktop performance don't necessarily apply to mobile! Sometimes, the performance tips are the opposite of what you'd do on desktop.

If you have used OpenCL on Mali before, the best practices for OpenCL also apply to compute shaders.

 

Presentations

I presented at GDC2015 along with Tom Olson (Chair of the Khronos OpenGL ES and Vulkan working groups, Director of Graphics Research at ARM) and Dan Galpin (Developer Advocate, Google).

The presentation goes through OpenGL ES 3.1 (with a focus on compute), some of the techniques I mentioned in this post, best practices for OpenGL ES 3.1 and AEP on Mali, and a small sneak peek at early Vulkan experiments on Mali.

 

Unleash the Benefits of OpenGL ES 3.1 and Android Extension Pack:

http://malideveloper.arm.com/downloads/GDC15/Unleash%20the%20Benefits%20of%20OpenGL%20ES%203.1and%20the%20Android%20Exte…

 

At the ARM Lecture Theater at GDC2015, I also did a short presentation focused exclusively on compute. It goes into a bit more detail on compute shader basics than the full-length GDC talk:

http://malideveloper.arm.com/downloads/GDC15/Lecture%20Theatre%20-%20Wednesday/Unleash%20the%20Benefits%20of%20OpenGL%20…

 

Caveats with r5p0 release

Unfortunately, there are some performance bugs affecting certain features in the r5p0 release. You might stumble into them when developing for OpenGL ES 3.1.

  • Indirect draws can slow down a lot compared to regular draws.
  • Compute shaders with smaller work groups (e.g. 4 or 8 threads) are much slower (3-4x) than compute shaders using 64 or 128 threads.

 

These issues have been addressed and should be fixed in future driver releases.

 

 

Occlusion Culling with Compute Shaders demo

I am very excited about compute shaders and culling, so much so that I wanted to create a demo for it at GDC. We do have the Occlusion Culling sample code in the SDK, but it is far too bare-bones to show at an event.


 

Instead of dull green spheres, I went for some procedurally generated asteroids. Even though they are instanced, all the asteroids look slightly different thanks to a 4-component RGBA8 heightmap and per-asteroid random weighting factors. They also have independent radii, rotation axes and rotation speeds, which makes the scene look fairly complex. Diffuse and normal textures are shared between all asteroids; they are generated procedurally with Perlin noise and compressed with ASTC LDR.

 

screen3.jpg

 

There are over 27000 asteroids in the scene here, spread out across a big sphere around the camera.

At the highest quality, each individual asteroid has over 2500 triangles. If we were to naively draw this without any kind of optimization, we would get a triangle count in the ballpark of 50+ million, which is complete overkill.

 

We need some culling. The first and obvious optimization is frustum culling, which can remove most of the asteroids outright. We can do this on the GPU very efficiently and in parallel, since it's just a couple of dot products per instance.

All the asteroids in the scene are represented as a flat linear array of per-instance data such as position, base radius, rotation axis, rotation speed and heightmap weighting factors. We combine frustum culling with the physics update (rotating the asteroids and creating a final rotation quaternion per asteroid). Since we need to update every asteroid anyway, we might as well do frustum culling while the data is in cache!
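
As a hedged sketch of those "couple of dot products per instance" (names invented here, not the demo's code), a bounding sphere survives frustum culling unless it lies entirely behind one of the six planes:

```python
# Hedged sketch: each frustum plane is (nx, ny, nz, d) with the normal pointing
# into the frustum. A bounding sphere is culled only if it lies completely
# behind one of the six planes; on the GPU one compute-shader thread would run
# this test per instance.
def sphere_visible(planes, center, radius):
    for nx, ny, nz, d in planes:
        # signed distance from the sphere centre to the plane
        dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if dist < -radius:
            return False  # completely outside this plane: cull
    return True  # inside or intersecting the frustum: keep

# A box-shaped "frustum" spanning -10..10 on each axis, for illustration.
box = [(1, 0, 0, 10), (-1, 0, 0, 10),
       (0, 1, 0, 10), (0, -1, 0, 10),
       (0, 0, 1, 10), (0, 0, -1, 10)]

print(sphere_visible(box, (0, 0, 0), 1))   # True: well inside
print(sphere_visible(box, (30, 0, 0), 1))  # False: far outside the +x side
```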

 

screen2.jpg

 

Now we're looking at ~2000 asteroids being rendered, but just frustum culling is not enough! We also need LOD sorting to get the vertex count low enough.

The idea behind LOD sorting is that objects far away don't need high detail. We can add this technique on top of plain frustum culling and reduce the vertex count a lot. After these optimizations, we're looking at 500-600k triangles per frame, a 100x reduction from before. We can also use cheaper vertex shaders for objects far away, which reduces the vertex load even more. All of this can be done efficiently in compute shaders; it's just a question of pushing per-instance data to one of many instance buffers if it passes the frustum test.
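
The LOD selection itself can be sketched like this (the distance thresholds below are made up for illustration, not taken from the demo):

```python
# Hedged sketch: bucket each visible instance into a level of detail by camera
# distance, so distant asteroids get meshes with fewer triangles and cheaper
# vertex shaders. Thresholds are invented values.
def select_lod(distance, thresholds=(50.0, 150.0, 400.0)):
    for lod, limit in enumerate(thresholds):
        if distance < limit:
            return lod
    return len(thresholds)  # coarsest mesh beyond the last threshold

print([select_lod(d) for d in (10.0, 100.0, 300.0, 1000.0)])  # [0, 1, 2, 3]
```

In the compute shader, the returned bucket index would decide which instance buffer the per-instance data is appended to.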

 

screen5.jpg

Here we see close objects in white and it gets darker as the LOD factor increases.

screen6.jpg

We can also use different shading for close objects. Here, close asteroids get the full bling with normal mapping and specular highlights from the skydome, while objects farther away are diffuse-only with spherical harmonics for diffuse lighting.

This keeps the fragment shading load down quite a bit. The screenshot shows the debugged normals. The normals without normal mapping look a bit funky, but that's because they are computed directly in the vertex shader by sampling the heightmap multiple times. With shading applied, it looks fine.

 

But we can do even better. You might have noticed the transparent "glasslike" wall in front of the asteroids? It is supposed to be opaque. We wanted this to be a space station interior or something cool, but unfortunately we ran out of time before GDC.

 

screen7.jpg

 

The main point here is that there is a lot of stuff going on behind the occluder in the scene. There is no reason why we should waste precious bandwidth and cycles on vertex shading asteroids which are never seen.

Enter Occlusion Culling!

 

We can go from this:

screen1.jpg

 

to this:

screen0.jpg

 

After this optimization we cull over half the asteroids in the scene on average, and we are looking at a very manageable 200-300k triangles.

My hope for the future is that we'll be able to easily do all kinds of scene management directly on the GPU. It's not feasible to do everything on the GPU quite yet, and the CPU is still very capable of tasks like these, but we can definitely accelerate massively instanced cases like this one.

 

 

Skydome

The skydome is procedurally generated with FBM noise. It is HDR and is used for all the lighting in the scene. I compressed it with ASTC HDR; the only reasonable alternative would have been RGB9E5, a 32-bit shared exponent format.

 

Performance

I squeezed out 60 fps at 1080p with 4xMSAA on a Samsung Galaxy Note 10.1 (Mali-T628 MP6) and a Samsung Galaxy Note 4 (Korean version, Mali-T760 MP6) with all culling applied, which I'm quite happy with.

I used DS-5 Streamline (ARM DS-5 Streamline - Mali Developer Center) to find bottlenecks when tuning, along with the Mali Offline Compiler (Mali Offline Compiler - Mali Developer Center) to fine-tune the shaders (mediump varyings can make a lot of difference!).

Ellie Stone

Lighting in Games

Posted by Ellie Stone Apr 13, 2015

Promoting_Mali_Developer_Centre_940x380px.jpg

Light is sight

 

When we start talking about the importance of lighting in Geomerics, we often refer directly to light’s importance in setting the mood and atmosphere of a scene - but we jump way ahead of ourselves here. Step one of light in physics – without light, there is no sight. Everything we see in the real world is the result of light reflecting off surfaces and into our eyes. If we turn off the lights in a room, close the curtains and stuff the doorframe with fabric to stop light leaking in, the objects within it are still there, our eyes are still there, but the objects remain unseen. Defining and highlighting form is the first step in lighting; it lets us see details in objects and determine how they are shaped. Smooth surfaces grade softly between light and shadow, while sharp edges deliver distinct changes.


Going from this basic first step to effectively using lighting to set the mood, intensity and atmosphere of a scene is a long jump. There is no magic formula for getting the perfect combination of light, shadow and color to achieve the desired artistic vision for the environment, in part because it is, like any art, subjective. The combination depends so much on the ambience being created - mystery horrors, for example, will tend to use low lighting and lots of shadows, punctuated by pools of light to grab your attention and draw you in; sometimes even plunging the player into darkness with the only light source being a torch controlled by the player.


The variety of lighting

 

When designing a game, there are many different light sources available: for example directional, ambient, spot and point lights. The source of a directional light is infinitely far away, such that by the time they reach the viewer all the light rays are parallel – a good example is sunlight; directional lights can be stationary or movable. Ambient lighting casts soft rays equally on every part of a scene without any specific direction, so it provides light but not shadow; it has no real source. Spotlights emit from a single source in a cone shape, with an inner cone angle that provides full brightness and an outer cone angle that allows softening at the edges of where the light falls; these are often used for torches. Point lights are much like real-world lightbulbs or candles: they emit from a single point in all directions.

 

Each of these light sources will provide a different type of direct lighting and their effect is computed by the rendering engine. However, to simulate the physics of lighting in the real world it is important to also calculate the indirect lighting, or global illumination, of a scene. Global illumination takes into account the way in which light travels from its source, hits an object and is then absorbed, reflected, scattered or refracted across every subsequent surface it encounters. It is perfect for producing the kind of style required by architectural visualization, interior renders, scenes with direct sunlight and photorealistic renders thanks to its calculation of indirect lighting, soft shadows, colour bleeding and more. For example, light reflecting off a red leather seat cushion will “bleed” colour onto the wall next to it – depending on the colour of the wall this could produce a reddish glow (if the wall is white) or purple (if it is blue). Alternatively, a great effect is where light can leak from one room to its neighbour, gently illuminating the new room through just an open crack in a door.

 

Bear in mind that an exact simulation of how light works is not necessarily required. All that’s needed is something that is good enough to fool the (admittedly, clever) human eye.

 

Solving the global illumination challenge

 

One option for achieving global illumination in a scene is an offline lightmap bake. This gives the illusion that light is being cast onto an object, but what you’re actually seeing is just the effect of the light baked onto the texture. This technique delivers high quality results, but the iteration time is slow and it has limited runtime possibilities – for example the baked light won’t have any effect on moving objects, nor can it be turned on or off during play. Another technique is “bounce lighting”, where artists add light sources into the game at strategic positions in order to simulate global illumination – for example, at the point where a light would be reflected, a new light source is added with the desired properties. In comparison, this has a fast iteration time, but it can take a very high number of iterations to achieve physical correctness, it is hard to achieve dynamism and the number of light sources may be limited by the engine in use.

 

Enlighten is a third option for achieving accurate, lightweight and dynamic global illumination. Enlighten uses real time radiosity to compute the interaction between a scene’s geometry and light. It contains a unique and highly optimised runtime library that generates lightmaps of bounce lighting in real time. The lightmap generation occurs on the CPU and is simply added to the rest of the direct lighting on the GPU. This approach can be further combined with lightmaps generated offline, so only the lights and materials that need to be updated at run time incur any cost. In this way, Enlighten offers a highly scalable solution suitable for all gaming platforms, from PC and console right the way down to mobile, and all lighting requirements, from fully baked to totally dynamic. Because the scene’s lighting and materials can also be updated dynamically at runtime in the editor (as well as in the game), rapid iteration is possible. By taking into account the indirect light, surface properties and specularity in a scene, it generates an extremely high quality and realistic output. For example, by enabling the bounce lighting to pick up the colour properties of the surfaces in the scene, Enlighten naturally ties together the geometry and lighting in an environment. In addition, its ability to update materials’ properties at runtime creates a host of new gameplay opportunities, as demonstrated with the Subway demo where destruction was achieved by making walls transparent.



More information on Enlighten is available at www.geomerics.com.



Introduction

 

Game developers are regularly looking for efficient methods for implementing stunning effects in their games. This is especially important when targeting mobile platforms as resources should be carefully balanced to achieve maximum performance.

 

When developing the Ice Cave demo, we researched the concept of local cubemaps in depth. In a previous blog, I wrote about how implementing reflections based on local cubemaps has proven to be a very efficient technique for rendering high quality reflections. This blog is devoted to a new technique, developed by the ARM demo team, to render refractions, also based on local cubemaps.

 

What is refraction?

 

Refraction is an important effect to consider when striving for extra realism when rendering semi-transparent geometry.

 

Refraction is the change in direction of a wave due to a change in the transmission medium. It is essentially a surface phenomenon. The refractive index determines how much light is bent, or refracted, when entering a material. Snell’s Law establishes the relationship between the refractive indices and the sine of the incident and refracted angles, as shown in Figure 1.

 

SnellsLaw.png
  Figure 1. Refraction of light as it passes through one medium to another.
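
Snell's law, n1·sin(θ1) = n2·sin(θ2), can be made concrete with a small worked example (the function name and the air/glass values below are illustrative, not from the demo):

```python
import math

# Worked example of Snell's law, n1*sin(t1) = n2*sin(t2). Light entering glass
# (n2 = 1.5) from air (n1 = 1.0) at 30 degrees bends towards the normal.
def refracted_angle_deg(n1, n2, incident_deg):
    s = n1 / n2 * math.sin(math.radians(incident_deg))
    if abs(s) > 1.0:
        return None  # total internal reflection: no refracted ray exists
    return math.degrees(math.asin(s))

print(round(refracted_angle_deg(1.0, 1.5, 30.0), 2))  # 19.47
print(refracted_angle_deg(1.5, 1.0, 60.0))  # None (glass to air, past the critical angle)
```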

 

Refraction implementations


Developers have tried to render refraction since the very moment they started to render reflections, since these two physical processes occur together on any semi-transparent surface. There are several well-known techniques for rendering reflections, but this is not the case for refractions.

 


Existing methods for implementing refraction at runtime (ray tracing is excluded due to its complexity) differ depending on the specific type of refraction. Nevertheless, most of the techniques render the scene behind the refractive object to a texture at runtime and apply a non-physically-based distortion in a second pass to achieve the “refracted look”. This approach, which varies in the way the texture perturbation is performed, is used to render the refraction that takes place in water, heat haze and glass objects, among other effects.

 

 

Although some of these techniques can achieve credible results, texture perturbation is not physically based and the results are not always correct.  If a realistic refraction is intended by rendering to texture from the point of view of the “refraction camera”, there may be areas that are not directly visible to the camera but become visible via refraction. Nevertheless, the main limitation of runtime render-to-texture methods, besides physical correctness and the performance penalty, is quality: there is often pixel shimmering or pixel instability that is easily perceived while the camera is moving.

 


The use of static cubemaps to implement refraction is not new. Since the very moment cubemaps became available in 1999, developers have used them to implement reflections as well as refractions. When using cubemaps to implement reflections in a local environment, we get incorrect results if we don’t apply the local correction. This is also true for refractions.

 

Refractions based on local cubemaps


We bake into a static cubemap the environment surrounding the refractive object and fetch the texel from the cubemap based on the direction of the refracted vector (after applying the local correction, see Figure 2).

 

RefractionLocalCorrectionBoundingBox_3d.png
Figure 2. The local correction to refraction vector.


We apply the local correction in the same way we did with reflections in a previous blog. After determining the direction of the refracted vector, we need to find where it intersects the bounding box that delimits the volume of the local scene. The next step is to build a new vector from the position where the cubemap was generated to the intersection point and use this final vector to fetch the texel from the cubemap to render what is behind the refractive object. We get a physically based refraction as the direction of the refraction vector is calculated according to Snell’s Law. Moreover, there is a built-in function we can use in our shader to find the refraction vector R strictly according to this law:

 

R = refract( I, N, eta);

 

where I is the normalized view or incident vector, N is the normalized normal vector and eta is the ratio of indices of refraction (n1/n2).
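
For reference, what the built-in does can be transcribed from the GLSL specification into a small runnable sketch (the Python function here is only an illustration of the built-in's formula):

```python
import math

# Sketch of what the built-in refract(I, N, eta) computes, following the GLSL
# specification: I and N must be normalized, eta = n1/n2, and a zero vector is
# returned on total internal reflection.
def refract(I, N, eta):
    d = sum(i * n for i, n in zip(I, N))  # dot(I, N)
    k = 1.0 - eta * eta * (1.0 - d * d)
    if k < 0.0:
        return (0.0, 0.0, 0.0)            # total internal reflection
    f = eta * d + math.sqrt(k)
    return tuple(eta * i - f * n for i, n in zip(I, N))

# Straight-on incidence passes through undeflected, whatever eta is.
print(refract((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), 1.0 / 1.5))
```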

 

Shader implementation


For the simple case of a thin refractive surface, the shader implementation is straightforward, as shown in Figure 3. As for reflections, to apply the local correction in the fragment shader we need to pass the position where the cubemap was generated, as well as the minimum and maximum bounds of the bounding box (all in world coordinates).

 

ShaderImplementation01.png
Figure 3. Shader implementations of refraction based on local cubemap.
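
The shader in Figure 3 is only shown as an image, so here is a hedged Python transcription of the local correction step it performs (all names are invented; the shader version appears later in this post in GLSL form):

```python
# Hedged sketch: slab intersection of the refracted ray with the bounding box,
# then a new vector from the cubemap capture position to the intersection
# point. All vectors are in world space.
def local_correct(direction, position, bbox_min, bbox_max, cubemap_pos):
    params = []
    for d, p, lo, hi in zip(direction, position, bbox_min, bbox_max):
        d = d if d != 0.0 else 1e-30  # GLSL divides by zero to +/-inf; emulate it
        params.append(max((hi - p) / d, (lo - p) / d))  # forward hit per axis
    dist = min(params)  # smallest forward parameter: where the ray exits the box
    hit = tuple(p + c * dist for p, c in zip(position, direction))
    return tuple(h - q for h, q in zip(hit, cubemap_pos))

# A ray along +x from the centre of a 10-unit box hits the +x face at (5, 0, 0).
print(local_correct((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                    (-5.0, -5.0, -5.0), (5.0, 5.0, 5.0), (0.0, 0.0, 0.0)))
```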


Once we fetch the texel corresponding to the locally corrected refraction direction, we might want to combine the refraction colour with other lighting, for example reflections, which in most cases take place simultaneously with refraction. In this case, we just need to pass an additional view vector to the fragment shader, apply the local correction to it, and use the result to fetch the reflection colour from the same cubemap. Below is a code snippet showing how reflection and refraction might be combined to produce a final output colour.

CodeLines.png
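
Since the snippet above is only shown as an image, here is a hedged sketch of the kind of blend it describes (all values are made up; the blend is the same one GLSL's mix() would perform with a _ReflAmount-style uniform):

```python
# Hedged sketch: combine the two locally corrected cubemap fetches with a
# reflection-amount coefficient, like GLSL mix(). Illustrative values only.
def mix(a, b, t):
    return tuple(x * (1.0 - t) + y * t for x, y in zip(a, b))

refraction_colour = (0.2, 0.4, 0.9)  # texel fetched via the refraction vector
reflection_colour = (0.9, 0.9, 1.0)  # texel fetched via the reflection vector
refl_amount = 0.25                   # uniform balancing the two contributions

print(mix(refraction_colour, reflection_colour, refl_amount))
```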

 

A coefficient _ReflAmount, which is passed as a uniform to the fragment shader, is used to adjust the balance between the reflection and refraction contributions.  You can use _ReflAmount to manually tweak the visual effect for the look you are trying to achieve. You can find the implementation of the LocalCorrect function in the reflections blog. When the refractive geometry is a hollow object, refractions and reflections take place on both the front and back surfaces (as shown in Figure 4). In this case, we need to perform two rendering passes.

 

BishopRefraction_FirstAndSecondPass.png

Figure 4. Refraction on a glass bishop based on local cubemap. Left: First pass renders only back faces with local refraction and reflections. Right: Second pass renders only front faces with local refraction and reflections, alpha-blended with the first pass.

 

In the first pass, we render the semi-transparent object as we would opaque geometry, but last in the rendering queue, with front-face culling on and depth buffer writing off, i.e. we render only the back faces so as not to occlude other objects. The colour of the back faces is obtained by mixing the colours calculated from the reflection, the refraction and the diffuse colour of the object itself.


In the second pass, we render only the front faces (back-face culling), again last in the rendering queue and with depth writing off. The front-face colour is obtained by mixing the refraction and reflection textures with the diffuse colour. In this final pass we alpha-blend the resulting colour with the previous pass. Notice the combination of environment refractions and reflections in both pictures of Figure 4.

 

 

The refraction in the second pass will add more realism to the final rendering but we could skip this step if the refraction on the back faces is enough to highlight the effect.


Figure 5 shows the result of implementing refractions based on local cubemap on a semi-transparent phoenix in the Ice Cave demo.

refract01_small.pngrefract03_small.png
Figure 5. Refractions based on local cubemaps in the Ice Cave demo.

Preparing the cubemap


Preparing the cubemap for use in the refraction implementation is a simple process. We just need to place a camera in the centre of the refractive geometry and render the surrounding static environment to a cubemap in the six directions. During the rendering process, the refractive object is hidden. This cubemap can then be used for implementing both refraction and reflection. The ARM Guide To Unity contains a simple script for rendering a cubemap.

 

 

Conclusions


Local cubemaps have proven to be an excellent technique for rendering reflections. In this blog I have shown how to use local cubemaps to implement highly optimized, high quality refractions that can be combined at runtime with reflections to achieve high quality visual results (as seen in the Ice Cave demo). This is especially important when developing graphics for mobile devices, where runtime resources must be carefully balanced. Nevertheless, this technique does have limitations due to the static nature of the cubemap. How to deal with refractions when the environment is dynamic will be the subject of another blog.

 

I knew my life would be based around the game industry since I was 10, but I never had the opportunity to dive deeper into the industry until I first went to GDC and saw the exciting things that people are working on.

Flying back on the plane with many ideas spinning in my head, I was wondering how I could engage more with the industry and help game developers achieve better performance and improved visual quality.

A few days later I still couldn’t sleep. I imagined being in a windowed room with the sun outside travelling across the sky. Then I thought: if the room is static geometry, why do I need to render it to the shadow map every frame? The things which are moving are everything but the room: the sun, the camera, the dynamic objects etc.  I realized that a texture could represent the whole static environment (the room in my case). Going a bit further, the alpha channel could represent how much light can enter the room. My initial assumption was that a cubemap texture would only fit a uniform room but, as we later found out, cubemaps turn out to be a very good approximation of many kinds of local environment (not only square rooms but also irregular shapes such as the cave we used in the Ice Cave demo, which you can see below).

IceCave-01.jpgIceCave-02.jpg

IceCave-03.jpg

With the whole room represented by a cube texture, I could access arbitrary texels of the environment from within the fragment shader. With that in mind, the sun could be in any arbitrary position and the amount of light reaching a fragment calculated from the value fetched from the cubemap.

I shared the theory with my ARM colleague, Roberto Lopez Mendez, and within a couple of hours we had a working prototype running in Unity. We were very pleased to see the prototype achieve very efficient soft shadows while maintaining a high level of visual quality and we decided to use this technique in the Ice Cave demo which was premiered at GDC 2015.

 


 

I only recently got back from staffing the ARM booth at GDC and I had a fantastic time meeting so many interesting and knowledgeable people in the game industry. I was really delighted to be able to show the Ice Cave demo and explain all the visual effects we used. The shadow technique in particular garnered a lot of interest with many developers keen to use it in their own engine.

 

Overview of the shadow technique

In this blog I would like to give a technical overview of what you need to do in order to make the shadow technique work in your own projects. If you are not familiar with the reflections based on local cubemaps technique, I recommend reading this blog. There are also many other internet resources on the subject.

Why did I mention reflections based on local cubemaps? The main reason is that once you have implemented these reflections, you are almost there with the shadow technique. The only additional thing you need to do, alongside generating a reflection cubemap, is to store alpha along with the RGB colours.

room-drawings.jpgdraw-cubemap.jpg

The alpha channel (transparency) represents the amount of light entering the local environment. In your scene, attach the cubemap texture to the fragment shaders that are rendering the static and dynamic objects on which you want to apply shadows. In the fragment shaders, build a vector from the current fragment to the light position in world space. As long as we are using local cubemaps, we cannot use that vector directly to fetch a texel; we need just one more calculation step, the local correction. To do that, we calculate the intersection of the vector with the bounding volume (the bounding box of the room/environment) and then build another vector from the position where the cubemap was generated to the intersection point.

 

drawing.png

 

This vector (Q and P) can now be used to fetch a texel from the cubemap. Once you have fetched the texel you will have information about the amount of shadow/light to be applied on the fragment which is being processed. It is as simple as that.

That was a short summary of the technique; below you will find more details and a step-by-step guide to making shadows look really stunning in your application.

 

Generating shadow cubemaps

If you are already familiar with reflections based on local cubemaps, this step is exactly the same as creating the reflection cubemap (probe).  Moreover, you can reuse your reflection cubemap for this shadow technique; you just need to add an alpha channel.

Let’s assume you have a local environment within which you want to apply shadows from light sources outside of it, e.g. a room, cave or cage. As an aside, the environment does not have to be enclosed; it can be an open space, but that is a subject for another blog. For now, let’s focus on a simple local environment, a room, in order to understand better how the technique works.

You need to work out the position from which you will render the six faces of the cubemap – in most cases it will be the centre of the environment’s bounding volume (a box). You will require this position not only for the generation of the cubemap: it later needs to be passed to the shaders in order to calculate the locally corrected vector used to fetch the right texel from the cubemap.

Once you have decided where to position the centre of the cubemap you can render all faces to the cubemap texture and record the transparency (alpha) of the local environment. The more transparent an area is, the more light will come into the environment. If required, you can use the RGB channels to store the colour of the environment for coloured shadows like stained glass, reflections, refractions etc.

 

Rendering shadows

Applying shadows to the geometry couldn’t be simpler. All you need to do is build a vector “L” in world space from a vertex/fragment to the light(s) and fetch the cubemap shadow by using this vector.

However, there is one tiny step to apply to the “L” vector before fetching each texel: the local correction. It is recommended to apply the local correction in the fragment shader to obtain more precise shadows. To do this, calculate the intersection point of the “L” vector with the bounding volume (bounding box) of the environment, then build a new vector from the cubemap origin position to that intersection point. This gives you the final “Lp” vector, which should be used to fetch the texel.

 

Input parameters:

  • _EnviCubeMapPos – the cubemap origin position
  • _BBoxMax – the maximum corner of the environment’s bounding volume (bounding box)
  • _BBoxMin – the minimum corner of the environment’s bounding volume (bounding box)
  • V – the vertex/fragment position in world space
  • L – the normalized vertex-to-light vector in world space

 

Output value:

  • Lp – the corrected vertex-to-light vector which needs to be used to fetch a texel from the shadow cubemap.

 

An example code snippet which you can use to correct the vector:

 

// Working in the world coordinate system.
vec3 intersectMaxPointPlanes = (_BBoxMax - V) / L;
vec3 intersectMinPointPlanes = (_BBoxMin - V) / L;

// Look only for intersections in the forward direction of the ray.
vec3 largestRayParams = max(intersectMaxPointPlanes, intersectMinPointPlanes);

// The smallest of the forward ray parameters gives the intersection.
float dist = min(min(largestRayParams.x, largestRayParams.y), largestRayParams.z);

// Find the position of the intersection point.
vec3 intersectPositionWS = V + L * dist;

// Get the locally corrected vector.
Lp = intersectPositionWS - _EnviCubeMapPos;
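If you want to sanity-check this ray-box intersection outside a shader, here is a plain Python translation of the snippet above (the function name and the explicit handling of zero ray components are my own additions, not part of the original shader code, where division by zero simply yields ±infinity):

```python
# Plain-Python translation of the shader's local-correction step.
# V: fragment position, L: normalized fragment-to-light direction,
# bbox_min / bbox_max: corners of the environment's bounding box,
# cubemap_pos: position where the cubemap was generated.
def local_correct(V, L, bbox_min, bbox_max, cubemap_pos):
    largest = []
    for i in range(3):
        if L[i] == 0.0:
            # Ray parallel to this pair of planes; GLSL's division by
            # zero gives +/-inf and max() keeps +inf, so mimic that.
            largest.append(float("inf"))
        else:
            # Intersections with the max and min planes along this axis.
            t_max = (bbox_max[i] - V[i]) / L[i]
            t_min = (bbox_min[i] - V[i]) / L[i]
            # Keep the intersection in the forward direction of the ray.
            largest.append(max(t_max, t_min))
    # The nearest forward intersection is where the ray exits the box.
    dist = min(largest)
    intersect = [V[i] + L[i] * dist for i in range(3)]
    # Vector from the cubemap origin to the intersection point.
    return [intersect[i] - cubemap_pos[i] for i in range(3)]
```

With a unit cube centred at the origin and the cubemap generated at the centre, a fragment at the origin looking along the x axis yields Lp = (1, 0, 0), the point where the ray exits the box.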

 

Then use the “Lp” vector to fetch a texel from the cubemap. The texel’s alpha channel [0..1] provides information about how much light (or shadow) you need to apply for a given fragment.

 

float shadow = texCUBE(cubemap, Lp).a;

 

At this point you should have working shadows in your scene. There are then two more minor steps to improve the quality of this shadowing.

Chessroom-01-hard.jpgChessroom-02-hard.jpg

 

Back faces in shadow

As you may have noticed, we are not using depth information to apply shadows, and that may cause some faces to be incorrectly lit when in fact they should be in shadow. The problem only occurs when a surface is facing away from the light. To fix it, you simply need to check the angle between the normal vector and the vertex-to-light vector “L”. If the angle is outside the range −90° to 90°, the surface faces away from the light and is in shadow. Below is a code snippet which performs this check.

 

if (dot(L, N) < 0.0)
  shadow = 0.0;

 

As always, there is room for improvement. The above code causes a hard switch from light to shadow at the terminator, which we would like to avoid. We need a smooth transition instead, which can be achieved with the following simple formula:

 

shadow *= max(dot(L, N), 0.0);

 

  • shadow – the alpha value fetched from the shadow cubemap
  • L – the vertex-to-light vector in world space
  • N – the normal vector of the surface, also in world space
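To see the difference numerically, here is a small Python sketch of the two variants (purely illustrative; the function names are mine, not from the shader):

```python
# Compare the hard backface cutoff with the smooth falloff.
# shadow is the light amount fetched from the cubemap's alpha channel;
# dot_ln is dot(L, N) for the surface being shaded.
def hard_cutoff(shadow, dot_ln):
    # Binary: full shadow value while facing the light, zero otherwise.
    return 0.0 if dot_ln < 0.0 else shadow

def smooth_falloff(shadow, dot_ln):
    # Scales the shadow value down as the surface turns away from the light.
    return shadow * max(dot_ln, 0.0)
```

With shadow = 0.8 and dot(L, N) = 0.01 (a surface almost side-on to the light), the hard cutoff still returns the full 0.8 while the smooth falloff returns 0.008, so the transition fades out gradually instead of snapping per triangle.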

 

Smoothness

The last, and I would say coolest, feature of this shadow technique is its softness. If you do it right you will get realistic shadow softness in your scene.

First of all, make sure you generate mipmaps for the cubemap texture and use trilinear filtering.

Then the only thing you need to do is measure the length of the vertex-to-intersection-point vector and multiply it by a coefficient that you tune for your scene. For example, in the Ice Cave project we set the coefficient to 0.08. The coefficient is simply a normalizer that maps the maximum distance in your environment to the number of mipmap levels. If you want, you can calculate it automatically from the bounding volume and the mipmap count, but we found manual control very useful for tweaking the settings to suit the environment, which helped us improve the visual quality even further.
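As a hedged sketch of the automatic calculation mentioned above, one plausible approach in Python is to normalize the bounding-box diagonal (the largest distance in the environment) to the cubemap's mipmap range. The function and the choice of the diagonal as the maximum distance are my assumptions, not the Ice Cave implementation:

```python
import math

# Sketch: derive the distance coefficient by mapping the environment's
# largest distance (the bounding-box diagonal) onto the mipmap range,
# so a fragment at the far end of the box samples the last mip level.
def distance_coefficient(bbox_min, bbox_max, num_mip_levels):
    diagonal = math.sqrt(sum((bbox_max[i] - bbox_min[i]) ** 2 for i in range(3)))
    return (num_mip_levels - 1) / diagonal
```

For a box with a diagonal of 5 units and an 11-level mipmap chain this gives a coefficient of 2.0; in practice a hand-tuned value such as the 0.08 used in Ice Cave may look better, which is why the blog recommends manual control.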

Let’s get to the nitty-gritty then. We can reuse the intersection point already computed for the local correction to build the vertex-to-intersection-point vector, and then calculate its length.

 

float texLod = length(intersectPositionWS - V);

 

Then we multiply texLod by the distance coefficient:

 

texLod *= distanceCoefficient;

 

To implement the softness, we must fetch the correct mipmap level of the texture by using the texCUBElod (Unity) or textureLod (GLSL) function, passing a vec4 whose XYZ components represent the direction vector and whose W component represents the LOD (Level of Detail).

 

vec4 LpLod = vec4(Lp, texLod);

shadow = texCUBElod(cubemap, LpLod).a;

 

Chessroom-01-soft.jpgChessroom-02-soft.jpg

 

Now you should see high quality smooth shadows in your scene.

 

Combine cubemap shadows with shadowmap

In order to get the full experience you will need to combine cubemap shadows with the traditional shadow map technique. Even if it sounds like more work, it is still worth it, as you only need to render dynamic objects to the shadow map. In the Ice Cave demo, we simply added the two shadow results in the fragment shader to get the final shadow value.
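The exact combine used in Ice Cave is not shown here; as a minimal Python sketch, treating both values as occlusion amounts in [0, 1] (an assumption on my part) and clamping their sum keeps the result in range:

```python
# Hedged sketch of adding the two shadow contributions in the fragment
# shader, as the Ice Cave demo does. Both inputs are assumed to be
# occlusion amounts in [0, 1], so the sum is clamped to stay in range.
def combine_shadows(cubemap_occlusion, shadowmap_occlusion):
    return min(cubemap_occlusion + shadowmap_occlusion, 1.0)
```

A fragment that is partially occluded by the baked cubemap and also partially covered by a dynamic object's shadow map simply ends up darker, saturating at full shadow.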

 

Chessroom-03-no-shadowmap.jpgChessroom-03-with-shadowmap.jpg
Without Shadow MapWith Shadow Map

 

 

Statistics

In traditional techniques, rendering shadows can be quite expensive, as it involves rendering the whole scene from the perspective of each shadow-casting light source. The technique described in this blog is mostly prebaked (which improves performance) and independent of output resolution, producing the same visual quality at 1080p as at 720p and other resolutions.

The softness filtering is calculated in hardware (via the texture pipeline), so the smoothness comes almost for free. In fact, the smoother the shadow, the more efficient the technique is. This is due to the smaller mipmap levels, which result in less data transferred from main memory to the GPU compared to traditional shadow map techniques, which require a large filter kernel to make shadows smooth enough to be visually appealing. That obviously causes a lot more data traffic and reduces performance.

The quality you get with this shadow technique is even higher than you might expect. Apart from realistic softness, you get very stable shadows with no shimmering on the edges. Shimmering edges can be observed when using traditional shadow map techniques due to rasterization and aliasing, and none of the anti-aliasing algorithms can fix the problem entirely. Enabling multi-sampling helps to improve quality, but the shimmering effect is still visible. The cubemap shadow technique is free from this problem: the edges are stable even if you use a much lower cubemap resolution than the render target. You can easily use four times lower resolution than the output and see neither artefacts nor unwanted shimmering. Needless to say, a four times lower resolution saves massively on bandwidth and improves performance!

Let’s get to the point with performance data. I already covered the texture fetch above, which is very efficient. Apart from fetching a texel, you only need to make a few calculations for the local correction. The local correction on ARM® Mali™ GPU-based devices takes no more than three cycles per fragment. This aligns perfectly with the texture fetch pipeline: while the GPU is calculating the corrected vector for the next fragment, the texture pipeline is preparing the texel for the current one.

 

Conclusion

This technique can be used on any device on the market that supports shaders, i.e. OpenGL® ES 2.0 and higher. But like everything else, it has its downsides: the technique cannot be used for everything in your scene. Dynamic objects, for instance, receive shadows from the cubemap but cannot be prebaked into the cubemap texture. Dynamic objects should use shadow maps for casting their own shadows, blended with the cubemap shadow technique.

As a final word, we are seeing a lot of implementations of reflections based on local cubemaps, and this shadow technique is based on more or less the same paradigm. If you already know where and when to use reflections based on local cubemaps, then you will easily be able to apply the shadow technique to your implementation.

More than that, you will find the shadow technique is less error prone when it comes to the local correction: our brains do not spot as many errors in shadows as they do in reflections. Please have a look at our latest demo, Ice Cave, and try to implement the technique in your own projects.

 

I hope you enjoy using it!

Gemma Paris

Promoting_Mali_Developer_Centre_940x380px.jpg

ARM is hosting Graphics Week in the ARM® Connected Community. This is a roundup of the tools and resources to help developers get the most out of the latest hardware, along with proven tools to debug and optimize their apps and techniques for producing high-quality visuals on mobile platforms.

 

During Graphics Week, we will share blogs and videos of tools that will help game developers simplify the development process
and deliver console quality content to mobile platforms. Topics we will cover include:

  • Compute Shaders
  • Shadows Based on Local Cubemaps
  • Ice Cave Demo with Unity 5 and Enlighten™
  • Updates to the Latest Tools such as ARM Mali™ Graphics Debugger
  • Lighting Mathematics by Geomerics an ARM company
  • General lighting and Games by Geomerics an ARM company

 

URL: cc.arm.com/graphicsweek

 


With Mali Graphics Debugger you can edit OpenGL® ES shaders on the fly on your Android or Linux device while the game is still running. In fact, the tool will replay a frame over and over with modified shaders, so you can check the output on the display, or capture the frame for further inspection. This feature comes in particularly useful if the output does not look quite like the one you expected, if you need to experiment with different color and alpha values for blending, or when developing post-processing effects.

 

Dynamic editing

This is different from static shader editing (or material editing), because with Mali Graphics Debugger you are not working on a single shader in isolation. Instead you are editing it in the context of the actual frame it will be used on, with all the actual assets, textures, post-processing effects and camera position.

 

Live shader editing demo

Here's a demonstration of live shader editing. In this video the Epic Citadel demo is captured with Mali Graphics Debugger and one of its shaders is being modified. Finally, a frame is replayed with the modified shader, to show its effect.

 

 

0:08 Capturing Epic Citadel

0:17 Enabling shader map mode, to see what shader is used to draw the sky

0:30 Shader 3, inside Program 1, is the one we are going to edit

0:41 We are multiplying the RGB values of the final color by (1, 0, 0), which means that we keep only the RED channel

0:50 Replay the frame with the modified shader

0:57 Capture the modified frame

 

vlcsnap-2015-04-10-11h09m39s168.pngvlcsnap-2015-04-10-11h09m24s109.png vlcsnap-2015-04-10-11h09m48s28.pngvlcsnap-2015-04-10-11h09m58s107.png

 

Additional information

Download Mali Graphics Debugger and find more information at the Mali Developer Center: Mali Graphics Debugger - Mali Developer Center

You can find other videos about Mali Graphics Debugger in Tutorials: ARM Mali - YouTube and ARM - YouTube

 

Have you tried this yet? What do you think of it, and what would you like to see in the next version of Mali Graphics Debugger?

gdc15_logo.png

Every year at GDC, we like to present some important updates regarding the development tools for game developers that target devices with ARM® Mali™ GPUs. In 2013, we previewed Mali Graphics Debugger v1.0, which was then released a few weeks later. Exactly one year later, at GDC 2014, we showcased v1.3, which included the brand new frame replay feature (see User Guide Section 5.2.10 for details), a new binary trace format and many performance improvements. In the meantime, we had already implemented advanced features like Frame Capture, Shader Map, Overdraw Map, support for ASTC textures and shader statistics. Version 1.3 has been the most utilized version of the tool, supporting the Khronos APIs OpenGL® ES 1.1, 2.0 and 3.0, as well as EGL and OpenCL™.

 

citadel-frame-analysis3-scaled.gif

 

Mali Graphics Debugger has been extremely useful to a wide range of developers, from our internal GPU driver teams, to our silicon partners and OEMs, to game engines and games developers, and this is why GDC is such an important event for us.

This year at GDC 2015, we released version 2.1, based on the brand new version 2.0, released right at the end of last year. In the latest version we have made some major improvements to the tool including:

 

OpenGL ES 3.1 and Android Extension Pack support

Now Mali Graphics Debugger can trace all the functions that are supported in the Mali GPU drivers, and even more, to allow early support for some that are still being developed. This means that all OpenGL ES 3.1 function calls will be present in the trace, and most of the OpenGL ES extensions can be captured seamlessly.

OpenGL ES 3.1 adds support for features like compute shaders, a flexible way to manipulate general-purpose buffers on the GPU so that workload can be moved from the application processor to the graphics processor. Other OpenGL ES 3.1 features include indirect draw calls, which allow the GPU to manage draw calls rather than the CPU, and enhanced texture features such as offscreen multisampling. The extensions included in the Android Extension Pack add geometry and tessellation shaders, in addition to the ASTC texture compression format and many other features. OpenGL ES 3.1 and a selection of features of the Android Extension Pack are now supported in Mali Graphics Debugger and in our Mali OpenGL ES Emulator.


TessellationResult.png

 

Support for Android 64-bit

(Or technically, ARMv8-A AARCH64 devices)

 

Android 5.0 introduces platform support for 64-bit architectures, including ARMv8-A devices. We have ported the Mali Graphics Debugger target components to 64-bit architectures and extensively tested them on our Juno ARM Development Platform (getting started), which is equipped with ARM Cortex®-A57 and Cortex-A53 MPCore™ CPUs for ARMv8-A big.LITTLE™ processing and a Mali™-T624 GPU for 3D graphics acceleration and compute. This was particularly useful when porting Epic Games’ Moon Temple demo to 64-bit. Now it is available to everyone, and we are looking forward to trying it on the brand new Samsung Galaxy S6 phones.

 

Live editing is becoming even more powerful

Mali Graphics Debugger allows users to edit shaders, override textures and precision while capturing an application. This is done by replaying the same frame, with modified assets, over and over on the target device.

With version 2.1 you can now:

  • Change both the fragment and vertex shader of a program and replay the frame to view the results.
  • Override textures in an application and replace them with a new texture that will aid in diagnosing any issues with incorrect texture coordinates.
  • Override the precision of all elements in a shader and replay the frame to view the results (force highp/mediump/lowp modes).

 

New Android application provided to support unrooted devices

With the objective of making the installation of the graphics debugger on Android targets easier, we have developed an Android application that runs the required daemon. This eliminates the need to manually install executables on the Android device. The application (APK) works on rooted and unrooted devices.

mgdapk.png

 

New features for GPU compute

Mali GPUs don't just render graphics, but they also support general purpose computing, which can be done with compute shaders in OpenGL ES or OpenCL, depending on the use case. In this version, we have a new view for compute shaders, displaying the same shader statistics as the vertex and fragment shaders, which can be very useful for optimizing them and finding bottlenecks.

 

For OpenCL developers we have also added support for GPUVerify, a tool for formal analysis of GPU kernels written in OpenCL.

GPUVerify was originally designed by Alastair Donaldson (Imperial College London), and has been supported by ARM, among other partners. Read the detailed paper here.

 

Availability and support

As always, tools provided by ARM are supported in the ARM Connected Community. You can ask a question in the Mali Developer Forums, follow us on Twitter or Sina Weibo, or watch our YouTube and Youku channels.

Hi everyone,

 

Just wanted to write a quick blog with the news that the PLAYHACK with ARM competition that launched on the ARM booth at GDC 2015 has finished! The competition ran throughout March; the goal was to create the best WebGL game using the PlayCanvas engine and the ARM buggy asset seen below. The prize is a Samsung Chromebook 2 13.3" (full specs at the end of this post).

 

The winning game is Space Buggy by lmao; head over to the PlayCanvas blog to see the full announcement and honourable mentions for the runners-up.

 

Space Buggy animationbuggy_600.jpg

 

PLAY SPACE BUGGY HERE


You can also check out the SeeMore WebGL demo, which runs in the Mali Developer Center; see the screenshot below:


SeeMore WebGL demo by ARM and PlayCanvas


Samsung Chromebook 2 13.3”

Chromebook2133.jpg

  • Samsung Exynos 5 Octa (5800)
  • ARM® Mali™-T628 MP6 GPU
  • ARM Cortex®-A15 MP4 and Cortex-A7 MP4 CPUs
  • ARM big.LITTLE™ processing
