Mali Bifrost Usage Recommendations for Texture and Sampler Descriptors

May 3, 2017

Vulkan is a low-level rendering API which exposes the hardware more directly to the application than earlier APIs such as OpenGL ES. This enables a much lighter driver, reducing CPU load and improving energy efficiency. In return, it places responsibility on the application to make the best use of the underlying hardware, because the driver has less visibility and fewer behavioral guarantees that would allow it to optimize the application's hardware usage transparently.

This blog post documents the recommended application usage of texture and sampler descriptors for getting the best performance out of current Mali Bifrost GPUs.

Hardware behavior

Current Bifrost GPUs use variable-sized caches in the texturing unit to store texture and sampler descriptors. Each descriptor is classified as either “compact” or “full” depending on the settings it contains, and the hardware cache can contain 16 compact entries and 8 full-sized entries. Application usage which maps to full-sized entries therefore has fewer cache entries available, and is more prone to losing performance due to cache pressure.

In OpenGL ES, the API specifies defaults for many parameters, and these map to “compact” entries unless the application overrides them. In Vulkan, however, the application specifies all of the descriptor settings, so it is important that it uses values which map to compact entries in order to get the maximum capacity out of the descriptor cache.

Impacted GPU releases

Because the small full-descriptor cache can reduce performance, particularly for Vulkan content, newer IP releases of the impacted GPUs provide a 24-entry cache irrespective of descriptor content.

GPU Product	Impacted Releases	Patched Release
Mali-G71	r0p0	r0p1
Mali-G51	r0p0-r0p1, r1p0	r1p1
Mali-G72	r0p0-r0p2	r0p3
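For an application that already knows the GPU product and its rXpY revision (how to obtain these is platform-specific and not covered here), the table can be read as "every release earlier than the patched release is impacted". The C sketch below encodes that reading. The MaliRevision type, the kPatchedReleases array and mali_descriptor_cache_impacted() are hypothetical names, and treating every later release as patched is an assumption rather than something the table states.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding of a Mali "rXpY" release: X is major, Y is minor. */
typedef struct {
    int major;
    int minor;
} MaliRevision;

/* First patched release per product, taken from the table above. */
typedef struct {
    const char *product;
    MaliRevision patched;
} PatchedRelease;

static const PatchedRelease kPatchedReleases[] = {
    { "Mali-G71", { 0, 1 } },
    { "Mali-G51", { 1, 1 } },
    { "Mali-G72", { 0, 3 } },
};

/* True if a sorts before b: compare the major number first, then the minor. */
static bool revision_before(MaliRevision a, MaliRevision b) {
    return a.major < b.major || (a.major == b.major && a.minor < b.minor);
}

/* True if this product/revision still has the small full-descriptor cache.
 * Assumption: every release at or after the patched one has the 24-entry
 * cache, and products not in the table are reported as unaffected. */
bool mali_descriptor_cache_impacted(const char *product, MaliRevision rev) {
    size_t count = sizeof kPatchedReleases / sizeof kPatchedReleases[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(product, kPatchedReleases[i].product) == 0) {
            return revision_before(rev, kPatchedReleases[i].patched);
        }
    }
    return false;
}
```

With this reading, Mali-G51 r1p0 is reported as impacted while r1p1 is not, which matches the table.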

Application best practice for Vulkan

Applications using Vulkan are responsible for supplying all texture and sampler descriptor parameters themselves; there are no safe API-specified defaults. Applications therefore need to supply parameters which map to the Mali compact descriptors as often as possible.

Sampler descriptor settings

To qualify for the compact sampler descriptor optimization, the application should follow all of the following constraints when populating the VkSamplerCreateInfo structure:

Set sampler addressMode(U|V|W) so they are all the same
- Note that addressModeW must be set to be the same as U and V even when sampling a 2D texture
Set sampler mipLodBias to 0.0
Set sampler minLod to 0.0
Set sampler maxLod to 1000.0
Set sampler anisotropyEnable to VK_FALSE
Set sampler maxAnisotropy to 1.0
Set sampler borderColor to VK_BORDER_COLOR_FLOAT_TRANSPARENT_BLACK
Set sampler unnormalizedCoordinates to VK_FALSE
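As a minimal sketch of the rules above, the C code below builds a sampler configuration that satisfies every constraint and checks an existing configuration against them. To stay compilable without the Vulkan SDK, it uses a hypothetical SamplerSettings struct that mirrors the listed VkSamplerCreateInfo fields, with stand-in enums for the address modes and border colors; a real application would write the same values into VkSamplerCreateInfo. The value 1000.0 is the same as Vulkan's VK_LOD_CLAMP_NONE constant.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for VkSamplerAddressMode and VkBorderColor. */
typedef enum {
    ADDRESS_REPEAT,
    ADDRESS_MIRRORED_REPEAT,
    ADDRESS_CLAMP_TO_EDGE,
    ADDRESS_CLAMP_TO_BORDER
} AddressMode;

typedef enum {
    BORDER_FLOAT_TRANSPARENT_BLACK,
    BORDER_FLOAT_OPAQUE_BLACK,
    BORDER_FLOAT_OPAQUE_WHITE
} BorderColor;

/* Mirror of the VkSamplerCreateInfo fields named in the list above. */
typedef struct {
    AddressMode addressModeU, addressModeV, addressModeW;
    float mipLodBias;
    float minLod;
    float maxLod;
    bool anisotropyEnable;
    float maxAnisotropy;
    BorderColor borderColor;
    bool unnormalizedCoordinates;
} SamplerSettings;

#define COMPACT_MAX_LOD 1000.0f /* same value as VK_LOD_CLAMP_NONE */

/* Builds settings that satisfy every compact-sampler constraint. */
SamplerSettings compact_sampler_settings(AddressMode mode) {
    SamplerSettings s = {
        .addressModeU = mode,
        .addressModeV = mode,
        .addressModeW = mode, /* must match U and V even for 2D textures */
        .mipLodBias = 0.0f,
        .minLod = 0.0f,
        .maxLod = COMPACT_MAX_LOD,
        .anisotropyEnable = false,
        .maxAnisotropy = 1.0f,
        .borderColor = BORDER_FLOAT_TRANSPARENT_BLACK,
        .unnormalizedCoordinates = false,
    };
    return s;
}

/* Returns true only if every constraint in the list holds. */
bool sampler_is_compact(const SamplerSettings *s) {
    return s->addressModeU == s->addressModeV &&
           s->addressModeV == s->addressModeW &&
           s->mipLodBias == 0.0f &&
           s->minLod == 0.0f &&
           s->maxLod == COMPACT_MAX_LOD &&
           !s->anisotropyEnable &&
           s->maxAnisotropy == 1.0f &&
           s->borderColor == BORDER_FLOAT_TRANSPARENT_BLACK &&
           !s->unnormalizedCoordinates;
}
```

Fields that the list does not mention, such as magFilter, minFilter and mipmapMode, are left out of the mirror struct because the list above places no constraint on them.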

Note that the requirements for compact samplers conflict with the Vulkan specification's recommended approach for emulating GL_NEAREST (no filtering on samples read from mip 0) and GL_LINEAR (bilinear filtering on samples read from mip 0) sampling of mipmapped textures.

There are no Vulkan filter modes that directly correspond to OpenGL minification filters of GL_LINEAR or GL_NEAREST, but they can be emulated using VK_SAMPLER_MIPMAP_MODE_NEAREST, minLod = 0, and maxLod = 0.25, and using minFilter = VK_FILTER_LINEAR or minFilter = VK_FILTER_NEAREST, respectively.

To emulate these two texture filtering modes for a texture with multiple mipmap levels while remaining compatible with the compact sampler requirements, use the following approach:

Use a VkImageView which references only mipmap level 0, by setting subresourceRange.baseMipLevel to 0 and subresourceRange.levelCount to 1.
Use a VkSampler whose VkSamplerCreateInfo::maxLod is set to 1000.0, in accordance with the compact sampler restrictions.
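The two steps above can be sketched in C as a single configuration: the level restriction moves from the sampler's maxLod into the image view, so the sampler keeps the compact maxLod of 1000.0. The Filter and MipmapMode enums, the ViewMipRange and SamplerLodState structs, and emulate_gl_min_filter() are hypothetical stand-ins for VkFilter, VkSamplerMipmapMode and the relevant VkImageViewCreateInfo and VkSamplerCreateInfo fields, used so the sketch compiles without the Vulkan SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VkFilter and VkSamplerMipmapMode. */
typedef enum { FILTER_NEAREST, FILTER_LINEAR } Filter;
typedef enum { MIPMAP_MODE_NEAREST, MIPMAP_MODE_LINEAR } MipmapMode;

/* Subset of VkImageViewCreateInfo::subresourceRange used here. */
typedef struct {
    uint32_t baseMipLevel;
    uint32_t levelCount;
} ViewMipRange;

/* Subset of VkSamplerCreateInfo used here. */
typedef struct {
    Filter minFilter;
    MipmapMode mipmapMode;
    float minLod;
    float maxLod;
} SamplerLodState;

typedef struct {
    ViewMipRange view;
    SamplerLodState sampler;
} GlFilterEmulation;

/* Emulates GL_LINEAR (gl_linear == true) or GL_NEAREST minification on a
 * mipmapped image while keeping the sampler's LOD settings compact. */
GlFilterEmulation emulate_gl_min_filter(bool gl_linear) {
    GlFilterEmulation e;
    /* The view exposes only mip level 0, replacing the maxLod = 0.25 clamp. */
    e.view.baseMipLevel = 0;
    e.view.levelCount = 1;
    e.sampler.minFilter = gl_linear ? FILTER_LINEAR : FILTER_NEAREST;
    /* With a single-level view the mipmap mode cannot select another level;
     * NEAREST is kept to match the specification's recipe. */
    e.sampler.mipmapMode = MIPMAP_MODE_NEAREST;
    e.sampler.minLod = 0.0f;
    e.sampler.maxLod = 1000.0f; /* compact value, same as VK_LOD_CLAMP_NONE */
    return e;
}
```

Because the level restriction now lives in the image view rather than in the sampler, an image that is also sampled with its full mip chain elsewhere needs a second view covering all of its levels; the sampler can stay compact in both cases.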

Note: Direct access to textures through imageLoad() and imageStore() in shader programs (or the equivalent SPIR-V instructions) is not affected by this issue.

Texture descriptor settings

To qualify for the compact texture descriptor optimization, the application should follow these constraints when populating the VkImageViewCreateInfo structure:

Set every field of the view's components to either VK_COMPONENT_SWIZZLE_IDENTITY or the explicit per-channel identity equivalent (for example, VK_COMPONENT_SWIZZLE_R for components.r)
Set view subresourceRange.baseMipLevel to 0
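A minimal C sketch of the two texture descriptor rules follows. Each channel counts as identity if it uses the IDENTITY swizzle or explicitly names its own channel. The Swizzle enum, the ComponentMapping and ImageViewSettings structs, and image_view_is_compact() are hypothetical stand-ins for VkComponentSwizzle, VkComponentMapping and the relevant VkImageViewCreateInfo fields, chosen so the sketch compiles without the Vulkan SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VkComponentSwizzle. */
typedef enum {
    SWIZZLE_IDENTITY,
    SWIZZLE_ZERO,
    SWIZZLE_ONE,
    SWIZZLE_R,
    SWIZZLE_G,
    SWIZZLE_B,
    SWIZZLE_A
} Swizzle;

/* Mirror of VkComponentMapping. */
typedef struct { Swizzle r, g, b, a; } ComponentMapping;

/* The VkImageViewCreateInfo settings the compact rules depend on. */
typedef struct {
    ComponentMapping components;
    uint32_t baseMipLevel; /* subresourceRange.baseMipLevel */
} ImageViewSettings;

/* A channel is identity-mapped if it is IDENTITY or names its own channel. */
static bool is_identity(Swizzle s, Swizzle own_channel) {
    return s == SWIZZLE_IDENTITY || s == own_channel;
}

bool image_view_is_compact(const ImageViewSettings *v) {
    return is_identity(v->components.r, SWIZZLE_R) &&
           is_identity(v->components.g, SWIZZLE_G) &&
           is_identity(v->components.b, SWIZZLE_B) &&
           is_identity(v->components.a, SWIZZLE_A) &&
           v->baseMipLevel == 0;
}
```

For example, a view that forces alpha to SWIZZLE_ONE for an RGB texture, or one that starts at mip level 1, fails this check and would use a full descriptor.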

Application best practice for OpenGL ES

Applications using the OpenGL ES API use descriptors populated with the default values defined in the API specification. These defaults map to the compact descriptor entry types unless the application explicitly overrides them with values that are incompatible with the compact descriptor requirements. In addition, because the pairing of texture and sampler is known at draw time, the driver can specialize the sampler descriptor settings given to the GPU for the texture actually in use, which allows compact samplers to be used more often.

Use the following texture and sampler object settings to ensure that compact descriptors are used:

Set GL_TEXTURE_WRAP_(S|T|R) to identical values
- Note that the GL driver can specialize the sampler state based on the current texture so, unlike Vulkan, there is no need to set GL_TEXTURE_WRAP_R for 2D textures
Do not use GL_CLAMP_TO_BORDER
Set GL_TEXTURE_MIN_LOD to -1000.0 (default)
Set GL_TEXTURE_MAX_LOD to +1000.0 (default)
Set GL_TEXTURE_BASE_LEVEL to 0 (default)
Set GL_TEXTURE_SWIZZLE_R to GL_RED (default)
Set GL_TEXTURE_SWIZZLE_G to GL_GREEN (default)
Set GL_TEXTURE_SWIZZLE_B to GL_BLUE (default)
Set GL_TEXTURE_SWIZZLE_A to GL_ALPHA (default)
Set GL_TEXTURE_MAX_ANISOTROPY_EXT to 1.0 (default) if the EXT_texture_filter_anisotropic extension is available
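The OpenGL ES rules can be expressed as a check over a snapshot of texture state. This is a minimal C sketch, assuming the application tracks the state itself: GlTextureState, the WrapMode and SwizzleToken enums, gl_default_state() and gl_state_is_compact() are hypothetical names whose values mirror the GL tokens, and the code neither includes GLES headers nor calls GL. Following the note on GL_TEXTURE_WRAP_R above, the R wrap mode is ignored for 2D textures.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GL wrap and swizzle tokens. */
typedef enum {
    WRAP_REPEAT,
    WRAP_MIRRORED_REPEAT,
    WRAP_CLAMP_TO_EDGE,
    WRAP_CLAMP_TO_BORDER
} WrapMode;

typedef enum { SWZ_RED, SWZ_GREEN, SWZ_BLUE, SWZ_ALPHA, SWZ_ZERO, SWZ_ONE } SwizzleToken;

typedef struct {
    bool is_2d; /* GL_TEXTURE_2D: the driver ignores the R wrap mode */
    WrapMode wrap_s, wrap_t, wrap_r;
    float min_lod, max_lod;
    int base_level;
    SwizzleToken swizzle_r, swizzle_g, swizzle_b, swizzle_a;
    float max_anisotropy; /* stays 1.0 if the extension is not used */
} GlTextureState;

/* The GL default state, which already satisfies every rule in the list. */
GlTextureState gl_default_state(bool is_2d) {
    GlTextureState t = {
        .is_2d = is_2d,
        .wrap_s = WRAP_REPEAT, .wrap_t = WRAP_REPEAT, .wrap_r = WRAP_REPEAT,
        .min_lod = -1000.0f, .max_lod = 1000.0f,
        .base_level = 0,
        .swizzle_r = SWZ_RED, .swizzle_g = SWZ_GREEN,
        .swizzle_b = SWZ_BLUE, .swizzle_a = SWZ_ALPHA,
        .max_anisotropy = 1.0f,
    };
    return t;
}

bool gl_state_is_compact(const GlTextureState *t) {
    /* For 2D textures only S and T need to match. */
    bool wraps_match = t->wrap_s == t->wrap_t &&
                       (t->is_2d || t->wrap_t == t->wrap_r);
    return wraps_match &&
           t->wrap_s != WRAP_CLAMP_TO_BORDER && /* checked wraps are all equal */
           t->min_lod == -1000.0f &&
           t->max_lod == 1000.0f &&
           t->base_level == 0 &&
           t->swizzle_r == SWZ_RED && t->swizzle_g == SWZ_GREEN &&
           t->swizzle_b == SWZ_BLUE && t->swizzle_a == SWZ_ALPHA &&
           t->max_anisotropy == 1.0f;
}
```

Only GL_TEXTURE_2D is treated as ignoring the R wrap mode here; how the driver treats other texture targets is not covered above.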

Note: Direct access to textures through imageLoad() and imageStore() in shader programs is not affected by this issue.

Reasonable usage of full descriptors

Note that this best practice aims to get the best use out of a hardware cache. Applications can use small numbers of textures and samplers which need full-sized cache entries without any performance impact (the cache is smaller, not non-existent), so if swizzles and LOD clamps are needed to correctly implement a rendering algorithm, don’t be afraid to use them. However, where it is possible to substitute the recommended compact settings transparently, such as setting the sampler descriptor's maxLod to 1000.0 rather than to the maximum mipmap level present in a texture, it is highly recommended that you do so.
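The maxLod substitution mentioned above can be sketched as a small helper. It assumes Vulkan's LOD rules, where the computed LOD is also clamped to the view's last level (levelCount - 1), so any maxLod at or above that level cannot exclude a level the view exposes. It also assumes that a maxLod of exactly 0 is deliberate, because clamping the LOD to 0 makes the sampler always use the magnification filter; that case is left unchanged. choose_max_lod() is a hypothetical helper, not part of any API.

```c
#include <stdint.h>

#define COMPACT_MAX_LOD 1000.0f /* same value as VK_LOD_CLAMP_NONE */

/* Returns the maxLod to use for a view with level_count mip levels. If the
 * requested clamp cannot cut off any level the view exposes, the compact
 * value is substituted; otherwise the real clamp is kept. */
float choose_max_lod(float requested_max_lod, uint32_t level_count) {
    if (level_count == 0) {
        return requested_max_lod; /* invalid view; leave the value alone */
    }
    float last_level = (float)(level_count - 1);
    /* requested_max_lod > 0 keeps an exact 0 (magnification only) intact. */
    if (requested_max_lod >= last_level && requested_max_lod > 0.0f) {
        return COMPACT_MAX_LOD;
    }
    return requested_max_lod;
}
```

For a five-level view, a requested maxLod of 4.0 becomes 1000.0, while 3.5 is a real clamp and is kept, at the cost of a full sampler descriptor.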
