Vulkan is a low-level rendering API which exposes the hardware more directly to the application than earlier APIs such as OpenGL ES. This enables a much lighter driver, reducing CPU load and improving energy efficiency. In return, it places responsibility on the application to make the best use of the underlying hardware, because the driver has less visibility and fewer of the behavioral guarantees that would allow it to transparently optimize the application's hardware usage.
This blog documents the recommended application usage of texture and sampler descriptors to get the best performance out of the current Mali Bifrost GPUs.
The current Bifrost GPUs use variable-sized caches to store texture and sampler descriptors in the texturing unit. Each descriptor is classified as either “compact” or “full” depending on the settings it contains, and the hardware cache can contain 16 compact entries and 8 full sized entries. Application usage which maps to full sized entries will have fewer cache entries available, and will therefore be more prone to losing performance due to cache pressure.
In OpenGL ES the API specifies the defaults for many parameters which will map to “compact” entries unless overridden by the application. However, for Vulkan the application specifies all of the descriptor settings and it is important that it uses values which map to compact settings to get access to the maximum capacity of the descriptor cache.
Due to the potential impact of the small full descriptor cache, in particular on Vulkan content, new IP releases of the impacted GPUs provide a 24 entry cache, irrespective of the content of the descriptors.
| GPU Product | Impacted Releases | Patched Release |
|---|---|---|
| Mali-G71 | r0p0 | r0p1 |
| Mali-G51 | r0p0-r0p1, r1p0 | r1p1 |
| Mali-G72 | r0p0-r0p2 | r0p3 |
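The release table can be encoded as a small lookup, which may be handy in an offline tool that flags devices needing the compact-descriptor settings. This is only a sketch: the function name and the idea of passing the release in as two integers are assumptions made for illustration, and how a tool would learn the hardware release of a device is outside what this blog covers. It also assumes that every release ordered before the patched release is impacted, which matches the rows listed here.

```c
#include <stdbool.h>
#include <string.h>

/* One row of the release table: the first hardware release with the fix. */
typedef struct {
    const char *product;
    int patched_r; /* major revision, the "r" in rXpY */
    int patched_p; /* patch revision, the "p" in rXpY */
} PatchedRelease;

static const PatchedRelease kPatched[] = {
    { "Mali-G71", 0, 1 },
    { "Mali-G51", 1, 1 },
    { "Mali-G72", 0, 3 },
};

/* Hypothetical helper: returns true when release rXpY of a listed product
 * predates its patched release. Products missing from the table return
 * false, which means "not listed", not "known to be unaffected". */
bool has_small_full_cache(const char *product, int r, int p)
{
    for (size_t i = 0; i < sizeof kPatched / sizeof kPatched[0]; ++i) {
        if (strcmp(product, kPatched[i].product) == 0) {
            return r < kPatched[i].patched_r ||
                   (r == kPatched[i].patched_r && p < kPatched[i].patched_p);
        }
    }
    return false;
}
```

For example, `has_small_full_cache("Mali-G51", 1, 0)` reports an impacted release, while `has_small_full_cache("Mali-G51", 1, 1)` does not.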
Applications using Vulkan are responsible for supplying all parameters for the texture and sampler descriptors themselves; there are no safe API-specified defaults. Applications therefore need to supply parameters which map to the Mali compact samplers as often as possible.
To qualify for the compact sampler descriptor optimization all of the following constraints should be followed by the application when populating the VkSamplerCreateInfo structure:
It should be noted that the requirements for compact samplers conflict with the Vulkan specification's recommended approach for emulating GL_NEAREST (no filtering on samples read from mip 0) and GL_LINEAR (bilinear filtering on samples from mip 0) sampling for mipmapped textures.
There are no Vulkan filter modes that directly correspond to OpenGL minification filters of GL_LINEAR or GL_NEAREST, but they can be emulated using VK_SAMPLER_MIPMAP_MODE_NEAREST, minLod = 0, and maxLod = 0.25, and using minFilter = VK_FILTER_LINEAR or minFilter = VK_FILTER_NEAREST, respectively.
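For reference, the emulation quoted above can be written out as a set of sampler values. The sketch below uses stand-in types rather than the real Vulkan headers so that it compiles on its own: `Filter`, `MipmapMode`, `SamplerLodSettings`, and the function name are hypothetical, and the fields mirror only the `VkSamplerCreateInfo` members named in the quote. It illustrates the approach that conflicts with the compact sampler requirements, not a recommended configuration.

```c
#include <stdbool.h>

/* Stand-ins for VkFilter and VkSamplerMipmapMode; real code would use the
 * VK_FILTER_* and VK_SAMPLER_MIPMAP_MODE_* values from <vulkan/vulkan.h>. */
typedef enum { FILTER_NEAREST, FILTER_LINEAR } Filter;
typedef enum { MIPMAP_MODE_NEAREST, MIPMAP_MODE_LINEAR } MipmapMode;

/* Hypothetical subset of VkSamplerCreateInfo: only the fields in the quote. */
typedef struct {
    Filter     minFilter;
    MipmapMode mipmapMode;
    float      minLod;
    float      maxLod;
} SamplerLodSettings;

/* Values the quoted text gives for emulating the OpenGL minification
 * filters: bilinear == true for GL_LINEAR, false for GL_NEAREST. */
SamplerLodSettings spec_gl_min_filter_emulation(bool bilinear)
{
    SamplerLodSettings s;
    s.minFilter  = bilinear ? FILTER_LINEAR : FILTER_NEAREST;
    s.mipmapMode = MIPMAP_MODE_NEAREST;
    s.minLod     = 0.0f;
    s.maxLod     = 0.25f; /* LOD clamp that keeps sampling on mip 0 */
    return s;
}
```

The `maxLod = 0.25` clamp is the part to watch: the closing section of this blog recommends 1000.0 for `maxLod` wherever the substitution is possible.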
To emulate these two texture filtering modes for a texture with multiple mipmap levels, while also being compatible with the requirements for compact samplers, use the following recommendation.
Note: Direct access to textures through imageLoad() and imageStore() in shader programs (or equivalent in SPIR-V) is not impacted by this issue.
To qualify for the compact texture descriptor optimization the following constraints should be followed by the application when populating the VkImageViewCreateInfo structure:
Applications using the OpenGL ES API will use descriptors which are populated with default values defined in the API specification. These defaults will map to the compact descriptor entry types, unless the application explicitly overrides settings with values which are not compatible with the compact descriptor requirements. Also, as the pairing of texture and sampler is known at draw time, the driver can specialize the sampler descriptor settings given to the GPU based on the actual texture in use, which allows compact samplers to be used more often.
The following settings of texture and/or sampler objects should be used to ensure use of compact samplers:
Note: Direct access to textures through imageLoad() and imageStore() in shader programs is not impacted by this issue.
It should be noted that this best practice aims to get the best use out of a hardware cache. Applications are able to use small numbers of textures and samplers which use full sized cache entries without any performance impact (the cache is smaller, not non-existent), so if swizzles and LOD clamps are needed to correctly implement a rendering algorithm then don’t be afraid to use them. However, where it is possible to transparently substitute the recommended compact settings, such as using 1000.0 for maxLod in the sampler descriptor rather than the maximum mipmap level present in a texture, it is highly recommended that you do so.
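The maxLod substitution described above can be captured in a small helper. This is a minimal sketch: the function and parameter names are hypothetical, and `LOD_CLAMP_NONE` is a local stand-in defined with the same value as `VK_LOD_CLAMP_NONE` from the Vulkan headers so that the block compiles without the SDK.

```c
#include <stdbool.h>

/* Local stand-in with the same value as VK_LOD_CLAMP_NONE. */
#define LOD_CLAMP_NONE 1000.0f

/* Hypothetical helper: picks the value for VkSamplerCreateInfo::maxLod.
 * A real clamp is kept only when the rendering algorithm depends on it;
 * otherwise the compact-friendly value of 1000.0 is used. */
float choose_sampler_max_lod(bool algorithm_needs_clamp, float required_max_lod)
{
    if (algorithm_needs_clamp) {
        return required_max_lod; /* correctness first */
    }
    return LOD_CLAMP_NONE;       /* no clamp beyond the levels present */
}
```

For a texture with ten mipmap levels, an application that would otherwise write `maxLod = 9.0f` could call `choose_sampler_max_lod(false, 0.0f)` and get 1000.0 instead. An algorithm that genuinely needs a clamp keeps its own value through `choose_sampler_max_lod(true, value)`.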