astcenc 3.1: high performance texture compression

Peter Harris
August 10, 2021

At SIGGRAPH last year we announced the release of astcenc 2.0, the first major update to the Arm ASTC texture compressor since the format was announced in 2012. This release gave developers a much-needed performance boost, but we knew it was just the starting point on a longer journey. Here we are, 12 months and 7 releases later, happy to announce the result of our work: astcenc 3.1.

More speed

For most developers, importing and compressing textures is one of the most time-consuming parts of a project build cycle. Fast compressors significantly improve developer efficiency and reduce iteration time, so it is no surprise that slow ASTC compression has been a major bugbear of developers for some time. The main goal of our work on the compressor has been to make the codec as fast as we could. However, we wanted to achieve this without sacrificing the image quality that astcenc provides, as this is one of the real strengths of the ASTC format.

The core codec has been extensively optimized, with nearly every path fine-tuned and vectorized. Specific vectorized builds are now available for multiple CPU architectures:

  • Arm AArch64: Neon
  • x86-64: SSE2, SSE4.1, AVX2

The performance of the 3.1 release averages 5 times faster than the 2.0 release, and up to 17 times faster than the 1.7 release. The performance comes at a small image quality loss of around 0.1 dB compared to the 1.7 release for most block sizes. High-bitrate encodings, such as those using the 4x4 block size, actually improve image quality slightly despite the faster performance.

 astcenc 3.1 performance compared to astcenc 1.7

To put this in absolute terms, the latest compressor, compiled for AVX2 and running on an Intel Core i5-9600K at 4.2 GHz, can compress LDR color images with the following performance:

  • 12M Texels/s for a fast search,
  • 3M Texels/s for a medium search,
  • 800K Texels/s for a thorough search.

The improvements in this release mean that thorough compression in astcenc 3.1 is around three times faster than medium compression in astcenc 1.7. It is now possible to get both better image quality and measurably faster compression passes. The performance of thorough compression is now good enough that it is feasible to get the best quality out of ASTC in real game builds, which was not possible in 1.7 due to the compression cost.
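To put the throughput figures above in context, here is a minimal sketch that turns them into a rough per-texture time estimate. The function and macro names are hypothetical, and the rates are the single-machine numbers quoted above, so real timings will vary with hardware, block size, and image content.

```c
#include <stdio.h>

/* Throughput figures quoted above, in texels per second. */
#define RATE_FAST     12.0e6
#define RATE_MEDIUM    3.0e6
#define RATE_THOROUGH  0.8e6

/* Estimated seconds to compress a width x height texture at a given rate. */
double estimate_seconds(unsigned int width, unsigned int height,
                        double texels_per_second)
{
    return ((double)width * (double)height) / texels_per_second;
}

/* Hypothetical helper: print the three estimates for one texture size. */
void print_estimates(unsigned int width, unsigned int height)
{
    printf("%ux%u: fast %.2f s, medium %.2f s, thorough %.2f s\n",
           width, height,
           estimate_seconds(width, height, RATE_FAST),
           estimate_seconds(width, height, RATE_MEDIUM),
           estimate_seconds(width, height, RATE_THOROUGH));
}
```

For a 2048x2048 texture this works out to roughly 0.35 s for a fast search, 1.4 s for a medium search, and 5.2 s for a thorough search.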

Fine-grained compression quality

One of the changes we have added this year is more fine-grained control over the compressor’s performance-to-quality trade-off. Earlier releases supported only a set of defined preset quality levels (fastest, fast, medium, thorough, and exhaustive), each step increasing the compression cost by between 3 and 5 times. The presets are still supported, but it is now also possible to supply a numeric quality level between 0 (fastest) and 100 (exhaustive), giving developers the ability to finely control their compression cost.

Handy hint: Running astcenc with a quality level of 30, which is between the -fast (quality 10) and -medium (quality 60) levels, is a close match for the performance and image quality of the ISPC TexComp ASTC texture compressor.

New features

While performance is important, it is not the only thing that matters, so we have also added a number of new features to the codec based on common developer requests.

CPU ISA invariant output

The compressor output is now invariant across CPU architectures and compilers, giving bit-identical output for a given input image and compressor configuration. This gives developers certainty about what their game is going to look like, no matter which build machine was used to run the asset pipeline. It also makes it a lot easier to integrate the compressor into automated test environments which perform bit-exact image comparisons to determine whether a test passes or fails. 

RGBM compression

We have added support for the RGBM texture format, giving the compressor some awareness of RGBM and how it is consumed. RGBM is a container format which encodes a limited form of HDR data in an LDR wrapper, which can be converted back into HDR values using some shader arithmetic when the texture is consumed.

Handy hint: RGBM encoding is a means to support a limited form of HDR textures on GPUs without a native HDR texture format. All Mali GPUs that support ASTC implement the HDR feature profile, allowing HDR textures to be directly encoded for improved efficiency and image quality.

The traditional HDR-to-RGBM encode is:

// Load HDR inputs in range [0-5]
float r_in = pixel_in.r / 5.0f;
…

// Extract multiplier, rescale RGB to fit range [0-1]
float m_enc = max(r_in, g_in, b_in);

float r_enc = min(1.0f, r_in / m_enc);
…

The traditional shader RGBM-to-HDR reconstruction is:

// Load LDR inputs in range [0-1]
vec4 data = texture(…);

// Convert back to HDR in range [0-5]
data.rgb = data.rgb * data.a * 5.0;

RGBM has historically proven a challenge to compress well with ASTC, as the characteristics of RGBM data break a number of assumptions that the compressor makes about how texture data behaves. In particular, for dark pixels, the format will try to mix large RGB values and small M values. This leaves M prone to quantization during compression, which produces block artifacts caused by M-error-induced luminance shifts. Small M values can also round to zero, which results in completely black output blocks.

 RGBM image with block artifacts.
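To illustrate why small M values are so fragile, the sketch below reconstructs one channel from 8-bit RGBM components and measures how far the result moves when M is decoded one step away from its true value. The function names are hypothetical and 8-bit storage is an assumption; this only models the reconstruction arithmetic, not the behavior of the compressor itself.

```c
/* Reconstruct one HDR channel from 8-bit RGBM components. */
float rgbm_decode_channel(unsigned char c8, unsigned char m8, float max_hdr)
{
    return (c8 / 255.0f) * (m8 / 255.0f) * max_hdr;
}

/* Relative brightness change when M is decoded as m8_decoded instead of m8.
   Assumes c8 and m8 are both non-zero so the exact value is non-zero. */
float m_shift_ratio(unsigned char c8, unsigned char m8,
                    unsigned char m8_decoded, float max_hdr)
{
    float exact = rgbm_decode_channel(c8, m8, max_hdr);
    float shifted = rgbm_decode_channel(c8, m8_decoded, max_hdr);
    return (shifted - exact) / exact;
}
```

With M stored as 2, a one-step error to 3 changes the reconstructed brightness by about 50 percent, whereas the same one-step error at M = 16 changes it by about 6 percent. If M decodes as 0, the texel reconstructs as black regardless of its RGB value.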

To solve this issue, the original user code that converts HDR data into RGBM, prior to compression with astcenc, should be modified to clamp M to a minimum of 16 on the 8-bit scale (16/255 once normalized to the [0-1] range). This prevents the use of the very small M values which cannot be encoded reliably.

// Load HDR inputs in range [0-5]
float r_in = pixel_in.r / 5.0f;
…

// Extract multiplier, but limit to >= 16/255
float m_enc = max(r_in, g_in, b_in, 16.0f / 255.0f);

// Rescale RGB to fit range [0-1]
float r_enc = min(1.0f, r_in / m_enc);
…
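For reference, here is a minimal self-contained sketch of the encode with M limiting, paired with the matching reconstruction. The struct and function names are hypothetical, the maximum HDR value and the M floor are passed in as parameters, and the floor is expressed in the normalized [0-1] range. It follows the snippets above rather than any particular production encoder.

```c
typedef struct { float r, g, b; } hdr_pixel;      /* hypothetical HDR input */
typedef struct { float r, g, b, m; } rgbm_pixel;  /* normalized [0-1] RGBM */

static float max2(float a, float b) { return a > b ? a : b; }
static float min2(float a, float b) { return a < b ? a : b; }

/* Encode HDR in range [0-max_hdr] to RGBM, keeping M at or above m_floor. */
rgbm_pixel rgbm_encode(hdr_pixel in, float max_hdr, float m_floor)
{
    float r = in.r / max_hdr;
    float g = in.g / max_hdr;
    float b = in.b / max_hdr;

    rgbm_pixel out;
    /* Multiplier is the largest channel, but never below the floor. */
    out.m = max2(max2(r, g), max2(b, m_floor));
    /* Rescale RGB to fit range [0-1]. */
    out.r = min2(1.0f, r / out.m);
    out.g = min2(1.0f, g / out.m);
    out.b = min2(1.0f, b / out.m);
    return out;
}

/* Reconstruct HDR, mirroring the shader arithmetic shown earlier. */
hdr_pixel rgbm_decode(rgbm_pixel in, float max_hdr)
{
    hdr_pixel out = { in.r * in.m * max_hdr,
                      in.g * in.m * max_hdr,
                      in.b * in.m * max_hdr };
    return out;
}
```

For dark pixels the floor takes over as the multiplier, so the RGB channels carry the small values instead of M. Bright pixels are unaffected because their largest channel already exceeds the floor.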

The second RGBM-related change is that, during compression, astcenc can now compute the error in the decoded HDR domain rather than the encoded RGBM domain. This allows it to more accurately select block candidates with lower error. Used together these two changes give a much improved result:

 RGBM image without block artifacts.
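The difference between measuring error in the two domains can be sketched as follows. This is only an illustration using a plain sum of squared differences with hypothetical names; the actual error metric used inside astcenc is not described here and may differ.

```c
typedef struct { float r, g, b, m; } rgbm_texel;  /* normalized [0-1] */

/* Sum of squared differences over the stored RGBM components. */
float error_rgbm_domain(rgbm_texel a, rgbm_texel b)
{
    float dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b, dm = a.m - b.m;
    return dr * dr + dg * dg + db * db + dm * dm;
}

/* Sum of squared differences after reconstructing HDR RGB values. */
float error_hdr_domain(rgbm_texel a, rgbm_texel b, float max_hdr)
{
    float dr = (a.r * a.m - b.r * b.m) * max_hdr;
    float dg = (a.g * a.m - b.g * b.m) * max_hdr;
    float db = (a.b * a.m - b.b * b.m) * max_hdr;
    return dr * dr + dg * dg + db * db;
}
```

Take a dark texel (1.0, 0.5, 0.25, M = 0.1) and two candidate encodings, one off by 0.02 in R and one off by 0.02 in M. In the RGBM domain the two candidates have essentially the same error, but in the HDR domain the M error is more than a hundred times larger, because it scales every channel.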

Using RGBM with astcenc is therefore a two-step process:

  1. The user has the responsibility for applying the M limiting during the HDR-to-RGBM conversion, which is handled outside of the compressor.
  2. During compression use the -rgbm <max> command-line option, where <max> is the maximum HDR value used during reconstruction, to minimize the compression error. In our previous code snippets this is the value 5.

Sprite sheet transparent “RDO” compression

We have added a limited form of rate distortion optimization (RDO) compression for textures with completely transparent regions. The aim of this technique is to make the compressed texture itself more compressible, when stored inside a compressed distribution package such as an Android APK or OBB bundle.

This technique replaces zero alpha blocks that are surrounded entirely by other zero alpha blocks with a constant color block, irrespective of the original color values in the input image. This is safe, as the zero alpha means that the original color value is never actually needed, and means the impacted zero-alpha blocks all end up compressing to the same bit pattern in memory.
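The candidate selection can be sketched as below, assuming a precomputed per-block flag that is true when every texel in the block has zero alpha. The function name, the flag array, the use of the eight immediately adjacent blocks, and the treatment of image edges are all assumptions made for illustration; they are not taken from the astcenc implementation.

```c
#include <stdbool.h>

/* zero_alpha[y * blocks_x + x] is true when every texel in block (x, y)
   has alpha == 0. Returns true if block (x, y) and all of its in-bounds
   neighbors are fully transparent, so it can become a constant color block. */
bool is_constant_color_candidate(const bool *zero_alpha,
                                 int blocks_x, int blocks_y, int x, int y)
{
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            int nx = x + dx;
            int ny = y + dy;
            /* Assumption: neighbors outside the image are ignored. */
            if (nx < 0 || ny < 0 || nx >= blocks_x || ny >= blocks_y) {
                continue;
            }
            if (!zero_alpha[ny * blocks_x + nx]) {
                return false;
            }
        }
    }
    return true;
}
```

Transparent blocks that touch a block with visible texels are left alone, so their original color values are still encoded and only the fully surrounded interior of a transparent region is replaced.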

Original image:

 Original sprite sheet with alpha transparency

Original image ignoring transparency:

  Original sprite sheet showing extruded opaque edges

Compressed image, ignoring transparency, after swapping out candidate zero alpha blocks with constant color blocks:

  Original sprite sheet showing constant color blocks for zero alpha regions

For our previous test texture, replacing a typical opaque edge extrude with constant color blocks reduces the size of the zipped compressed texture data by up to 20 percent.

Block size   .astc KB   Old .astc.bz2 KB   New .astc.bz2 KB   Reduction in .bz2 size
4x4          684        602                482                19.9%
5x5          438        386                318                17.5%
6x6          304        270                224                17.1%
8x8          172        144                125                13.2%

The sprite-sheet RDO functionality is automatically enabled when using the existing -a <radius> option to alpha-weight color encodings.

Download today

To learn more about the ASTC format and how best to use it please check out our ASTC guide.

Find astcenc on GitHub
