With the advancement of 5G, online video has become one of the primary mediums through which people obtain information. According to Ericsson, video traffic accounted for 73 percent of all mobile data traffic at the end of 2023 [1], and the viewing resolution of mobile internet terminals has quickly jumped from the initial 360P and 480P to 720P, 1080P, and even 4K/8K ultra-high-definition video. Beyond the demand for higher resolution, immersive video experiences such as AR (Augmented Reality) and VR (Virtual Reality) are also increasingly demanding in terms of frame rate and color space. Each increase in resolution, frame rate, or color space (for example, a higher bit depth) multiplies the amount of raw video data, making bandwidth resources even more valuable. In this context, video coding technology, as the cornerstone of ultra-high-definition video, has become increasingly important. Advanced encoding and decoding technology can significantly reduce the bandwidth required for video transmission, which not only saves costs but also greatly enhances the user's viewing experience.
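To make this growth concrete: the uncompressed bitrate of a video stream is width × height × frame rate × bits per pixel. The sketch below assumes 4:2:0 chroma subsampling (1.5 samples per pixel on average), and the scenarios are illustrative choices, not figures from the Ericsson report.

```python
def raw_bitrate_bps(width: int, height: int, fps: float, bits_per_pixel: float) -> float:
    """Uncompressed video bitrate in bits per second."""
    return width * height * fps * bits_per_pixel


def bits_per_pixel_420(bit_depth: int) -> float:
    """Assumption: 4:2:0 sampling, i.e. one luma sample plus half a chroma sample per pixel."""
    return 1.5 * bit_depth


if __name__ == "__main__":
    # Illustrative scenarios: (label, width, height, frames per second, bit depth)
    scenarios = [
        ("1080p30, 8-bit", 1920, 1080, 30, 8),
        ("4K30, 8-bit", 3840, 2160, 30, 8),
        ("4K60, 10-bit", 3840, 2160, 60, 10),
        ("8K60, 10-bit", 7680, 4320, 60, 10),
    ]
    for label, w, h, fps, depth in scenarios:
        bps = raw_bitrate_bps(w, h, fps, bits_per_pixel_420(depth))
        print(f"{label:>15}: {bps / 1e9:6.2f} Gbit/s uncompressed")
```

Under these assumptions, 1080p30 at 8 bits needs about 0.75 Gbit/s before compression, while 8K60 at 10 bits needs 40 times that. This gap is why compression efficiency matters so much.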
To cope with increasing video traffic, video standardization organizations (such as MPEG, ITU-T and ISO) have continuously advanced video coding technology, from MPEG-2 and H.264/AVC to H.265/HEVC and the latest H.266/VVC. Today, H.264/AVC is still the most widely used codec standard. It provides high-quality video streams at low data rates, making it an ideal choice for online and mobile video platforms. H.265/HEVC further improves compression efficiency and can theoretically deliver the same video quality as H.264 at about half the bitrate. As a result, H.265/HEVC needs roughly half the bandwidth of H.264 to transmit high-definition and 4K video streams, and these high-resolution streams are exactly where the saving matters most. The latest H.266/VVC standard improves compression efficiency by more than 40% over H.265/HEVC at the same video quality, which is of great significance for 8K video and future ultra-high-resolution formats.
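The ratios quoted above can be turned into a rough bandwidth estimate. This is a back-of-the-envelope sketch. It assumes HEVC needs 50% of the H.264 bitrate and VVC needs 40% less than HEVC. The 32 Mbit/s H.264 baseline for a 4K stream is a hypothetical figure, and real savings vary with content, encoder settings, and the quality metric used.

```python
# Bitrate needed for the same perceived quality, normalized to H.264 = 1.0.
# Assumption: uses the rough ratios from the text (HEVC ~50% of H.264,
# VVC at least 40% below HEVC), so the VVC value is an upper bound.
RELATIVE_BITRATE = {
    "H.264/AVC": 1.0,
    "H.265/HEVC": 0.5,
    "H.266/VVC": 0.5 * (1 - 0.40),
}


def estimated_bitrate_mbps(h264_mbps: float, codec: str) -> float:
    """Scale an H.264 bitrate to another codec using the relative factors above."""
    return h264_mbps * RELATIVE_BITRATE[codec]


if __name__ == "__main__":
    baseline = 32.0  # hypothetical H.264 bitrate for a 4K stream, in Mbit/s
    for codec in RELATIVE_BITRATE:
        print(f"{codec:>10}: {estimated_bitrate_mbps(baseline, codec):.1f} Mbit/s")
```

With these assumptions the 4K stream drops from 32.0 Mbit/s with H.264 to 16.0 Mbit/s with HEVC and at most 9.6 Mbit/s with VVC.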
Video transcoding is the process of converting video streams into different formats by changing parameters such as codec settings, resolution, and bitrate to accommodate different network conditions and end-user devices. It is widely used to improve the user experience and, most importantly, to reduce both storage and network bandwidth costs. Dedicated video acceleration cards (such as ASICs) and GPUs have shown excellent performance in specific video transcoding tasks. Even so, general-purpose server CPUs are increasingly becoming the preferred option for video transcoding thanks to their flexibility and higher cost-performance ratio, especially in video-on-demand and live streaming. CPU-based transcoding deployments also benefit directly from rapid innovation in video codec algorithms and continuous software optimization in both open source and in-house codec libraries, providing a cost-effective alternative to dedicated acceleration cards and GPUs.
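A transcoding job of this kind can be sketched with ffmpeg and its libx265 encoder. The same source is rescaled and re-encoded at several bitrates to suit different networks and devices. The rendition ladder, file names, and bitrates below are hypothetical, and the sketch assumes an ffmpeg build with libx265 enabled. It only builds the command lines, which could then be passed to subprocess.run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int   # output height in pixels; width follows the source aspect ratio
    bitrate: str  # target video bitrate in ffmpeg syntax, e.g. "3M"


# Hypothetical ladder; real services tune these per title and device mix.
LADDER = [Rendition(1080, "5M"), Rendition(720, "3M"), Rendition(480, "1M")]


def ffmpeg_hevc_command(src: str, r: Rendition, preset: str = "medium") -> list[str]:
    """Build an ffmpeg argument list that rescales src and re-encodes it with libx265."""
    out = f"{src.rsplit('.', 1)[0]}_{r.height}p.mp4"
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{r.height}",  # -2 keeps the aspect ratio with an even width
        "-c:v", "libx265", "-preset", preset, "-b:v", r.bitrate,
        "-c:a", "copy",                  # pass the audio track through untouched
        out,
    ]


if __name__ == "__main__":
    for r in LADDER:
        print(" ".join(ffmpeg_hevc_command("input.mp4", r)))
```

Keeping the ladder as data separate from command construction makes it easy to add or drop renditions without touching the encoder invocation.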
In the past few years, Arm and its ecosystem partners have worked closely with open source communities to improve video codec performance on Arm platforms using the Neon and SVE/SVE2 instruction sets supported on Arm Neoverse CPUs. In the recent x265 release (version 4.0) [2], the encoder has been optimized for Arm to deliver notable frames per second (FPS) improvements across different encoding presets, providing substantial performance boosts in multi-stream encoding scenarios.
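You can check whether a given server exposes these instruction sets by reading /proc/cpuinfo on Linux. On arm64, the kernel lists Neon as "asimd", alongside "sve" and "sve2", on the Features line. This is a minimal sketch that assumes a Linux host. On x86 the file uses a "flags" line instead, so every entry reports False.

```python
from pathlib import Path


def arm_simd_features(cpuinfo_text: str) -> dict[str, bool]:
    """Report Neon (listed as 'asimd'), SVE and SVE2 from /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":  # arm64 kernels use a 'Features' line
            flags.update(value.split())
    return {"neon": "asimd" in flags, "sve": "sve" in flags, "sve2": "sve2" in flags}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(arm_simd_features(cpuinfo.read_text()))
```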
Here, we benchmark the latest x265 release (version 4.0) to compare video transcoding performance on the Neoverse N2 based Yitian710 against same-tier x86 instances (Intel Ice Lake, Intel Sapphire Rapids, and AMD Genoa), with the platform configurations as follows:
We started by benchmarking performance scaling on Yitian710 and the same-tier x86 instances. As illustrated in Figure 1, the Neoverse N2 based Yitian710 shows much better FPS scaling linearity as the number of transcoding tasks increases from 1 to 16. As Figure 2 shows, Yitian710 also achieves up to 2.2 times higher throughput than the same-tier x86 instances with 16 parallel transcoding tasks.
Figure 1: x265 transcoding aggregate performance scaling by number of transcoding tasks
Figure 2: x265 transcoding performance comparison with 16 parallel transcoding tasks
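A multi-stream scaling test similar to the one behind Figures 1 and 2 can be approximated by launching N independent x265 encodes at once and dividing the total frames by wall-clock time. This is a hypothetical harness, not the methodology used for the published results. It assumes the x265 CLI is on PATH, the input is a .y4m file, and x265 prints its usual "encoded N frames in Xs (Y fps)" summary on stderr. The scaling_efficiency function expresses linearity as aggregate FPS at N tasks divided by N times the single-task FPS, so 1.0 means perfect scaling.

```python
import re
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: x265 ends each encode with a stderr line such as
# "encoded 600 frames in 21.34s (28.12 fps), 4519.05 kb/s, Avg QP:31.16".
SUMMARY = re.compile(r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\)")


def parse_summary(stderr: str) -> tuple[int, float]:
    """Return (frames, seconds) from an x265 summary line."""
    m = SUMMARY.search(stderr)
    if m is None:
        raise ValueError("no x265 summary line found")
    return int(m.group(1)), float(m.group(2))


def run_one(src: str, preset: str, idx: int) -> str:
    """Run one x265 encode and return its stderr log."""
    cmd = ["x265", "--input", src, "--preset", preset, "--output", f"out_{idx}.hevc"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stderr


def aggregate_fps(src: str, preset: str, tasks: int) -> float:
    """Launch `tasks` independent encodes at once; return total frames / wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        logs = list(pool.map(lambda i: run_one(src, preset, i), range(tasks)))
    wall = time.perf_counter() - start
    return sum(parse_summary(log)[0] for log in logs) / wall


def scaling_efficiency(fps_by_tasks: dict[int, float]) -> dict[int, float]:
    """Aggregate FPS at N tasks divided by N x single-task FPS (1.0 = linear)."""
    base = fps_by_tasks[1]
    return {n: fps / (n * base) for n, fps in sorted(fps_by_tasks.items())}
```

For example, aggregate_fps("input.y4m", "medium", 16) runs 16 concurrent encodes, and a sweep over 1 to 16 tasks can be fed to scaling_efficiency. Thread-pool options such as --pools, which control how each x265 process shares cores, are left at their defaults here.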
It is also important to note that the Neoverse N2 based Yitian710 platform is much more cost-effective than Intel Ice Lake, Intel Sapphire Rapids and AMD Genoa, as reflected in the lower pricing of the AliCloud Yitian710 instances above. This gives AliCloud Yitian710 a significant Total Cost of Ownership (TCO) advantage in x265 transcoding throughput per RMB. As Figure 3 shows, it delivers up to 3.3 times more frames per RMB than the same-tier x86 instances, a compelling advantage for users deploying video transcoding services in the cloud.
Figure 3: x265 transcoding price performance comparison with 16 parallel transcoding tasks
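Frames per RMB follows from sustained aggregate FPS and the hourly instance price: frames encoded per hour divided by the price per hour. The sketch below uses purely illustrative numbers, not the measured results or actual AliCloud prices. It shows how a throughput ratio and a price ratio combine: 2.2 times the FPS at two-thirds of the hourly price gives 2.2 / (2/3) = 3.3 times the frames per RMB.

```python
def frames_per_rmb(aggregate_fps: float, price_rmb_per_hour: float) -> float:
    """Frames encoded per RMB when an instance runs at aggregate_fps for one hour."""
    return aggregate_fps * 3600 / price_rmb_per_hour


def price_performance_ratio(fps_a: float, price_a: float, fps_b: float, price_b: float) -> float:
    """How many times more frames per RMB instance A delivers than instance B."""
    return frames_per_rmb(fps_a, price_a) / frames_per_rmb(fps_b, price_b)


if __name__ == "__main__":
    # Hypothetical figures: A has 2.2x the FPS of B at 2/3 of B's hourly price.
    ratio = price_performance_ratio(fps_a=220.0, price_a=4.0, fps_b=100.0, price_b=6.0)
    print(f"A delivers {ratio:.1f}x the frames per RMB of B")
```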
As high-resolution streaming becomes more prevalent, cloud-based video streaming services increasingly demand advanced compression codecs such as H.265. However, this enhanced compression requires more computational power and results in greater energy consumption. At the system level, the Arm Neoverse N2 based Alibaba Yitian710 provides better scaling and up to 2.2x higher transcoding performance, while offering up to 3.3x higher price performance compared to same-tier x86 instances (Intel Ice Lake, Intel Sapphire Rapids and AMD Genoa). Please visit the Arm Developer Hub [3] to learn how to build and run x265 on Arm servers and benefit from the transcoding performance and power efficiency of Arm Neoverse.