Servers and Cloud Computing blog

Accelerating Deep-packet-inspection (DPI) with Neon on Arm Neoverse platforms

Ravi Malhotra
October 6, 2021

Co-Authors: Ravi Malhotra and Jici Gao from Arm, with support from Konstantinos Margaritis from VectorCamp

This blog describes the work done by Arm and partners to accelerate regular expression parsing using vector engines in Arm Neoverse platforms, and its applicability to open-source Deep-packet-inspection applications like Snort.

Background

With the shift of enterprise computing from on-premises infrastructure to the cloud, it has become critical to secure data transfers and to detect possible attacks before they happen. To address this concern, traditional network security appliances and VPN gateways have evolved into Unified Threat Management (UTM) systems. These systems analyze streams of data and their usage by devices to detect patterns and anomalies. Inspecting every byte of packet payload (‘deep packet’) at typical network speeds is very compute-intensive, and this is where CPU SIMD vector engines can help by analyzing large sets of data in parallel.

Intel developed an open-source regular expression (regex) parsing library called Hyperscan that leveraged its SSE and AVX vector-engines and integrated it with a popular Deep-packet-inspection application, Snort. More info here: https://www.usenix.org/system/files/nsdi19-wang-xiang.pdf.

Vectorscan

https://github.com/VectorCamp/vectorscan

To create a regex parsing library optimized for Arm platforms, Arm collaborated with VectorCamp, a company specializing in software optimization and SIMD vectorization across a range of popular CPU architectures. Together, we created an architecture-inclusive fork of Hyperscan called Vectorscan, which preserves support for x86 and modifies the framework to allow for other architectures and vector-engine implementations.

The goal for Vectorscan was to preserve API compatibility with Hyperscan, allowing it to serve as a drop-in replacement in applications like Snort. Splitting the SIMD code into a separate library not only enabled portability across multiple architectures, but also reduced code size by one-third in the relevant SIMD routines.
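Because Vectorscan keeps the Hyperscan API, existing Hyperscan code compiles against it unchanged. As an illustrative sketch (the pattern and match handler here are hypothetical, not taken from the blog), a minimal block-mode scan follows the usual compile/scratch/scan sequence:

```c
#include <hs/hs.h>
#include <stdio.h>
#include <string.h>

/* Callback invoked for every match; returning 0 continues the scan. */
static int on_match(unsigned int id, unsigned long long from,
                    unsigned long long to, unsigned int flags, void *ctx) {
    printf("pattern %u matched ending at offset %llu\n", id, to);
    return 0;
}

int main(void) {
    hs_database_t *db = NULL;
    hs_compile_error_t *compile_err = NULL;

    /* Compile one regex into a pattern database for block-mode scanning. */
    if (hs_compile("foo[0-9]+bar", HS_FLAG_DOTALL, HS_MODE_BLOCK,
                   NULL, &db, &compile_err) != HS_SUCCESS) {
        fprintf(stderr, "compile failed: %s\n", compile_err->message);
        hs_free_compile_error(compile_err);
        return 1;
    }

    /* Per-thread scratch space required by the scanner. */
    hs_scratch_t *scratch = NULL;
    if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS) {
        hs_free_database(db);
        return 1;
    }

    const char *payload = "xxfoo123barxx";
    hs_scan(db, payload, (unsigned int)strlen(payload), 0,
            scratch, on_match, NULL);

    hs_free_scratch(scratch);
    hs_free_database(db);
    return 0;
}
```

On an x86 machine this links against Hyperscan; on an Arm machine the identical source links against Vectorscan instead, with no code changes.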

Vectorscan currently supports the Neon vector engine, and Arm's focus is to continue further optimizations on Arm Neoverse platforms, including newer vector-engine implementations like the Scalable Vector Extension (SVE/SVE2). Compatibility with the Hyperscan project is preserved: algorithms and Intel architecture optimizations were cherry-picked and integrated into Vectorscan, allowing Linux distros to bundle and maintain one package instead of two.
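As a rough sketch, building Vectorscan from source follows a standard CMake flow (the flags shown are generic CMake options, assumed rather than taken from the project's documentation):

```shell
# Clone the architecture-inclusive fork of Hyperscan
git clone https://github.com/VectorCamp/vectorscan
cd vectorscan && mkdir build && cd build

# Release build; the build system selects the SIMD backend
# (x86 SSE/AVX or Arm Neon) for the target at configure time
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j"$(nproc)"
```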

At the same time, generic contributions like cache prefetching, grouped loads/stores, and algorithm optimizations provided performance gains across all architectures. Cleanup and restructuring also make the code easier to debug and profile.

Performance results

On the Arm architecture, Vectorscan provides a performance uplift of 20-40% over the default regex implementations within Snort. The chart below shows a single-core comparison of Vectorscan vs. the default regex implementations in Snort on a Neoverse N1-based Ampere® Altra® CPU, using the Arm Neon vector engines within N1. Future vector-engine implementations like Arm SVE and SVE2 will provide even further uplift.

Vectorscan vs. default regex implementation performance on Snort

Figure 1: Performance of Vectorscan versus default Regex Implementations in Snort (in Mbps)

We also compared Vectorscan performance on Arm and alternative systems with:

  • Ampere Altra with 80 Arm Neoverse N1 cores @ 3.0 GHz
  • Intel Cascade Lake Xeon Platinum 8268 with 24 cores @ 2.9 GHz

The Ampere Altra compares well in both single and multi-core performance.

Vectorscan performance scales linearly all the way up to 80 cores on Ampere Altra, providing an overall socket performance well above 20 Gbps. This allows users to flexibly allocate cores between deep-packet-inspection and other security packet-processing tasks like network proxy, VPN, and IDS/IPS. By comparison, with only 24 cores available, the x86 system throughput maxes out at about 10 Gbps.

Figure 2: Ampere Altra vs. x86 Throughput Scaling Comparison

However, a typical UTM appliance must perform other packet-processing tasks, including firewall, NAPT, tunneling, and encryption, which take up a significant amount of CPU bandwidth. The Ampere Altra uses only 30% of its cores (24 out of 80) to achieve 10 Gbps of Vectorscan performance, compared to close to 100% on the Intel Xeon 8268 (24 out of 24).

System CPU Utilization for 10 Gbps Vectorscan

Figure 3: System CPU Utilization for 10Gbps Vectorscan

Conclusion

The work done by Arm and partners on Vectorscan provides network security application developers with a regex library that is portable across multiple architectures, optimized to leverage SIMD acceleration, and API-compatible with existing Hyperscan deployments. This enables Arm Neoverse-based platforms to deliver deep-packet-inspection performance that is not only comparable with alternatives at a per-core level, but at significantly lower system-level CPU utilization, leaving headroom for other packet-processing tasks on the same system.

Learn more about Neoverse N1

Appendix

System configuration

  1. Arm - Ampere Altra: https://amperecomputing.com/wp-content/uploads/2021/06/Altra_Rev_A1_DS_v1.10_20210612-1.pdf
    1. 80 Neoverse N1 cores @ 3.0 GHz
  2. x86 - Intel Xeon Platinum 8268: https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html
    1. 24 cores / 48 threads @ 2.9 GHz, Hyper-Threading off
  3. Software/test configuration
    1. Snort version = 3.1.1.0
    2. Vectorscan 5.3.0
    3. Linux version = 5.10.9
    4. PCAP = maccdc2012_00001.pcap from https://www.netresec.com/?page=MACCDC