Arm Community
Architectures and Processors blog
Windows RT App Optimization with NEON

Alan Chuang
September 11, 2013

With the arrival of Windows RT and the opening of the Microsoft App Store, you can now develop Windows Store apps for ARM platforms and make them available in 200+ markets through the store. If you are an Android or ARM Linux developer, you are probably already using ARM® NEON™ to optimize your applications, or benefiting from NEON-optimized libraries. In fact, you can use NEON to speed up your Windows RT applications as well.

NEON is a wide SIMD data-processing extension introduced in the ARMv7 architecture, operating on 64-bit and 128-bit registers. It performs "packed SIMD" processing and can be used to optimize multimedia codec algorithms, 2D/3D graphics libraries and other data-processing applications. NEON is widely used in both open-source projects and proprietary applications; the WebM Multimedia project and Android's Skia library are good examples of software libraries that use NEON instructions.

Windows RT itself also uses NEON for optimization. The Microsoft Visual C++ compiler supports NEON intrinsics, with an implementation close to that of the ARM RVCT 4.1 compiler. You get access to the intrinsics by including the arm_neon.h header file, exactly as you would for Linux or Android development; refer to MSDN for details. The DirectXMath SIMD C++ math library (DirectXMath.h) is implemented using NEON intrinsics on ARM and is a good reference.
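As a minimal sketch of what that looks like, the function below adds two float arrays four lanes at a time using the standard intrinsics vld1q_f32, vaddq_f32 and vst1q_f32 from arm_neon.h. The function name add_floats is hypothetical, and the guard macros (_M_ARM for MSVC targeting 32-bit ARM, __ARM_NEON for GCC/Clang) are an assumption about how you might detect NEON. Other targets fall back to a plain loop, so the same file also builds on x86.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: NEON is detected via _M_ARM (MSVC) or __ARM_NEON (GCC/Clang).
#if defined(_M_ARM) || defined(__ARM_NEON)
#include <arm_neon.h>  // same header on Windows RT (MSVC) and Linux/Android
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_floats(float* out, const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (out != nullptr && a != nullptr && b != nullptr));
    std::size_t i = 0;
#if HAVE_NEON
    // Four 32-bit floats fit in one 128-bit Q register.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);      // load 4 floats from b
        vst1q_f32(out + i, vaddq_f32(va, vb));  // lane-wise add, then store
    }
#endif
    // Scalar tail, or the whole array when NEON is not available.
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The scalar tail handles lengths that are not a multiple of four, which is a common pattern when vectorizing loops by hand.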

As an example, I decided to port the HelloNEON program from the Android NDK to see how easy it is to use NEON intrinsics on Windows RT. HelloNEON has several advantages for this: it is small and nicely written, so it is easy to understand and modify if needed, and it offers both C and NEON implementations, so the benefit of NEON optimization is easy to show.
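To make the C-versus-NEON comparison concrete, here is a sketch of a 16-bit FIR filter in the spirit of the HelloNEON benchmark, with a scalar version and a NEON-intrinsics version. It is not the sample's exact code: the names fir_filter_c and fir_filter_neon, the padding convention and the requirement that kernelSize be a multiple of 4 are assumptions of this sketch.

```cpp
// Sketch: 16-bit FIR filter, scalar and NEON versions (hypothetical names).
// Assumes `input` points into a buffer padded by kernelSize/2 samples on
// each side, so reads before input[0] and past input[width-1] are valid.
#include <cassert>
#include <cstdint>

#if defined(_M_ARM) || defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Scalar reference: one multiply-accumulate per tap.
void fir_filter_c(int16_t* output, const int16_t* input,
                  const int16_t* kernel, int width, int kernelSize) {
    const int offset = -kernelSize / 2;
    for (int n = 0; n < width; ++n) {
        int32_t sum = 0;
        for (int m = 0; m < kernelSize; ++m)
            sum += kernel[m] * input[n + offset + m];
        // Scale the 32-bit sum back to 16 bits with rounding.
        output[n] = static_cast<int16_t>((sum + 0x8000) >> 16);
    }
}

// NEON version: four taps per iteration via widening multiply-accumulate.
void fir_filter_neon(int16_t* output, const int16_t* input,
                     const int16_t* kernel, int width, int kernelSize) {
    assert(kernelSize % 4 == 0);  // the NEON loop consumes 4 taps at a time
#if HAVE_NEON
    const int offset = -kernelSize / 2;
    for (int n = 0; n < width; ++n) {
        int32x4_t acc = vdupq_n_s32(0);
        for (int m = 0; m < kernelSize; m += 4) {
            int16x4_t k = vld1_s16(kernel + m);
            int16x4_t x = vld1_s16(input + n + offset + m);
            acc = vmlal_s16(acc, k, x);  // acc += (int32)k * (int32)x, per lane
        }
        // Horizontal add of the four 32-bit lanes.
        int32x2_t pair = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
        int32_t sum = vget_lane_s32(vpadd_s32(pair, pair), 0);
        output[n] = static_cast<int16_t>((sum + 0x8000) >> 16);
    }
#else
    // No NEON on this target: fall back to the scalar loop.
    fir_filter_c(output, input, kernel, width, kernelSize);
#endif
}
```

Because both versions use integer arithmetic, they should produce identical output as long as the 32-bit accumulator does not overflow; that makes the scalar version a convenient correctness check for the NEON one.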

As it turns out, not much work is needed. All I had to do was create a WinRT component project containing the bulk of the function implementations, replace the Linux system call for getting the timestamp with its Windows equivalent, wrap the main routines as a WinRT component, and finally implement a simple JavaScript-based Windows Store app as a front end for starting the tests.

Rewriting the timestamp function:

Android Version:


Windows Version:
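As a sketch of both versions in one place, a single now_ms() function can select its implementation at compile time. The name now_ms is hypothetical here; the Android side assumes clock_gettime with a monotonic clock (the NDK sample's exact clock choice may differ), and the Windows side assumes QueryPerformanceCounter/QueryPerformanceFrequency.

```cpp
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
// Windows version (assumed): high-resolution performance counter.
static double now_ms() {
    LARGE_INTEGER freq, counter;
    BOOL okFreq = QueryPerformanceFrequency(&freq);  // ticks per second
    BOOL okCount = QueryPerformanceCounter(&counter);
    assert(okFreq && okCount);
    (void)okFreq;
    (void)okCount;
    return 1000.0 * static_cast<double>(counter.QuadPart) /
           static_cast<double>(freq.QuadPart);
}
#else
#include <time.h>
// Linux/Android version (assumed): monotonic POSIX clock.
static double now_ms() {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return 1000.0 * static_cast<double>(ts.tv_sec) +
           static_cast<double>(ts.tv_nsec) / 1e6;
}
#endif
```

Only differences between two now_ms() calls are meaningful; the absolute value depends on each platform's reference point.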


 
Wrapping the main routines as a WinRT component:
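The C++/CX wrapper class itself is omitted here. As a hedged sketch, the portable part it might forward to is a function that times each filter and returns plain numbers, since the public members of a WinRT ref class must use Windows Runtime types such as double. BenchmarkResult, FilterFn, time_filter and run_benchmark are hypothetical names, and std::chrono::steady_clock is used instead of a hand-written timestamp function only to keep the block self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Signature assumed for both the C and the NEON filter.
using FilterFn = void (*)(int16_t*, const int16_t*, const int16_t*, int, int);

// Hypothetical plain-data result; a C++/CX ref class could expose these
// two values to JavaScript as double properties.
struct BenchmarkResult {
    double c_ms;
    double neon_ms;
};

// Time `iterations` calls of one filter over zero-filled, padded buffers.
static double time_filter(FilterFn fn, int iterations, int width, int kernelSize) {
    assert(iterations > 0 && width > 0 && kernelSize > 0);
    std::vector<int16_t> input(width + kernelSize, 0);  // padding on both sides
    std::vector<int16_t> kernel(kernelSize, 1);
    std::vector<int16_t> output(width);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        fn(output.data(), input.data() + kernelSize / 2, kernel.data(),
           width, kernelSize);
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Entry point a WinRT component method could forward to.
BenchmarkResult run_benchmark(FilterFn cFilter, FilterFn neonFilter,
                              int iterations, int width, int kernelSize) {
    return {time_filter(cFilter, iterations, width, kernelSize),
            time_filter(neonFilter, iterations, width, kernelSize)};
}
```

Keeping the timing logic in ISO C++ and the WinRT layer thin means the same core can be reused in the Android build.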


 
Implementing a simple UI with JavaScript/HTML5:

Once the coding is done, set the platform to 'ARM' and the build configuration to 'Release'. You also need to set up remote debugging to run the program on your Windows RT device. In my case, I tested it on my Surface RT tablet.

The result is great: after normalizing the timings, the NEON version is about twice as fast as the C version.
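Normalizing here means expressing the NEON timing relative to the C baseline. A minimal sketch of that arithmetic, with hypothetical helper names, also shows why "twice as fast" corresponds to a 100% improvement:

```cpp
#include <cassert>

// Speedup of the NEON run relative to the C baseline (hypothetical helper).
double speedup(double c_ms, double neon_ms) {
    assert(c_ms > 0.0 && neon_ms > 0.0);
    return c_ms / neon_ms;
}

// Improvement over the baseline in percent: a 2.0x speedup is +100%.
double improvement_percent(double c_ms, double neon_ms) {
    return (speedup(c_ms, neon_ms) - 1.0) * 100.0;
}
```

For example, 200 ms for the C run and 100 ms for the NEON run (illustrative numbers, not measurements) give a speedup of 2.0 and an improvement of 100%.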

So, without any hardware change, I was able to get a 100% improvement (a 2x speedup) with NEON optimization over the original C implementation. Results will of course vary depending on your functions or algorithms, but the benefit is clear. It is also worth noting that this is a single-threaded implementation. The Surface RT uses the NVIDIA® Tegra® 3 (T30) chip, which has a quad-core ARM Cortex™-A9 MPCore CPU. If your function or algorithm can be partitioned into independent processing blocks, a multi-threaded implementation can speed it up even further.
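As a sketch of that idea, assuming a FIR-style filter where each output sample depends only on read-only input, the output can be split into contiguous blocks processed by separate std::thread workers. fir_range and fir_filter_parallel are hypothetical names; the scalar inner loop stands in for whatever per-block routine you use, and a NEON loop could be called inside each block instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Scalar FIR over output samples [begin, end); `input` is padded by
// kernelSize/2 samples on each side (assumption of this sketch).
static void fir_range(int16_t* output, const int16_t* input, const int16_t* kernel,
                      int begin, int end, int kernelSize) {
    const int offset = -kernelSize / 2;
    for (int n = begin; n < end; ++n) {
        int32_t sum = 0;
        for (int m = 0; m < kernelSize; ++m)
            sum += kernel[m] * input[n + offset + m];
        output[n] = static_cast<int16_t>((sum + 0x8000) >> 16);
    }
}

// Split the output into one contiguous block per thread. Blocks write
// disjoint output ranges and only read shared input, so no locking is needed.
void fir_filter_parallel(int16_t* output, const int16_t* input, const int16_t* kernel,
                         int width, int kernelSize, unsigned numThreads) {
    assert(numThreads > 0);
    const int threads = static_cast<int>(numThreads);
    const int chunk = (width + threads - 1) / threads;  // ceiling division
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        const int begin = t * chunk;
        const int end = std::min(width, begin + chunk);
        if (begin >= end) break;  // fewer samples than threads
        workers.emplace_back(fir_range, output, input, kernel, begin, end, kernelSize);
    }
    for (std::thread& w : workers) w.join();
}
```

std::thread::hardware_concurrency() is one way to pick numThreads; on a quad-core Cortex-A9 it would typically report four.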

With NEON intrinsics supported in the Microsoft Visual C++ compiler, using NEON to speed up your Windows RT application is as easy as including the relevant header file and setting the right compiler options. With so many applications benefiting from NEON optimization, yours may well benefit too. For more information on NEON, check out the ARM online infocenter, where you can also find the NEON programming reference guide.

  • John Scott (over 12 years ago)
    Is it true that the ARM instruction set is limited when coding for Windows RT? I've heard that you can only use Thumb-2 instructions, is that the case? Where is this information documented please? Thanks!
  • Alan Chuang (over 12 years ago)
    Questions specific to Windows technology are better directed to Microsoft. More information on Thumb-2 technology can be found on the ARM infocenter website (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471i/CHDFEDDB.html).

  • Rod Crawford (over 12 years ago)
    A couple of other NEON resources that may also be useful:
      • Nevada, a tool for visualizing NEON code execution, described here: http://bit.ly/ARMnevada
      • The "Writing NEON code" chapter of the Cortex-A Programmer's Guide: http://bit.ly/Cortex-A-guide