Arm Custom Instructions were announced in 2019. They are a new standard feature of the Armv8-M architecture that lets developers add acceleration for their specific workloads, improving the performance and longevity of devices built for the fifth wave of computing.
Unlike traditional approaches that implement memory-mapped or coprocessor-based accelerators, custom instructions are fully integrated into the Arm architecture. They operate on the processor registers and are executed by the standard instruction pipeline, so using them avoids the overhead of transferring data to and from a separate accelerator.
Arm offers a complete range of tools for all stages of the development process:
Fast Models and Cycle Models let users build virtual platforms for software development and performance analysis before real hardware is available.
The industry-leading Arm Compiler, which is integrated into both Keil Microcontroller Development Kit (MDK) and Arm Development Studio.
Each toolchain offers its own IDE and debug environment, as well as performance analysis features, and works with both virtual and physical targets. Note that a Development Studio license also enables MDK.
To illustrate how these tools support development with custom instructions, we have created this brief video.
Custom instructions are typically used to accelerate application- or algorithm-specific operations, which are then called from a device-specific version of that code. In the example shown in the video, a simple CRC routine is implemented both in standard C and with a custom instruction. Projects can be developed in MDK or Development Studio as appropriate; both offer complete project management and debug capabilities. This blog highlights the key differences between Keil MDK and Development Studio.
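The video's source code is not reproduced here, so the following is only a sketch of what such a pair of routines might look like. It assumes the standard reflected CRC-32 (polynomial 0xEDB88320) and a hypothetical custom datapath on CDE coprocessor 0 whose `CX2A` instruction, with immediate 0, folds one byte into the CRC. That encoding, `CRC_COPROC`, and all function names are assumptions, since what a custom instruction actually does is defined by the silicon designer. The accelerated path uses the ACLE `__arm_cx2a()` intrinsic from `<arm_cde.h>` and is compiled only when the compiler reports CDE support (for example when targeting `armv8-m.main+cdecp0`); otherwise it falls back to the C step.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CDE)
#  include <arm_cde.h>
#endif

#define CRC32_POLY 0xEDB88320u /* assumed variant: reflected CRC-32 */
#define CRC_COPROC 0           /* hypothetical CDE coprocessor with the CRC datapath */

/* Standard C: fold one byte into the CRC, one bit at a time. */
static uint32_t crc32_step_c(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (CRC32_POLY & (0u - (crc & 1u)));
    return crc;
}

/* Same step, issued as a custom instruction when CDE is available. */
static uint32_t crc32_step_cx(uint32_t crc, uint8_t byte)
{
#if defined(__ARM_FEATURE_CDE) && (__ARM_FEATURE_CDE_COPROC & (1u << CRC_COPROC))
    /* Hypothetical encoding: CX2A with imm 0 is assumed to be defined by
       the hardware designer as "fold one byte into the accumulator CRC". */
    return __arm_cx2a(CRC_COPROC, crc, byte, 0);
#else
    return crc32_step_c(crc, byte); /* no CDE: fall back to plain C */
#endif
}

/* Reference routine in standard C. */
uint32_t crc32_c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32_step_c(crc, data[i]);
    return ~crc;
}

/* Device-specific routine using the custom instruction. */
uint32_t crc32_cx(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32_step_cx(crc, data[i]);
    return ~crc;
}
```

Because a custom instruction reads and writes ordinary core registers, `crc32_cx()` keeps the same loop as the C version and only the per-byte step changes, which makes a side-by-side performance comparison straightforward.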
MDK supports Event Recorder and Event Statistics, which let you generate high-level trace and profiling information by simply annotating your code. Using these features, we can clearly see the overall benefit of the custom instruction: the routine executes more than four times faster. In a real-world scenario, this frees processor bandwidth for other functionality, or allows a quicker return to a low-power mode, extending the battery life of your product.
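A minimal sketch of that kind of annotation follows, assuming an MDK project with Event Recorder added to the run-time environment. `EventRecorderInitialize()`, `EventRecordAll`, `EventStartA()` and `EventStopA()` come from `EventRecorder.h`; a start/stop pair brackets the code to be measured in one of 16 slots. The helpers `profiling_init()` and `profiled_call()` and the placeholder workload `byte_sum()` are hypothetical, and the stub macros exist only so the sketch also compiles on a host without MDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Use the real Event Recorder API inside an MDK project; otherwise
   compile the annotations away so the sketch still builds on a host. */
#if defined(__has_include)
#  if __has_include("EventRecorder.h")
#    include "EventRecorder.h"
#    define HAVE_EVENT_RECORDER 1
#  endif
#endif

#ifndef HAVE_EVENT_RECORDER
#  define EventStartA(slot) ((void)(slot))
#  define EventStopA(slot)  ((void)(slot))
#endif

typedef uint32_t (*routine_fn)(const uint8_t *data, size_t len);

/* Hypothetical helper: call once at startup, before annotated code runs. */
void profiling_init(void)
{
#ifdef HAVE_EVENT_RECORDER
    EventRecorderInitialize(EventRecordAll, 1U); /* record all events, start now */
#endif
}

/* Hypothetical helper: run one implementation between a start/stop pair
   in the given slot (0-15); Event Statistics reports timing per slot. */
uint32_t profiled_call(unsigned slot, routine_fn fn,
                       const uint8_t *data, size_t len)
{
    EventStartA(slot);
    uint32_t result = fn(data, len);
    EventStopA(slot);
    return result;
}

/* Placeholder workload so the sketch is self-contained. */
uint32_t byte_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

In practice you would pass the C routine in one slot and the custom-instruction routine in another, so that Event Statistics shows their execution times side by side.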
A similar analysis can be done in Development Studio. For example, the trace output can be used to visualize the difference in execution between the two implementations of the routine.
Deeper system-level performance analysis can be done with Streamline Performance Analyzer. Here too we can easily see the ~4x throughput improvement. In the chart below, the peaks represent the custom instruction implementation and the troughs the C version. Further analysis in the tool lets you dive deeper into the behavior of the code. My colleague Zach describes a real-world use case for Streamline here.
Arm Custom Instructions will transform future generations of high-performance, low-power devices. Arm development tools are available now to support your needs today and in the future. Refer to “Get Started with Early Development on the Arm Cortex-M55 Processor” for more information about the latest software development tools. If you are new to MDK or Development Studio, both are available to evaluate free of charge. We also invite you to contact us to discuss your specific needs in more detail.
Evaluate Keil MDK: https://www.keil.com/demo/eval/arm.html
Further support for Arm Custom Instructions was added in subsequent tool releases:
Support for the ACLE (Arm C Language Extensions) intrinsic functions was added in Arm Compiler 6.14.1 and later:
https://developer.arm.com/tools-and-software/embedded/arm-compiler/downloads/version-6
Debug support was added in Development Studio 2020.0 and later, as described here:
https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/development-studio-2020-0
https://developer.arm.com/documentation/101471/2000/Arm-Debugger-commands/Arm-Debugger-commands-listed-in-alphabetical-order/set-cde-coprocessors