1 Introduction
ARM® NEON™ technology is a SIMD (single instruction, multiple data) architecture extension for ARM Cortex™-A series processors. It can accelerate multimedia and signal processing algorithms such as video encoding/decoding, 2D/3D graphics, gaming, audio and speech processing, and image processing. Over the past three years, many multimedia applications have used NEON to deliver a significantly enhanced user experience. However, some application developers may not be familiar with NEON assembly coding, so the Ne10 library was created to let developers get the most out of ARMv7/NEON without arduous assembly coding.
The Ne10 library provides heavily optimized versions of the most commonly used functions. It was first announced in March 2012. The initial functionality focuses on matrix/vector algebra and signal processing. Ne10 will evolve over time to cover more compute-heavy tasks in other domains, such as image processing.
This article explains how to compile and use the Ne10 library.
2 Ne10 Overview
When you check out the Ne10 source code from https://github.com/projectNe10/Ne10, you will notice a number of directories. The following figure illustrates the purpose of each directory.
3 Environment
First, let’s prepare the whole development environment.
3.1 Hardware environment
You need an ARM Cortex-A series development platform. If you do not have one, you can use an emulated environment such as Google's Android Emulator. I'm using the PandaBoard (http://pandaboard.org/) running Ubuntu 11.10.
Alternatively, you can use a traditional desktop environment for cross-compiling:
3.2 Software environment
For the desktop environment you will also need the following tools:
4 Compiling and using Ne10 library
Now, we can start to download Ne10 source code and compile it.
4.1 Compiling Ne10
Ne10 uses CMake for its entire build system. The benefit of CMake is that the same build description works across platforms, which also makes cross-compiling easy.
1) Native compiling (compiling on an ARM platform).
On UNIX platforms, run the following commands in a terminal (replace $NE10PATH with the directory that contains the source code):
$ cd $NE10PATH
$ mkdir build
$ cd build
$ cmake ..
$ make
The static library libNE10.a is placed in $NE10PATH/build/modules/, and a test program, NE10_test_static, is placed in $NE10PATH/build/samples/. You can run it to check the build. To also build the dynamic library and the test program NE10_test_dynamic, add -DNE10_BUILD_SHARED=ON to the cmake call.
2) Cross compiling (compiling on a non-ARM platform for ARM powered devices)
Cross-compiling is similar to native compiling; you only need to configure the correct toolchain. Create a file named config.cmake with the following content and place it in $NE10PATH/:
set( CMAKE_C_COMPILER arm-linux-gnueabi-gcc )
set( CMAKE_CXX_COMPILER arm-linux-gnueabi-g++ )
set( CMAKE_ASM_COMPILER arm-linux-gnueabi-as )

find_program( CMAKE_AR NAMES "arm-linux-gnueabi-ar" )
mark_as_advanced( CMAKE_AR )

find_program( CMAKE_RANLIB NAMES "arm-linux-gnueabi-ranlib" )
mark_as_advanced( CMAKE_RANLIB )
Then you can use the following commands to compile.
$ cd $NE10PATH
$ mkdir build
$ cd build
$ cmake -DCMAKE_TOOLCHAIN_FILE=../config.cmake ..
$ make
The Ne10 library and test programs are placed in the same directories as for native compiling. Copy them to the target device and run them there.
Note:
When you run NE10_test_dynamic on the target, you might receive the error: "NE10_test_dynamic: error while loading shared libraries: libNE10_shared.so.10: cannot open shared object file: No such file or directory"
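To fix this, add the directory that contains the shared library to the dynamic loader's search path. On the target, use the directory where you copied the library if it differs from the build tree:
$ export LD_LIBRARY_PATH=$NE10PATH/build/modules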
4.2 Using Ne10
The Ne10 library is now ready. The following sample shows how to use it.
1) Source code
You can call the C or NEON version of a Ne10 function directly, as follows:
#include <stdio.h>
#include <stdlib.h>
#include "NE10.h"

int main(void)
{
    ne10_int32_t i;
    ne10_float32_t thesrc[5];
    ne10_float32_t thecst;
    ne10_float32_t thedst1[5];
    ne10_float32_t thedst2[5];

    /* Fill the input and the constant with random values in [0, 5] */
    for (i = 0; i < 5; i++)
    {
        thesrc[i] = (ne10_float32_t) rand() / RAND_MAX * 5.0f;
    }
    thecst = (ne10_float32_t) rand() / RAND_MAX * 5.0f;

    /* Add thecst to every element: plain C version, then NEON version */
    ne10_addc_float_c(thedst1, thesrc, thecst, 5);
    ne10_addc_float_neon(thedst2, thesrc, thecst, 5);

    printf("==========end=========\n");
    return 0;
}
Ne10 can also detect NEON hardware automatically. After you call ne10_init(), the generic function pointer (ne10_addc_float below) points to the appropriate version, C or NEON.
ne10_init();                                 /* detect NEON and set up the function pointers */
ne10_addc_float(thedst1, thesrc, thecst, 5); /* calls the C or NEON version */
2) Compiling the program
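Replace $NE10_INC_PATH with the directory that contains NE10.h and $NE10_LIB_PATH with the directory that contains the library. Use the first command for the static library and the second for the dynamic library: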
$ gcc -O2 -o sample sample.c -I$NE10_INC_PATH $NE10_LIB_PATH/libNE10.a
$ gcc -O2 -o sample sample.c -I$NE10_INC_PATH $NE10_LIB_PATH/libNE10.so -lm
Note: When you link against the dynamic library, add the -lm option. Without it, the link fails with "undefined reference to `sqrtf'", because the library calls sqrtf from the C math library (libm).
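Then run the sample with ./sample. If you linked against the dynamic library, make sure LD_LIBRARY_PATH includes the library directory, as described in section 4.1.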
5 Conclusion
Ne10 is a useful library for application developers: it lets you get the most out of NEON without arduous assembly coding. I hope this article helps you use Ne10 to accelerate your applications. To learn more about Ne10, visit http://projectne10.github.com/Ne10/
Yang Zhang, Home Software Engineer, Home Software Enabling team, ARM. Yang has several years of experience working on video codec projects, including H.264/AVC, H.263, MPEG4, MPEG2, VC-1 and AVS, and has a deep understanding of video codec algorithms. As a Home Software Engineer, she specializes in digital multimedia systems for ARM Home. Yang holds a Master's degree from Zhejiang University and is currently based in Shanghai, China.