Embedded devices have evolved rapidly over the past decade. Although these devices can run advanced web and desktop applications with sufficient performance, improving the software further is a never-ending task for developers. Recent ARM CPUs include a NEON™ engine with an instruction set designed to process multimedia content efficiently. However, since automatic compiler optimizations for Single Instruction Multiple Data (SIMD) instruction sets are still limited, developers need to optimize their software for this instruction set by hand.
The NEON instruction set provides parallel computation without threads, but it also requires a new way of thinking, which takes time to master. The most efficient way to learn is to observe and modify the behaviour of small code snippets. However, running and debugging these snippets requires a full development environment, which might not be available everywhere. Investigating new ideas or demonstrating them to others can be difficult when the environment is not available. To address this limitation, we developed the NEVADA simulator at the University of Szeged. It can run NEON code snippets inside any web browser and offers debugger-like features that allow close inspection of the executed code.
NEVADA models a simplified ARM CPU with NEON and ARM registers, and a linear memory space is available for loading and storing further data. The whole machine state can be saved and restored later, which is useful for sharing code snippets. The state can also be saved as a link, so the saved code snippets can be embedded inside any webpage. The user interface of NEVADA resembles a simplified debugger, which allows modifying any register or memory content, and provides various execution modes like step-by-step execution. These features make NEVADA an excellent tool for learning and practicing coding in NEON.
In the following, we demonstrate the capabilities of the NEVADA simulator for running NEON code. Let's choose a NEON snippet like this one, which swaps the red and blue channels. First, the simulator needs to be initialized with some NEON code and data as seen below, or just click here to load this program.
The initialization itself is simple: the instructions are copied into the Code box, and some data into the Working Memory box. These changes are marked with red boxes in the picture:
Our instruction sequence consists of three instructions. Let's execute them one by one. After the first step, the data is loaded from memory into the NEON registers, which turn green:
The next instruction swaps the d0 and d2 NEON registers:
And the last one writes the data back to a different location, pointed to by the r1 ARM register.
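For readers who cannot see the screenshots, the three steps above can be modelled in plain C. This is only a sketch under stated assumptions: the exact instructions are not reproduced in the text, so the code assumes a typical sequence of a de-interleaving load (such as vld4.8 {d0-d3}, [r0]), a register swap (vswp d0, d2) and an interleaving store (such as vst4.8 {d0-d3}, [r1]), operating on eight 4-byte pixels. The type DRegs and all function names are hypothetical and are not part of NEVADA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of four 64-bit NEON D registers (d0..d3),
   each holding eight 8-bit lanes. Not part of NEVADA. */
typedef struct {
    uint8_t d[4][8];
} DRegs;

/* Models an assumed "vld4.8 {d0-d3}, [r0]": reads 32 bytes and
   de-interleaves them, so byte 4*lane+reg lands in lane `lane` of d[reg].
   For 4-byte pixels, each register then holds one channel of 8 pixels. */
void load_deinterleaved(DRegs *regs, const uint8_t *src) {
    assert(regs != NULL && src != NULL);
    for (size_t lane = 0; lane < 8; ++lane)
        for (size_t reg = 0; reg < 4; ++reg)
            regs->d[reg][lane] = src[4 * lane + reg];
}

/* Models "vswp d0, d2": exchanges the full contents of d0 and d2. */
void swap_d0_d2(DRegs *regs) {
    uint8_t tmp[8];
    assert(regs != NULL);
    memcpy(tmp, regs->d[0], sizeof tmp);
    memcpy(regs->d[0], regs->d[2], sizeof tmp);
    memcpy(regs->d[2], tmp, sizeof tmp);
}

/* Models an assumed "vst4.8 {d0-d3}, [r1]": re-interleaves the four
   registers and writes 32 bytes to dst. */
void store_interleaved(const DRegs *regs, uint8_t *dst) {
    assert(regs != NULL && dst != NULL);
    for (size_t lane = 0; lane < 8; ++lane)
        for (size_t reg = 0; reg < 4; ++reg)
            dst[4 * lane + reg] = regs->d[reg][lane];
}

/* Runs the three modelled steps on 8 pixels: channel 0 and channel 2
   (red and blue, assuming an RGBA or BGRA byte layout) trade places. */
void swap_red_blue_8px(const uint8_t *src, uint8_t *dst) {
    DRegs regs;
    load_deinterleaved(&regs, src);
    swap_d0_d2(&regs);
    store_interleaved(&regs, dst);
}
```

Calling swap_red_blue_8px with a 32-byte source buffer and a separate 32-byte destination mirrors the walkthrough: the source plays the role of the memory loaded in the first step, and the destination plays the role of the location pointed to by r1. The point of the de-interleaving load is that a single register swap then exchanges a whole channel for all eight pixels at once.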
Executing a code snippet is as simple as shown above. If you are interested in seeing other examples running in the NEVADA simulator, take a look at the three built-in examples on the webpage http://szeged.github.com/nevada/.
Currently NEVADA aims to fully support the rich NEON instruction set, but later, depending on the success of the simulator, other instructions might be added. We hope all of you will enjoy using NEVADA for learning and working with NEON. All feedback is warmly welcome!
Zoltan Herczeg is a Senior Developer at the Software Engineering Department of the University of Szeged, Hungary. He is an accepted contributor to several open source projects, including the WebKit browser engine (reviewer status) and the Perl Compatible Regular Expressions (PCRE) library (committer status), and is the maintainer of XEEMU, a cycle-accurate ARM instruction simulator. He holds an MSc in Computer Science.
Other posts from Zoltan Herczeg:
Using ARM NEON to accelerate Scalable Vector Graphics in webkit by up to 4x