1.1 DirectFB Introduction
1.2 NEON Introduction
Martyn's article "Coding for NEON" is a good introduction, and the ARM Infocenter has additional information.
2. Optimizing DirectFB with NEON
2.1 Profiling
Take the fill-rectangle-blend (rgb16) operation as an example. DS-5 Streamline makes it easy to pinpoint the functions where most of the time is spent. As you can see in the screenshot below, these are Sop_rgb16_to_Dacc (43.82%), Xacc_blend_invsrcalpha (16.29%), Sacc_to_Aop_rgb16 (11.09%), and SCacc_add_to_Dacc_C (9.81%).
2.2 Optimization
DSBLIT_BLEND_ALPHACHANNEL] = Dacc_modulate_rgb_NEON;
"vld1.16 {q0}, [%[S]]! \n\t" /* Load 8 pixels from Source to q0 */
1. The data is stored in the GenefxAccumulator structure defined in DirectFB, which has 16 bits for each channel.
typedef union {
3. For R, G and B channels, the color data is shifted into the most significant bits, then shifted right with insert to position each color channel in the result register.
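The shift-then-insert idea can be easier to follow in scalar form. The following is only a sketch of one possible reading of the steps above, not the NEON code from this article: `acc_px`, `sri16` and `rgb565_to_acc` are hypothetical names, the channel order inside `acc_px` is an assumption, the alpha value of 0xFF is an assumption, and the low bits of each channel are simply left at zero. `sri16` models what a 16-bit NEON "shift right and insert" does on a single lane.

```c
#include <stdint.h>

/* Hypothetical accumulator pixel for illustration only:
   one 16-bit lane per channel (channel order is an assumption). */
typedef struct {
    uint16_t b, g, r, a;
} acc_px;

/* Scalar model of a 16-bit "shift right and insert" on one lane:
   the top n bits of dst are kept, the rest is filled with src >> n.
   Intended for n in 1..15. */
static uint16_t sri16(uint16_t dst, uint16_t src, unsigned n)
{
    uint16_t keep = (uint16_t)(0xFFFFu << (16u - n));
    return (uint16_t)((dst & keep) | (src >> n));
}

/* Convert one RGB565 pixel into the 16-bit-per-channel form. */
static acc_px rgb565_to_acc(uint16_t px)
{
    uint32_t p = px;

    /* Step 1: move each channel so that it starts at bit 15.
       The masks drop the neighbouring channels' bits. */
    uint16_t r_top = (uint16_t)(p & 0xF800u);          /* rrrrr00000000000 */
    uint16_t g_top = (uint16_t)((p << 5) & 0xFC00u);   /* gggggg0000000000 */
    uint16_t b_top = (uint16_t)((p << 11) & 0xF800u);  /* bbbbb00000000000 */

    /* Step 2: shift right by 8 with insert into a zeroed lane, so the
       channel lands in the low byte and the high byte stays zero. */
    acc_px out;
    out.r = sri16(0, r_top, 8);
    out.g = sri16(0, g_top, 8);
    out.b = sri16(0, b_top, 8);
    out.a = 0xFF;  /* assumed: rgb16 has no alpha, so treat it as opaque */
    return out;
}
```

For example, calling `rgb565_to_acc(0xF800)` (pure red) under these assumptions gives r = 0xF8 with g and b at zero. A NEON version would apply the same two steps to eight pixels at a time, one register per channel.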
2.3 Debug
3. Benchmarking the Results
3.1 Environment
3.2 Result
4. Conclusion