Hi,
I have used some 32-bit (non-ARM) microprocessor cores that have a long-word-length accumulator for DSP operations, to avoid overflow etc. After checking the A8 core documentation, I was surprised not to find anything about this. It looks like the accumulator is only 32 bits, the same width as the registers. For an FIR filter with 24-bit data and 16-bit coefficients, the accumulator needs at least 48 bits. How can I get satisfactory results with the A8 core?
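The 48-bit figure follows from worst-case bit growth. As a conservative bound, a signed 24-bit by 16-bit product needs 24 + 16 = 40 bits, and summing N such products can add up to ceil(log2 N) guard bits, so 48 bits covers up to 256 full-scale taps. A small sketch of that arithmetic (the helper names `ceil_log2` and `required_acc_bits` are hypothetical, for illustration only):

```c
#include <stdint.h>

/* Smallest b such that 2^b >= n (n >= 1). Uses a 64-bit shift so that
   b can reach 32 without undefined behaviour. */
static unsigned ceil_log2(uint32_t n)
{
    unsigned bits = 0;
    while (((uint64_t)1 << bits) < n)
        ++bits;
    return bits;
}

/* Conservative accumulator width for a signed FIR: one product needs
   data_bits + coeff_bits bits, plus ceil(log2(taps)) guard bits for the sum. */
static unsigned required_acc_bits(unsigned data_bits, unsigned coeff_bits, uint32_t taps)
{
    return data_bits + coeff_bits + ceil_log2(taps);
}
```

For example, `required_acc_bits(24, 16, 256)` gives 48, while a single tap already needs 40 bits, which is more than a 32-bit register can hold.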
Thanks,
Hello,
How about using NEON?
Best regards,
Yasuhiko Koumoto.
For arbitrary-width arithmetic, you might have to use the carry bit and instructions like ADC ("Add With Carry") to handle wide data types. However, the core instruction set has an SMLAL ("Signed Multiply Accumulate Long") instruction which probably does exactly what you want: it multiplies two 32-bit values and adds the 64-bit product to a 64-bit accumulator held in a pair of registers. There's also an unsigned variant (UMLAL).
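To make the SMLAL description concrete, here is a portable C model of what it computes. This is a sketch, not ARM's reference pseudocode. The helper name `smlal_model` is hypothetical; `lo` and `hi` stand in for the RdLo/RdHi register pair, and the explicit carry mirrors what an ADDS/ADC pair would do when building wide additions by hand.

```c
#include <stdint.h>

/* Hypothetical model of SMLAL RdLo, RdHi, Rn, Rm:
   {hi:lo} += (int64)rn * (int64)rm, with the 64-bit accumulator
   split across two 32-bit "registers". */
static void smlal_model(uint32_t *lo, uint32_t *hi, int32_t rn, int32_t rm)
{
    int64_t product = (int64_t)rn * rm;                  /* full 64-bit signed product */
    uint32_t p_lo = (uint32_t)product;                   /* low word of the product */
    uint32_t p_hi = (uint32_t)((uint64_t)product >> 32); /* high word of the product */

    uint32_t new_lo = *lo + p_lo;   /* like ADDS: add low words (wraps mod 2^32) */
    uint32_t carry = new_lo < p_lo; /* carry out of the low-word addition */
    *hi = *hi + p_hi + carry;       /* like ADC: add high words plus the carry */
    *lo = new_lo;
}
```

Reading the result back as `(int64_t)(((uint64_t)hi << 32) | lo)` gives the same value as accumulating into an `int64_t` directly.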
That said, NEON is probably a good choice here. The VMLAL ("Vector Multiply Accumulate Long") instruction, for example, performs the same widening multiply-accumulate; with 64-bit accumulator lanes, it handles two elements per instruction.
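A minimal sketch of one VMLAL.S32 step using the `vmlal_s32` intrinsic from `arm_neon.h`, with a plain C fallback so the same function also builds on non-NEON targets. The function name `mlal_2x32` is hypothetical; each call multiplies two pairs of 32-bit lanes and adds the 64-bit products into two accumulator lanes.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] += (int64)a[i] * b[i] for i = 0, 1,
   i.e. the lane-wise effect of VMLAL.S32 with 64-bit accumulators. */
void mlal_2x32(int64_t acc[2], const int32_t a[2], const int32_t b[2])
{
#if defined(__ARM_NEON)
    int64x2_t vacc = vld1q_s64(acc);                /* load both 64-bit accumulators */
    vacc = vmlal_s32(vacc, vld1_s32(a), vld1_s32(b)); /* widening multiply-accumulate */
    vst1q_s64(acc, vacc);                           /* store them back */
#else
    for (int lane = 0; lane < 2; ++lane)            /* scalar fallback, same result */
        acc[lane] += (int64_t)a[lane] * b[lane];
#endif
}
```

In a real filter loop you would keep the accumulator in a register across iterations rather than loading and storing it on every call; the load/store here just keeps the sketch self-contained.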
Yes, that's exactly right. As for coding, you could consider a library from a third-party supplier; they may also have optimized the memory access. If you write it yourself, there are three levels of increasing speed and difficulty you might consider: (1) plain C using int for the operands and long long int for the 64-bit accumulator, which would use SMLAL; (2) C with ARM's NEON intrinsics extensions; and (3) hand-written NEON assembler. Options 2 and 3 would both use VMLAL.
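The first option might look like this: a sketch of the FIR inner product with int operands and a long long accumulator. An ARM compiler will usually map the multiply-accumulate in the loop body onto SMLAL, but check the generated code to be sure. The function name `fir_dot` is hypothetical.

```c
/* One FIR output sample: sum of x[k] * h[k] over `taps` taps.
   Operands are int (24-bit data, 16-bit coefficients stored in 32-bit ints);
   the accumulator is long long so the sum cannot overflow 32 bits. */
long long fir_dot(const int *x, const int *h, int taps)
{
    long long acc = 0;
    for (int k = 0; k < taps; ++k)
        acc += (long long)x[k] * h[k]; /* widen before multiplying */
    return acc;
}
```

The cast on `x[k]` matters: without it the product is computed in 32-bit int and can overflow before it ever reaches the 64-bit accumulator.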