Hi guys,
I am currently using the Ne10 library for signal processing on an Android device to get better performance. I want to cross-correlate two arrays of equal size (say N) and get an output of size 2*N - 1, like y = xcorr(x1,x2) in MATLAB, where x1 and x2 each have N elements and y has 2*N - 1.
I already have equivalent code using IPP:
ippsCrossCorr_32f(x1,N,x2,N,y1,2*N-1,-(N-1));
Using Ne10, I can't find a way to get y with 2*N - 1 elements unless I zero-pad both x1 and x2, which then gives the same output as the IPP call.
This is what I am doing in Ne10:
coeff = zero pad x1 in front
input = zero pad x2 at the back
ne10_fir_float_neon(state,input,coeff,output,2*N-1);
The outputs of the IPP function and the Ne10 function match, but I want a solution that does away with the zero padding. Have I misinterpreted any of the function calls?
Thanks,
Kelvin
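To make the padding approach concrete, here is a minimal sketch in plain C. It does not call Ne10: fir_ref and xcorr_padded are hypothetical names, and the coefficient order and lag convention (y[k] = sum over n of x1[n + k - (N-1)] * x2[n]) are assumptions that may need to be flipped to match a particular library. In this sketch only the input is padded and the coefficients are reversed; the memcpy/memset pair is the per-call overhead being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference FIR (not the Ne10 kernel):
 * out[k] = sum_j h[j] * in[k - j], with in[] taken as zero before index 0. */
void fir_ref(const float *in, const float *h, size_t taps,
             float *out, size_t block)
{
    for (size_t k = 0; k < block; ++k) {
        float acc = 0.0f;
        for (size_t j = 0; j < taps && j <= k; ++j)
            acc += h[j] * in[k - j];
        out[k] = acc;
    }
}

/* Full cross-correlation via padding; y receives 2*n - 1 samples.
 * padded (2*n - 1 floats) and rev (n floats) are caller-provided scratch. */
void xcorr_padded(const float *x1, const float *x2, size_t n,
                  float *padded, float *rev, float *y)
{
    assert(n > 0);
    size_t out_len = 2 * n - 1;

    /* Copy x1 and append n - 1 zeros so the filter can run past its end. */
    memcpy(padded, x1, n * sizeof *x1);
    memset(padded + n, 0, (n - 1) * sizeof *padded);

    /* Reverse x2 so that the convolution becomes a correlation. */
    for (size_t j = 0; j < n; ++j)
        rev[j] = x2[n - 1 - j];

    fir_ref(padded, rev, n, y, out_len);
}
```

With x1 = {1, 2, 3} and x2 = {4, 5, 6}, this convention produces the five values 6, 17, 32, 23, 12.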
Hi Matthew, thanks for the reply. Yes, performance is an issue; that's why I am trying to do away with the padding, because it means calling memcpy many times.
Hi
The usage of the Ne10 FIR function is as follows:
void ne10_fir_float_c (const ne10_fir_instance_f32_t * S,
                       ne10_float32_t * pSrc,
                       ne10_float32_t * pDst,
                       ne10_uint32_t blockSize);
The NEON version has the same interface.
Before using it, you need to call the initialization function:
ne10_result_t ne10_fir_init_float (ne10_fir_instance_f32_t * S,
                                   ne10_uint16_t numTaps,
                                   ne10_float32_t * pCoeffs,
                                   ne10_float32_t * pState,
                                   ne10_uint32_t blockSize)
{
    /* Assign filter taps */
    S->numTaps = numTaps;
    /* Assign coefficient pointer */
    S->pCoeffs = pCoeffs;
    /* Clear state buffer; the size of the state buffer is (blockSize + numTaps - 1) */
    memset (pState, 0, (numTaps + (blockSize - 1u)) * sizeof (ne10_float32_t));
    /* Assign state pointer */
    S->pState = pState;
    return NE10_OK;
}
Currently, the input and output have the same length. If you want to achieve your result, padding is one solution, but padding needs too many memcpy calls. Another solution is to modify the NEON code to suit your requirement.
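As a scalar reference for what a modified kernel would have to compute, here is a minimal sketch that produces all 2*N - 1 outputs without any padding by clamping the inner loop bounds for each lag. The name xcorr_direct is hypothetical and this is plain C, not Ne10 or NEON code; the lag convention (y[k] = sum over i of x1[i + k - (N-1)] * x2[i]) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Full cross-correlation of two length-n arrays; y receives 2*n - 1 samples.
 * No scratch buffers or copies: the loop only visits overlapping samples. */
void xcorr_direct(const float *x1, const float *x2, size_t n, float *y)
{
    assert(n > 0);
    for (size_t k = 0; k < 2 * n - 1; ++k) {
        /* lag = k - (n - 1). For negative lags the overlap starts later
         * in x2; for positive lags it ends earlier. */
        size_t start = (k < n - 1) ? (n - 1 - k) : 0;
        size_t stop  = (k < n - 1) ? n : (2 * n - 1 - k);
        float acc = 0.0f;
        for (size_t i = start; i < stop; ++i)
            acc += x1[i + k - (n - 1)] * x2[i];
        y[k] = acc;
    }
}
```

A vectorized version would keep the same clamped bounds and process the inner loop several samples at a time, with a scalar tail for the leftover elements.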
Hi Zhang Yang,
Thanks for the reply. I tried the first method, zero padding with memcpy, and the outcome was disastrous. I am more inclined to use the second method, or to use fft and ifft to replace the filter function. If I were to go with the second method, how should I start? Do I need to change the ASM code?
Hi Kelvin
If you use fft and ifft to replace the filter function, you need to check the following files:
https://github.com/projectNe10/Ne10/blob/master/modules/dsp/NE10_fft_float32.c
https://github.com/projectNe10/Ne10/blob/master/modules/dsp/NE10_fft_float32.neon.s
https://github.com/projectNe10/Ne10/blob/master/modules/dsp/NE10_fft_float32.neon.c
If you want to modify the FIR code, you need to check
https://github.com/projectNe10/Ne10/blob/master/modules/dsp/NE10_fir.neon.s
Line 69: ne10_fir_float_neon
If you are not familiar with NEON, you can use the "NEON Programmer’s Guide" as a reference.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0018a/index.html
Regards
Yang