Output image is wrong when using NEON intrinsics?

Hello everyone,

I'm currently converting some simple image processing functions to NEON intrinsics to improve performance. However, when I try a simple one, the output image does not match the one produced by the original function:

static void foo_neon( unsigned char* dst, const unsigned char* src, int xs, int ys)
{
	int i,j;
	uint8x16_t vectA, vectB, vectC;
	for(i=0;i<=ys/2-1;i++)
	{
		for(j=0;j<=xs/2-1;j+=16)
		{
			vectA = vshrq_n_u8(vld1q_u8(&src[(i*2)*xs+(j*2)]), 1);
			vectB = vshrq_n_u8(vld1q_u8(&src[(i*2)*xs+(j*2+1)]), 1);
			vectC = vhaddq_u8(vectA, vectB);
			vst1q_u8(&dst[i*(xs/2)+j], vectC);
		}
	}
}

static void foo( unsigned char* dst, const unsigned char* src, int xs, int ys)
{
	int i,j;

	for(i=ys/2-1;i>=0;i--)
	{
		for(j=xs/2-1;j>=0;j--)
			dst[i*(xs/2)+j] =(unsigned char)((src[(i*2)*xs+(j*2)]+ src[(i*2)*xs+(j*2+1)])/2);
	}
}

This code takes the input image buffer (RAW format), does some simple processing, and writes the result to another image buffer. Here xs is the width and ys is the height of the image.

Is there something wrong with my conversion to NEON? I can't see anything wrong with it. The only thing I suspect is that there may be some saturation when I add and then halve the result in NEON. I would appreciate your help in clarifying this.

Thank you.
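
For anyone comparing the two functions, here is a small scalar model of what one output lane of the inner loop in foo_neon appears to compute, next to what foo computes for the same output pixel. It is only a sketch based on the documented behaviour of vld1q_u8, vshrq_n_u8 and vhaddq_u8; the helper names ref_pixel and neon_lane_model are made up for illustration and are not part of the original code.

```c
/* What foo() computes for output column j of one row:
 * the average of the horizontal pair (2j, 2j+1). */
static unsigned char ref_pixel(const unsigned char *row, int j)
{
    return (unsigned char)((row[2 * j] + row[2 * j + 1]) / 2);
}

/* Hypothetical model of lane k of the foo_neon iteration that starts at
 * output column j. vld1q_u8 loads 16 contiguous bytes, so lane k of the two
 * loads holds row[2j + k] and row[2j + 1 + k]. Each value is shifted right
 * by 1 (vshrq_n_u8), and vhaddq_u8 then halves the sum once more. */
static unsigned char neon_lane_model(const unsigned char *row, int j, int k)
{
    unsigned a = row[2 * j + k] >> 1;
    unsigned b = row[2 * j + 1 + k] >> 1;
    return (unsigned char)((a + b) >> 1);
}
```

If this model is right, two things differ from foo and neither is saturation: the sum is halved twice (roughly (a + b) / 4), and lane k pairs bytes 2j+k and 2j+k+1 instead of bytes 2(j+k) and 2(j+k)+1, because the loads are not de-interleaved.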

Parents
  • I load the data into NEON registers as uint8_t, then widen it from uint8_t to uint16_t (to avoid overflow). Then I perform vhaddq(). Finally, I narrow the result back from uint16_t to uint8_t and store it with vst1_u8(). However, the result is still wrong, so it doesn't seem to be a problem of overflow or wrong alignment.
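
Since widening did not help, one possible explanation is the load pattern rather than overflow. Below is a minimal sketch of one way to get the same result as foo, assuming the intent is to average each horizontal pair (2j, 2j+1) on every other row. It uses vld2q_u8 to de-interleave even and odd columns and a single vhaddq_u8, with no extra shift. The function name foo_neon_deinterleaved is hypothetical, and the scalar tail is an addition so that widths whose half is not a multiple of 16 are not over-read.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical replacement for foo_neon; the name is illustrative only. */
static void foo_neon_deinterleaved(unsigned char *dst, const unsigned char *src,
                                   int xs, int ys)
{
    const int half_w = xs / 2;
    const int half_h = ys / 2;

    for (int i = 0; i < half_h; i++) {
        const unsigned char *row = src + (size_t)(2 * i) * (size_t)xs;
        unsigned char *out = dst + (size_t)i * (size_t)half_w;
        int j = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        /* vld2q_u8 reads 32 bytes and splits them: val[0] holds the even
         * columns, val[1] the odd columns. vhaddq_u8 computes (a + b) >> 1
         * per lane without overflowing 8 bits. */
        for (; j + 16 <= half_w; j += 16) {
            uint8x16x2_t pairs = vld2q_u8(row + 2 * j);
            vst1q_u8(out + j, vhaddq_u8(pairs.val[0], pairs.val[1]));
        }
#endif
        /* Scalar tail; also the full fallback when NEON is not available. */
        for (; j < half_w; j++)
            out[j] = (unsigned char)((row[2 * j] + row[2 * j + 1]) / 2);
    }
}
```

Because vhaddq_u8 already performs the halving add at higher internal precision, the uint16_t round trip should not be needed for this particular operation.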
