Output image is wrong when using NEON intrinsics?

thanhvu94 over 8 years ago

Hello everyone,

I'm currently converting some simple image processing functions to NEON to improve performance. However, when I tried a simple one, the output image does not match the original:

static void foo_neon( unsigned char* dst, const unsigned char* src, int xs, int ys)
{
	int i,j;
	uint8x16_t vectA, vectB, vectC;
	for(i=0;i<=ys/2-1;i++)
	{
		for(j=0;j<=xs/2-1;j+=16)
		{
			vectA = vshrq_n_u8(vld1q_u8(&src[(i*2)*xs+(j*2)]), 1);
			vectB = vshrq_n_u8(vld1q_u8(&src[(i*2)*xs+(j*2+1)]), 1);
			vectC = vhaddq_u8(vectA, vectB);
			vst1q_u8(&dst[i*(xs/2)+j], vectC);
		}
	}
}

static void foo( unsigned char* dst, const unsigned char* src, int xs, int ys)
{
	int i,j;

	for(i=ys/2-1;i>=0;i--)
	{
		for(j=xs/2-1;j>=0;j--)
			dst[i*(xs/2)+j] =(unsigned char)((src[(i*2)*xs+(j*2)]+ src[(i*2)*xs+(j*2+1)])/2);
	}
}

This code takes the input image buffer (RAW format), does some simple processing, and writes the result to another image buffer. xs is the width and ys is the height of the image.

Is there something wrong with the conversion to NEON? From what I can see, nothing is wrong. My only suspicion is that some saturation may occur when I add and then halve the result in NEON. I would really appreciate your help in clarifying this.

Thank you.
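
On the saturation doubt: vhaddq_u8 is a halving add, so each lane is computed as (a + b) >> 1 with the sum held in a wider intermediate, and nothing saturates or wraps. The scalar sketch below models one lane of that instruction and one lane of the posted sequence (a right shift by 1 on both inputs, then the halving add). The helper names hadd_u8 and posted_lane are hypothetical and exist only for illustration. Under this model, the posted sequence halves twice and yields roughly a quarter of the sum rather than the average.

```c
#include <stdint.h>

/* Hypothetical per-lane model of vhaddq_u8: the sum is formed in a wider
 * type and then shifted right by one, so it cannot overflow or saturate. */
static uint8_t hadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* Hypothetical per-lane model of the posted sequence:
 * vshrq_n_u8(x, 1) on both inputs, followed by vhaddq_u8. */
static uint8_t posted_lane(uint8_t a, uint8_t b)
{
    return hadd_u8((uint8_t)(a >> 1), (uint8_t)(b >> 1));
}
```

For inputs 200 and 100, hadd_u8 gives 150, which matches (200 + 100) / 2 in the scalar foo, while posted_lane gives 75.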

Top replies

thanhvu94 over 8 years ago in reply to thanhvu94 +2 verified

Ah, problem solved. It turns out that the bug is in src[ ... + (2*j)]. In the NEON function, I load 16 adjacent elements instead of "take one, ignore one".
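
The thread does not show the corrected function, so the following is only one possible way to get the "take one, ignore one" behaviour. It assumes the de-interleaving load vld2q_u8, which reads 32 consecutive bytes and splits them into even-indexed and odd-indexed lanes. The names halve_ref and halve_neon are hypothetical. The sketch also drops the two vshrq_n_u8 calls so that the result matches the scalar (a + b) / 2, and it adds a scalar tail loop for widths where xs/2 is not a multiple of 16. On targets without NEON, only the scalar path is compiled.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference (hypothetical name): same arithmetic as foo in the question. */
static void halve_ref(unsigned char *dst, const unsigned char *src, int xs, int ys)
{
    for (int i = 0; i < ys / 2; i++)
        for (int j = 0; j < xs / 2; j++)
            dst[i * (xs / 2) + j] = (unsigned char)
                ((src[(i * 2) * xs + j * 2] + src[(i * 2) * xs + j * 2 + 1]) / 2);
}

/* NEON sketch (hypothetical name): average each horizontal pixel pair on even rows. */
static void halve_neon(unsigned char *dst, const unsigned char *src, int xs, int ys)
{
    int half = xs / 2;
    for (int i = 0; i < ys / 2; i++) {
        const unsigned char *row = src + (size_t)(i * 2) * (size_t)xs;
        unsigned char *out = dst + (size_t)i * (size_t)half;
        int j = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        for (; j + 16 <= half; j += 16) {
            /* Reads 32 bytes: val[0] holds even pixels, val[1] holds odd pixels. */
            uint8x16x2_t p = vld2q_u8(row + 2 * j);
            /* Halving add: (even + odd) >> 1 per lane, without overflow. */
            vst1q_u8(out + j, vhaddq_u8(p.val[0], p.val[1]));
        }
#endif
        /* Scalar tail (and the whole row when NEON is unavailable). */
        for (; j < half; j++)
            out[j] = (unsigned char)((row[2 * j] + row[2 * j + 1]) / 2);
    }
}
```

Comparing the output of halve_neon against halve_ref on the same buffer is a quick way to check a conversion like this before measuring performance.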