
Strange behaviour of uint8x8x2_t

Hi @ all,

I have a uint32x4_t on which I want to perform a count-leading-zeros operation and a table lookup with intrinsics. The table lookup should work like this:

Index  ||   0,   1,   2,   3|   4,   5,   6,   7||   8,   9,   A,   B|   C,   D,   E,   F||
Data   || 0x0, 0x0, 0x1, 0x2| 0x0, 0x3, 0x0, 0x4|| 0x5, 0x6, 0x7, 0x8| 0x0, 0x0, 0x0, 0x9||
SMask  || 0x2, 0x3, 0x5, 0x6| 0x7, 0x8, 0x9, 0xA|| 0xB, 0xF,0x10,0x10|0x10,0x10,0x10,0x10||
Result || 0x1, 0x2, 0x3, 0x0| 0x4, 0x5, 0x6, 0x7|| 0x8, 0x9, 0x0, 0x0| 0x0, 0x0, 0x0, 0x0||
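
To make the intended semantics explicit, here is a minimal scalar sketch of the lookup, written only as a reference to compare the NEON output against. The name shuffle_reference is hypothetical and not part of my actual code. It assumes that any index of 0x10 or larger should produce 0, which is what the Result row above shows.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference helper (an assumption for illustration, not the
// NEON code in question): scalar model of the intended 16-byte table lookup.
// result[i] = data[mask[i]] if mask[i] < 16, otherwise 0.
std::array<std::uint8_t, 16> shuffle_reference(
    const std::array<std::uint8_t, 16>& data,
    const std::array<std::uint8_t, 16>& mask) {
    std::array<std::uint8_t, 16> result{};  // zero-initialised
    for (std::size_t i = 0; i < result.size(); ++i) {
        if (mask[i] < data.size()) {
            result[i] = data[mask[i]];  // in-range index: copy the byte
        }
        // out-of-range index: leave the lane at 0
    }
    return result;
}
```

Feeding the Data and SMask rows from the table into this function should give the Result row, so it can serve as a check for any vectorised version.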

Since table lookup is only supported through the uint8x8xN_t types, and counting leading zeros the way I want is only possible with uint32x4_t, I ran into the following problem.

This is my code so far:

 

#include <iostream>
#include <arm_neon.h>

inline uint8x16_t Shuffle(const uint8x16_t & src, const uint8x16_t & shuffle) {
    return vcombine_u8(
        vtbl2_u8(
            (const uint8x8x2_t &)src,
            vget_low_u8(shuffle)
        ),
        vtbl2_u8(
            (const uint8x8x2_t &)src,
            vget_high_u8(shuffle)
        )
    );
}

int main() {
    //FIX PART ONLY RUN ONCE
    //lookupTableIdx = 64*a + 16*b + 4*c + d
    //shiftData contains the needed leftshifts to get the correct idx
    int32_t* shiftData = new int32_t[4];
    shiftData[0] = 6;

 

If I compile it with:

g++ -march=native -mfpu=neon -std=c++14 main.cpp

and run it, the output is this:

0 0 0 3 0 8 7 6 5 0 0 0 0 0 0 0

But it should be this:

1 2 3 0 4 5 6 7 8 9 0 0 0 0 0 0

Does anyone know what I am doing wrong?

 

Sincerely
