# NEON multiplying 8-bit vector with 16-bit scalars

Hi,

I'm experimenting with NEON on an i.MX7D SoC.

I'm trying to do the following calculation.

I've got 8 RGB pixels stored in a vector, `uint8x8x3_t rgb`.

I then want to calculate:

rside=R*19*19

rsideext=(R+1)*19*19

gside=G*19

gsideext=(G+1)*19

bsideext=B+1

My R,G,B values are 8-bit.

Now, the scalar 19*19 = 361 that I need for rside does not fit in 8 bits, so it has to be 16-bit.

So I need a way to multiply an 8-bit vector by a 16-bit scalar.

I'm seeing:

`uint16x8_t vmulq_u16(uint16x8_t a, uint16x8_t b);`

or

`uint16x4_t vmul_lane_u16 (uint16x4_t, uint16x4_t, const int)`

whereas I'd been hoping for:

`uint16x8_t something(uint8x8_t r, const int)`

I guess that kind of abstraction doesn't exist.

So now I'm thinking:

`uint16x8_t v_rside;`

`uint16x4_t v_rside_u;`

`uint16x4_t v_rside_l;`

`uint8x8_t side = vdup_n_u8(19);`

`v_rside = vmull_u8(rgb.val[0], side);`

`v_rside_l = vmul_lane_u16(vget_low_u16(v_rside), 19);`

`v_rside_u = vmul_lane_u16(vget_high_u16(v_rside), 19);`

`v_rside = vcombine_u16(v_rside_l, v_rside_u);`

Is that the most efficient way to do it?

Then I've got rsideext, which I think I can get by adding 19*19 to rside.

Except I don't see a vadd that takes a scalar.

I see something like:

`vmlal_lane_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d)`

But that widens the result to `uint32x4_t`, which I don't need.

So I guess I would need to do:

except I'd need to make a `uint16x8_t side = vdupq_n_u16(19)`.

This is getting ugly now.

I figure I'd better ask whether I'm thinking about all of this correctly, or whether there's a much cleaner way to do this.

Thanks!
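
For reference, here is a minimal sketch of one possibly cleaner route, under stated assumptions rather than as a definitive answer. `vmul_lane_u16` expects a vector plus a lane index, not a scalar; the `_n_` variants such as `vmulq_n_u16` take the scalar directly, and a scalar add can be done with `vaddq_u16` and a broadcast constant from `vdupq_n_u16`. The function name `compute_sides`, the `sides_t` struct and the separate per-channel input arrays are hypothetical, made up for illustration. The sketch also assumes R <= 180, because (R+1)*361 only fits in 16 bits up to that value (181*361 = 65341). The scalar branch exists only so the sketch compiles on a host without NEON.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

enum { SIDE = 19, SIDE_SQ = SIDE * SIDE }; /* 19 and 361 */

/* Hypothetical output layout: one 8-lane array per quantity in the post. */
typedef struct {
    uint16_t rside[8];
    uint16_t rsideext[8];
    uint16_t gside[8];
    uint16_t gsideext[8];
    uint16_t bsideext[8];
} sides_t;

/* r, g, b hold 8 values per channel, like rgb.val[0..2] after vld3_u8. */
void compute_sides(const uint8_t r[8], const uint8_t g[8],
                   const uint8_t b[8], sides_t *out)
{
    /* Assumption: (R+1)*361 must fit in 16 bits, which requires R <= 180. */
    for (int i = 0; i < 8; ++i)
        assert(r[i] <= 180);

#if HAVE_NEON
    uint8x8_t vr = vld1_u8(r);
    uint8x8_t vg = vld1_u8(g);
    uint8x8_t vb = vld1_u8(b);

    /* Widen R to 16 bits, then multiply every lane by the scalar 361. */
    uint16x8_t v_rside = vmulq_n_u16(vmovl_u8(vr), SIDE_SQ);
    /* (R+1)*361 == R*361 + 361, so add a broadcast constant. */
    uint16x8_t v_rsideext = vaddq_u16(v_rside, vdupq_n_u16(SIDE_SQ));

    /* 19 fits in 8 bits, so one widening 8x8->16 multiply covers G. */
    uint16x8_t v_gside = vmull_u8(vg, vdup_n_u8(SIDE));
    uint16x8_t v_gsideext = vaddq_u16(v_gside, vdupq_n_u16(SIDE));

    /* B+1 with a widening add, so that B == 255 gives 256 instead of 0. */
    uint16x8_t v_bsideext = vaddw_u8(vdupq_n_u16(1), vb);

    vst1q_u16(out->rside, v_rside);
    vst1q_u16(out->rsideext, v_rsideext);
    vst1q_u16(out->gside, v_gside);
    vst1q_u16(out->gsideext, v_gsideext);
    vst1q_u16(out->bsideext, v_bsideext);
#else
    /* Scalar fallback computing the same values lane by lane. */
    for (int i = 0; i < 8; ++i) {
        out->rside[i]    = (uint16_t)(r[i] * SIDE_SQ);
        out->rsideext[i] = (uint16_t)((r[i] + 1) * SIDE_SQ);
        out->gside[i]    = (uint16_t)(g[i] * SIDE);
        out->gsideext[i] = (uint16_t)((g[i] + 1) * SIDE);
        out->bsideext[i] = (uint16_t)(b[i] + 1);
    }
#endif
}
```

If R can go all the way up to 255, the products no longer fit in 16 bits; in that case the widening `vmull_n_u16` on the low and high halves (giving `uint32x4_t` results) would be the equivalent step.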
