ARM v8 Neon instruction for multiply long

Hi all,

I need to perform a multiply-long operation on uint16x8_t data on ARM v8.

 

The ARM v7 implementation would be as follows:  

uint16x8_t u16x8_data1 = vld1q_u16(pBuffer1);
uint16x8_t u16x8_data2 = vld1q_u16(pBuffer2);
uint32x4_t u32x4_mul_result_low = vmull_u16(vget_low_u16(u16x8_data1), vget_low_u16(u16x8_data2));
uint32x4_t u32x4_mul_result_high = vmull_u16(vget_high_u16(u16x8_data1), vget_high_u16(u16x8_data2));
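For reference, the ARMv7 fragment above can be wrapped in a complete function that stores all eight 32-bit products. This is a minimal sketch: the name mull_u16x8_v7 and the out[8] layout (low-lane products first, then high-lane products) are hypothetical, and the scalar branch is an assumption added only so the code also builds on hosts without NEON.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: widening multiply of eight uint16_t pairs.
   out[0..3] receive the low-lane products, out[4..7] the high-lane ones. */
static void mull_u16x8_v7(const uint16_t *pBuffer1, const uint16_t *pBuffer2,
                          uint32_t out[8])
{
#if defined(__ARM_NEON)
    uint16x8_t u16x8_data1 = vld1q_u16(pBuffer1);
    uint16x8_t u16x8_data2 = vld1q_u16(pBuffer2);
    /* Split each 128-bit vector into 64-bit halves, then widen-multiply. */
    uint32x4_t u32x4_mul_result_low =
        vmull_u16(vget_low_u16(u16x8_data1), vget_low_u16(u16x8_data2));
    uint32x4_t u32x4_mul_result_high =
        vmull_u16(vget_high_u16(u16x8_data1), vget_high_u16(u16x8_data2));
    vst1q_u32(out, u32x4_mul_result_low);
    vst1q_u32(out + 4, u32x4_mul_result_high);
#else
    /* Scalar reference (assumption, for non-NEON hosts); same lane order.
       The casts avoid signed int overflow for 65535 * 65535. */
    for (int i = 0; i < 8; i++)
        out[i] = (uint32_t)pBuffer1[i] * (uint32_t)pBuffer2[i];
#endif
}
```

Because the products are widened to 32 bits, even 65535 * 65535 = 4294836225 fits without overflow.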

 

In ARM v8 we have the intrinsic vmull_high_u16(), which operates directly on the upper four elements of u16x8_data1 and u16x8_data2, so the vget_high_u16() call can be avoided:

uint32x4_t u32x4_mul_result_high = vmull_high_u16(u16x8_data1, u16x8_data2);

But there is no corresponding vmull_low_u16() intrinsic for the lower four elements.
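A sketch of the AArch64 variant is below, using vmull_high_u16() for the upper half and the plain vmull_u16() for the lower half. On AArch64, vmull_u16 corresponds to the UMULL instruction, which reads the low 64 bits of each 128-bit source register, and vmull_high_u16 corresponds to UMULL2, which reads the high 64 bits. With an optimizing compiler such as GCC or Clang, vget_low_u16() therefore typically produces no separate instruction; checking the generated assembly is the way to confirm this for a given toolchain. The function name mull_u16x8_a64 is hypothetical, and the scalar branch is an assumption so the code builds on non-AArch64 hosts.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: eight widening products, low lanes in out[0..3],
   high lanes in out[4..7]. */
static void mull_u16x8_a64(const uint16_t *pBuffer1, const uint16_t *pBuffer2,
                           uint32_t out[8])
{
#if defined(__aarch64__)
    uint16x8_t u16x8_data1 = vld1q_u16(pBuffer1);
    uint16x8_t u16x8_data2 = vld1q_u16(pBuffer2);
    /* Low half: vmull_u16 maps to UMULL, which uses the low 64 bits of the
       source registers, so vget_low_u16 usually costs no instruction. */
    uint32x4_t u32x4_mul_result_low =
        vmull_u16(vget_low_u16(u16x8_data1), vget_low_u16(u16x8_data2));
    /* High half: vmull_high_u16 maps to UMULL2; no vget_high_u16 needed. */
    uint32x4_t u32x4_mul_result_high = vmull_high_u16(u16x8_data1, u16x8_data2);
    vst1q_u32(out, u32x4_mul_result_low);
    vst1q_u32(out + 4, u32x4_mul_result_high);
#else
    /* Scalar reference (assumption, for non-AArch64 hosts); same lane order. */
    for (int i = 0; i < 8; i++)
        out[i] = (uint32_t)pBuffer1[i] * (uint32_t)pBuffer2[i];
#endif
}
```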

 

So my question is: how can I perform the multiply long on the lower four elements without using a vget intrinsic?