
NEON matrix multiply

Note: This was originally posted on 24th November 2011 at http://forums.arm.com

I'm new to NEON and I'm trying to learn some NEON assembly.
I need to compute sums of products of two arrays.

I have two arrays of int16_t elements, each with 4 elements (a[0]-a[3] and b[0]-b[3]).
I need to produce a result array c of 4 int16_t values as follows:

c[0] = a[0] * b[0]
c[1] = a[0] * b[1] + a[1] * b[1]
c[2] = a[0] * b[2] + a[1] * b[2] + a[2] * b[2]
c[3] = a[0] * b[3] + a[1] * b[3] + a[2] * b[3] + a[3] * b[3]


I'm sure something like this should be trivial in NEON, but I have no idea how to get it working.

My approach is like this:

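vmov.i32  d0, #0          // d0 = destination array c, cleared to zero
// load arrays a and b into d1 and d2 (VLD1 takes a register list in braces):
vld1.16   {d1}, [r0]
vld1.16   {d2}, [r1]

vmla.i16  d0, d1, d2[0]   // 1st column: d0[i] += d1[i] * d2[0] for every lane i
// ? TODO... rotate
vmla.i16  d0, d1, d2[1]   // 2nd column
vmla.i16  d0, d1, d2[2]   // 3rd column
vmla.i16  d0, d1, d2[3]   // 4th column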



Basically, at the TODO I want to shift the elements of b so that {b[0], b[1], b[2], b[3]} becomes {0, b[0], b[1], b[2]}.


Is my approach correct, or can this not be done with ARM NEON?
PS: I tried using intrinsics with the evaluation version of RVDS, but it doesn't seem to work: the generated assembly is empty and contains none of these instructions.


Thanks.