"vmull.s16 q8, d8, d0 \r\n" //Col 0-3
"vmlal.s16 q8, d9, d1 \r\n"
"vmlal.s16 q8, d10, d2 \r\n"
"vmlal.s16 q8, d11, d3 \r\n"
"vmull.s16 q12, d12, d4 \r\n" //Col 4-7
"vmlal.s16 q12, d13, d5 \r\n"
"vmlal.s16 q12, d14, d6 \r\n"
"vmlal.s16 q12, d15, d7 \r\n"
"vadd.i32 q8, q8, q12 \r\n"

"vmull.s16 q8, d8, d0 \r\n" //Col 0-3
"vmull.s16 q12, d12, d4 \r\n" //Col 4-7
"vmlal.s16 q8, d9, d1 \r\n"
"vmlal.s16 q12, d13, d5 \r\n"
"vmlal.s16 q8, d10, d2 \r\n"
"vmlal.s16 q12, d14, d6 \r\n"
"vmlal.s16 q8, d11, d3 \r\n"
"vmlal.s16 q12, d15, d7 \r\n"
"vadd.i32 q8, q8, q12 \r\n"
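The two listings contain the same instructions in a different order. The second alternates between the q8 and q12 accumulators, presumably so that each instruction does not have to wait on the result of the one directly before it. As a sanity check that the reordering does not change the result, here is a rough scalar model in plain C. It is only a sketch: the helper names (model_vmull_s16, model_vmlal_s16, dot_sequential, dot_interleaved, orderings_agree) are made up for illustration, the arrays x[0..7] and c[0..7] stand in for d8-d15 and d0-d7, and it says nothing about actual timing on any core.

```c
#include <stdint.h>
#include <string.h>

enum { LANES = 4 }; /* a d register holds four 16-bit lanes */

/* Scalar stand-in for vmull.s16: acc = a * b, widened to 32 bits. */
static void model_vmull_s16(int32_t acc[LANES], const int16_t a[LANES],
                            const int16_t b[LANES]) {
    for (int i = 0; i < LANES; ++i)
        acc[i] = (int32_t)a[i] * (int32_t)b[i];
}

/* Scalar stand-in for vmlal.s16: acc += a * b, widened to 32 bits. */
static void model_vmlal_s16(int32_t acc[LANES], const int16_t a[LANES],
                            const int16_t b[LANES]) {
    for (int i = 0; i < LANES; ++i)
        acc[i] += (int32_t)a[i] * (int32_t)b[i];
}

/* Scalar stand-in for vadd.i32: dst = a + b, lane by lane. */
static void model_vadd_i32(int32_t dst[LANES], const int32_t a[LANES],
                           const int32_t b[LANES]) {
    for (int i = 0; i < LANES; ++i)
        dst[i] = a[i] + b[i];
}

/* First listing: finish the q8 chain, then the q12 chain. */
void dot_sequential(int32_t out[LANES], int16_t x[8][LANES],
                    int16_t c[8][LANES]) {
    int32_t q8[LANES], q12[LANES];
    model_vmull_s16(q8, x[0], c[0]);   /* Col 0-3 */
    model_vmlal_s16(q8, x[1], c[1]);
    model_vmlal_s16(q8, x[2], c[2]);
    model_vmlal_s16(q8, x[3], c[3]);
    model_vmull_s16(q12, x[4], c[4]);  /* Col 4-7 */
    model_vmlal_s16(q12, x[5], c[5]);
    model_vmlal_s16(q12, x[6], c[6]);
    model_vmlal_s16(q12, x[7], c[7]);
    model_vadd_i32(out, q8, q12);
}

/* Second listing: alternate between the two independent chains. */
void dot_interleaved(int32_t out[LANES], int16_t x[8][LANES],
                     int16_t c[8][LANES]) {
    int32_t q8[LANES], q12[LANES];
    model_vmull_s16(q8, x[0], c[0]);   /* Col 0-3 */
    model_vmull_s16(q12, x[4], c[4]);  /* Col 4-7 */
    model_vmlal_s16(q8, x[1], c[1]);
    model_vmlal_s16(q12, x[5], c[5]);
    model_vmlal_s16(q8, x[2], c[2]);
    model_vmlal_s16(q12, x[6], c[6]);
    model_vmlal_s16(q8, x[3], c[3]);
    model_vmlal_s16(q12, x[7], c[7]);
    model_vadd_i32(out, q8, q12);
}

/* Returns 1 if both orderings give identical lanes on small sample data. */
int orderings_agree(void) {
    int16_t x[8][LANES], c[8][LANES];
    int32_t a[LANES], b[LANES];
    for (int r = 0; r < 8; ++r)
        for (int i = 0; i < LANES; ++i) {
            x[r][i] = (int16_t)((r + 1) * (i - 2));
            c[r][i] = (int16_t)(r - 3 + i);
        }
    dot_sequential(a, x, c);
    dot_interleaved(b, x, c);
    return memcmp(a, b, sizeof a) == 0;
}
```

Since the q8 and q12 chains never read each other's registers until the final vadd, any interleaving of the two chains that keeps each chain's own order gives the same lanes.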
For instance, if you can keep a register with 1 in every lane, you can replace that vadd at the end with another vmlal.
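The idea behind that suggestion is that a multiply-accumulate by 1 is just an add. A small scalar sketch in C of that identity follows. The names (sketch_vmlal_s16, add_via_mla, add_plain, ones_trick_agrees) are hypothetical, and the model assumes the value being added sits in 16-bit lanes, which is what vmlal.s16 takes as input. How this maps onto the 32-bit q12 accumulator in the listing above is not spelled out here, so treat it as an illustration of the arithmetic only.

```c
#include <stdint.h>

enum { N_LANES = 4 }; /* four 16-bit lanes per d register */

/* Scalar stand-in for vmlal.s16: acc += a * b, widened to 32 bits. */
static void sketch_vmlal_s16(int32_t acc[N_LANES], const int16_t a[N_LANES],
                             const int16_t b[N_LANES]) {
    for (int i = 0; i < N_LANES; ++i)
        acc[i] += (int32_t)a[i] * (int32_t)b[i];
}

/* Add a[] into acc[] by multiply-accumulating against an all-ones register. */
void add_via_mla(int32_t acc[N_LANES], const int16_t a[N_LANES]) {
    const int16_t ones[N_LANES] = {1, 1, 1, 1}; /* the "1 in all fields" register */
    sketch_vmlal_s16(acc, a, ones);
}

/* Plain widening add for comparison: acc += a. */
void add_plain(int32_t acc[N_LANES], const int16_t a[N_LANES]) {
    for (int i = 0; i < N_LANES; ++i)
        acc[i] += (int32_t)a[i];
}

/* Returns 1 if both ways of adding give the same lanes on sample data. */
int ones_trick_agrees(void) {
    const int16_t a[N_LANES] = {-7, 0, 123, 32767};
    int32_t acc1[N_LANES] = {10, -20, 30, -40};
    int32_t acc2[N_LANES] = {10, -20, 30, -40};
    add_via_mla(acc1, a);
    add_plain(acc2, a);
    for (int i = 0; i < N_LANES; ++i)
        if (acc1[i] != acc2[i])
            return 0;
    return 1;
}
```

The cost of the trick is one register permanently holding the ones, so it only pays off if a spare register is available.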