Although you still left one stall cycle there
This time, with a small 128*128 image, the time is shortened from 16.7ms to 11.3ms on my i.MX51.
But on my A9 the improvement is tiny, just 1ms, from 20ms to 19ms. So I'm confused again.
Does anyone have documentation on the Cortex-A9 pipeline?
vld3.8 {d0-d2}, [r1]!      @ cycles 0-3, result in N2 of last cycle
vmull.u8 q3, d0, d5        @ cycle 4 (can't dual issue due to previous result in N2)
vmlal.u8 q3, d1, d4        @ cycle 5
vmlal.u8 q3, d2, d3        @ cycle 6, result in N6
vshrn.u16 d6, q3, #8       @ cycle 12 (value needed in N1, 5 cycle stall), result in N3
vst1.8 {d6}, [r0]!         @ cycle 15 (value needed in N1, 2 cycle stall)
subs r2, r2, #1            @ overlaps w/NEON
bne .loop                  @ overlaps w/NEON
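For checking the NEON output, here is a rough scalar C sketch of what the loop above appears to compute: each 3-byte pixel is reduced to one byte as a weighted sum shifted right by 8. The function name weighted_sum_u8 and the parameters w0, w1, w2 are made up for illustration; they stand in for the constants held in d5, d4 and d3, whose values are not given in this thread. Note that the NEON loop handles 8 pixels per iteration, while this version handles one pixel per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the NEON loop (hypothetical helper, not from the thread).
 * src: n interleaved 3-byte pixels, dst: n output bytes.
 * w0, w1, w2 play the role of the constants in d5, d4, d3 (values assumed by
 * the caller). The accumulator is kept at 16 bits to mirror the u16 lanes of q3,
 * so the weights should sum to at most 256 to avoid wrap-around. */
void weighted_sum_u8(uint8_t *dst, const uint8_t *src, size_t n,
                     uint8_t w0, uint8_t w1, uint8_t w2)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t acc = (uint16_t)(src[3 * i] * w0);      /* vmull.u8 q3, d0, d5 */
        acc = (uint16_t)(acc + src[3 * i + 1] * w1);     /* vmlal.u8 q3, d1, d4 */
        acc = (uint16_t)(acc + src[3 * i + 2] * w2);     /* vmlal.u8 q3, d2, d3 */
        dst[i] = (uint8_t)(acc >> 8);                    /* vshrn.u16 d6, q3, #8 */
    }
}
```

Running both versions over the same buffer and comparing the outputs byte by byte is a cheap way to confirm that reordering the instructions to remove stalls has not changed the result.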
From the very beginning, I didn't think the AML8726-M was a good platform because of its 128KB L2 and 65nm fab process, but its multimedia performance is pretty good: 1080P, Mali400. What are the differences between the imx515 and imx535? Frequency?