// Compare the 16-byte maximum with the global maximum
vcgt.u8 d3, d0, d2    // d3[:] = (d0[:] > d2[:]) ? 0xff : 0x00
// Update the global maximum if the 16-byte maximum is bigger
vbit d2, d0, d3       // d2[:] = (d3[:] == 0xff) ? d0[:] : d2[:]
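For anyone who wants to check the lane logic without NEON hardware, here is a minimal scalar sketch in C of what the two instructions above do to each byte lane. The helper name `update_max_lanes` and the 8-lane width (one d register) are assumptions made for illustration; they are not part of the original code.

```c
#include <stdint.h>
#include <stddef.h>

#define D_LANES 8  /* assumption: one 64-bit d register holds 8 byte lanes */

/* Hypothetical helper: scalar model of vcgt.u8 followed by vbit.
 * d2 holds the running maximum, d0 the candidate values. */
static void update_max_lanes(uint8_t d2[D_LANES], const uint8_t d0[D_LANES])
{
    for (size_t i = 0; i < D_LANES; i++) {
        /* vcgt.u8 d3, d0, d2: all-ones mask where d0 > d2, else zero */
        uint8_t d3 = (d0[i] > d2[i]) ? 0xff : 0x00;
        /* vbit d2, d0, d3: take bits from d0 where the mask is set,
         * keep the existing bits of d2 elsewhere */
        d2[i] = (uint8_t)((d0[i] & d3) | (d2[i] & (uint8_t)~d3));
    }
}
```

Because the mask from vcgt is either 0x00 or 0xff in each lane, the bitwise select in vbit behaves as a whole-byte select here.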
I'm a bit tight on time at the moment to provide a timing analysis, though it would make for a nice blog post or a good exercise for the reader. ;-) I'll post back here when I've got further results to share.

You mentioned some "other approaches"; if you've got any references I might also be able to compare to those.

jpap
@ q0: index_replace_mask
@ q1: data
@ q2: data_max
@ q3: indexes
@ q4: c_0x01
@ q5: indexes_max
@ r0: byte_data
@ r1: count
0:
    vcgt.u8 q0, q1, q2
    pld [ r0, #256 ]
    vmax.u8 q2, q1, q2
    vld1.u8 { q1 }, [ r0 ]
    vbit.u8 q5, q3, q0
    subs r1, r1, #1
    vadd.u8 q3, q3, q4
    bne 0b
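As a cross-check for the loop above, here is a rough scalar model in C of what it appears to compute: a per-lane running maximum plus the block index at which each lane's maximum was first seen. Several things here are assumptions, since the listing does not show them: the function name `max_index_lanes`, that `count` is the number of 16-byte blocks, that `data_max`, `indexes` and `indexes_max` all start at zero, and that the data pointer advances by 16 bytes per iteration. The model also ignores the software pipelining (the listing loads the next block in the middle of the current iteration).

```c
#include <stdint.h>
#include <stddef.h>

#define Q_LANES 16  /* one 128-bit q register holds 16 byte lanes */

/* Hypothetical scalar model of the loop. Initial register values and the
 * pointer advance are assumptions; they are not shown in the listing. */
static void max_index_lanes(const uint8_t *byte_data, size_t count,
                            uint8_t data_max[Q_LANES],
                            uint8_t indexes_max[Q_LANES])
{
    uint8_t index = 0;  /* q3: assumed to hold the same counter in every lane */

    for (size_t i = 0; i < Q_LANES; i++) {
        data_max[i] = 0;
        indexes_max[i] = 0;
    }

    for (size_t block = 0; block < count; block++) {
        const uint8_t *data = byte_data + block * Q_LANES;  /* q1 */
        for (size_t i = 0; i < Q_LANES; i++) {
            if (data[i] > data_max[i]) {   /* vcgt.u8 q0, q1, q2 */
                data_max[i] = data[i];     /* vmax.u8 q2, q1, q2 */
                indexes_max[i] = index;    /* vbit.u8 q5, q3, q0 */
            }
        }
        index++;  /* vadd.u8 q3, q3, q4: an 8-bit counter, wraps after 256 blocks */
    }
}
```

Two things fall out of the model: the strict greater-than compare means ties keep the earliest index, and the 8-bit index lanes can only distinguish 256 blocks before wrapping. A final step that reduces the 16 lanes to a single maximum and position would still be needed; it is not part of the listing and is not modelled here.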