From the ARMv8 instruction overview, I found that 'tbl' (vector table lookup) is used for rearranging data within vectors, and that 'tbx' (vector table extend) is a variant of TBL.
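For reference, here is a minimal scalar sketch in C of what the two instructions compute on 16-byte vectors, as far as I understand the ARMv8 semantics: TBL writes 0 for any index that falls outside the table, while TBX leaves that destination byte unchanged. The names `tbl_model` and `tbx_model` are made up for illustration; they are not intrinsics or library functions. `table_len` stands in for the 1- to 4-register forms (16, 32, 48 or 64 table bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of TBL (hypothetical helper, not an intrinsic):
 * each result byte is table[idx[i]], or 0 if idx[i] is out of range. */
void tbl_model(uint8_t dst[16], const uint8_t *table, size_t table_len,
               const uint8_t idx[16])
{
    /* 1, 2, 3 or 4 consecutive 16-byte table registers */
    assert(table_len == 16 || table_len == 32 ||
           table_len == 48 || table_len == 64);
    for (int i = 0; i < 16; i++)
        dst[i] = (idx[i] < table_len) ? table[idx[i]] : 0;
}

/* Scalar model of TBX (hypothetical helper, not an intrinsic):
 * same lookup, but an out-of-range index keeps the old dst[i]. */
void tbx_model(uint8_t dst[16], const uint8_t *table, size_t table_len,
               const uint8_t idx[16])
{
    assert(table_len == 16 || table_len == 32 ||
           table_len == 48 || table_len == 64);
    for (int i = 0; i < 16; i++)
        if (idx[i] < table_len)
            dst[i] = table[idx[i]];
}
```

Under this model, the 3- and 4-register variants only differ in how large the lookup table is (48 or 64 bytes), not in how each byte is selected.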
I also found additional documentation on related NEON instructions: https://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors
I don't have a clear idea of what kind of C code results in these instructions being generated. There are also 3-register and 4-register variants, so in which scenarios would a compiler generate those? Can anyone give details to help me understand this better? Is there any sample C code I can refer to, or a particular compiler I should use?
I found an AES example (http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/252688.html) in the Linux source that contains tbx instructions, but there is no corresponding C source for it; it appears to be heavily optimized ARM assembly. With this clue, I tried compiling various AES implementations, but none of them produced tbx/tbl instructions.
Thanks,
I tried gcc 6.1 and llvm 3.9 on the SPECint2006 benchmarks, and I see 'tbl' instructions being generated for bzip2 and gcc. In the bzip2 benchmark the instruction is generated for the 'subbytes' function, whereas in gcc it is generated for the 'subst_stack_regs' function. I think the compiler applies a series of transformations to create vectors out of the data accesses.
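As a rough sketch of the kind of source that might get this treatment, the loop below applies a fixed, irregular byte permutation to consecutive 16-byte blocks. The function name `permute_blocks` and the `perm` table are my own illustration, not taken from bzip2 or gcc. An auto-vectorizer targeting AArch64 may turn the inner loop into a single 'tbl' with the permutation as the index vector, but whether it actually does depends on the compiler version and flags (for example -O3), so the assembly output has to be checked for each case.

```c
#include <stddef.h>
#include <stdint.h>

/* Irregular permutation picked for illustration (an assumption, not
 * taken from any benchmark). A regular pattern such as a byte reversal
 * could be matched to a cheaper instruction instead of tbl. */
static const uint8_t perm[16] = {
    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
};

/* Shuffle each 16-byte block of src into dst according to perm.
 * restrict tells the compiler the buffers do not overlap, which
 * removes one obstacle to vectorization. */
void permute_blocks(uint8_t *restrict dst, const uint8_t *restrict src,
                    size_t nblocks)
{
    for (size_t b = 0; b < nblocks; b++)
        for (int i = 0; i < 16; i++)
            dst[16 * b + i] = src[16 * b + perm[i]];
}
```

Compiling this with something like `-O3 -S` for an AArch64 target and searching the assembly for 'tbl' is a quick way to see what a given compiler does with it.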