Hi all,
The LDM and STM instructions are no longer supported in ARMv8 (A64); LDP and STP are used instead.
What is the key difference between them, and why are registers loaded and stored in pairs?
I think the differences are:
(1) LDP and STP always transfer exactly 2 registers (destinations for LDP, sources for STP).
(2) LDP and STP can use a base register plus a signed offset as the address. LDM and STM do not take an offset.
(3) LDP and STP have pre-index and post-index forms, such as "ldp w3, w1, [x0, #4]!" (pre-index) and "ldp w3, w1, [x0], #4" (post-index).
LDM and STM usually need ADD instructions to get the right address. So I think the reason for using LDP and STP may be that they are more flexible.
I think LDM with 2 destination registers is slower than LDP (as it usually needs ADD instructions to get the address). And LDM with 4 destination registers may not be faster than 2 LDPs (I'm not sure).
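To make the three addressing forms in (2) and (3) concrete, here is a small Python model of a 32-bit LDP. It is only a sketch: the names `ldp32` and `LdpResult` are made up for illustration, memory is a plain byte string, and little-endian data is assumed.

```python
from dataclasses import dataclass


@dataclass
class LdpResult:
    first: int   # value loaded into the first register (w3 in the examples)
    second: int  # value loaded into the second register (w1 in the examples)
    base: int    # value of the base register (x0) after the instruction


def ldp32(mem, base, imm=0, mode="offset"):
    """Model a 32-bit LDP (hypothetical helper, little-endian assumed).

    mode="offset": ldp w3, w1, [x0, #imm]    base is left unchanged
    mode="pre":    ldp w3, w1, [x0, #imm]!   base is updated, then used
    mode="post":   ldp w3, w1, [x0], #imm    base is used, then updated
    """
    if mode not in ("offset", "pre", "post"):
        raise ValueError("mode must be 'offset', 'pre' or 'post'")
    # The 32-bit form encodes a signed 7-bit immediate scaled by 4.
    if imm % 4 != 0 or not -256 <= imm <= 252:
        raise ValueError("immediate must be a multiple of 4 in [-256, 252]")

    addr = base if mode == "post" else base + imm
    new_base = base if mode == "offset" else base + imm

    # The two registers come from consecutive 4-byte words.
    first = int.from_bytes(mem[addr:addr + 4], "little")
    second = int.from_bytes(mem[addr + 4:addr + 8], "little")
    return LdpResult(first, second, new_base)


# Four consecutive 32-bit words: 10, 20, 30, 40.
mem = b"".join(n.to_bytes(4, "little") for n in (10, 20, 30, 40))

print(ldp32(mem, 0, 4, "offset"))  # loads 20 and 30, base stays 0
print(ldp32(mem, 0, 4, "pre"))     # loads 20 and 30, base becomes 4
print(ldp32(mem, 0, 4, "post"))    # loads 10 and 20, base becomes 4
```

The pre-index and post-index forms fold the address update into the load itself, which is the job the separate ADD does in the LDM case described above.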
Hi Haoliu,
Thanks for the reply. Can you point me to a white paper or the architecture manual with the exact information?
ARMv8 ISA overview:
The LDM, STM, PUSH and POP instructions do not exist in A64, however bulk transfers can be constructed using the LDP and STP instructions which load and store a pair of independent registers from consecutive memory locations, and which support unaligned addresses when accessing normal memory. The LDNP and STNP instructions additionally provide a “streaming” or “non-temporal” hint that the data does not need to be retained in caches. The PRFM (prefetch memory) instructions also include hints for “streaming” or “non-temporal” accesses, and allow targeting of a prefetch to a specific cache level.
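As a rough illustration of how a bulk transfer might be constructed from pairs, the Python sketch below prints the A64 instructions that would load a list of 64-bit registers from consecutive memory. The helper name `bulk_load` and the choice of x9 as base register are assumptions made for the example, and the registers in the list are assumed to be distinct from each other and from the base.

```python
def bulk_load(base, regs):
    """Return A64 assembly lines that load `regs` (64-bit X registers)
    from consecutive memory starting at [base].

    Hypothetical helper: registers are taken two at a time as LDP pairs,
    and a leftover odd register is loaded with a single LDR.
    """
    lines = []
    for i in range(0, len(regs), 2):
        offset = i * 8  # each X register occupies 8 bytes
        # The 64-bit LDP immediate is a multiple of 8 in [-512, 504];
        # this sketch simply refuses anything beyond that range.
        if offset > 504:
            raise ValueError("offset out of range for a single LDP")
        addr = f"[{base}]" if offset == 0 else f"[{base}, #{offset}]"
        pair = regs[i:i + 2]
        if len(pair) == 2:
            lines.append(f"ldp {pair[0]}, {pair[1]}, {addr}")
        else:
            lines.append(f"ldr {pair[0]}, {addr}")
    return lines


for line in bulk_load("x9", ["x0", "x1", "x2", "x3"]):
    print(line)
# ldp x0, x1, [x9]
# ldp x2, x3, [x9, #16]
```

For the PUSH/POP case, the usual A64 idiom is a pre-indexed store and a post-indexed load on the stack pointer, for example "stp x29, x30, [sp, #-16]!" on entry and "ldp x29, x30, [sp], #16" on exit.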