Compilers and Libraries forum

Why is the Neon LD4 instruction resolved to two LDP instructions?

shanshan over 2 years ago

I noticed that the following instruction in my intrinsics code:

LD4 {v0.16B, v1.16B, v2.16B, v3.16B}, [x1], #64

is resolved to:

ldp q0, q1, [x1], #32
ldp q2, q3, [x1], #32

This is quite confusing:

1. Why is LD4 resolved to two LDP instructions? Is this some compiler optimization? I thought one LD4 would be faster than two LDPs.

2. Why are the V registers resolved to Q registers? I thought Q registers were only used in AArch32, and this is AArch64.

I also tried inline assembly:

ld4 {v8.2d, v9.2d, v10.2d, v11.2d}, [" src_r "], #64

This is resolved as expected:

ld4 {v8.2d, v9.2d, v10.2d, v11.2d}, [x3], #64


Ronan Synnott over 2 years ago

Hi Shanshan

This does not seem correct: the LD4 instruction de-interleaves the data as it loads it, which the LDP instruction does not do.

https://developer.arm.com/documentation/102159/0400/Load-and-store---data-structures

To understand this properly, can you provide a full code example, along with the build options and compiler version you used?

You may be best served by raising an official support case with Arm from the support menu above, so that this can be analysed properly.