ARM A76 (RK3588S): stp instruction takes a long time

I measured the time of each instruction with MegPeak; the results are as follows. The stp instruction time differs greatly from the official documentation (Arm_Cortex-A76_Software_Optimization_Guide).
bandwidth: 19.067337 Gbps
ldd throughput: 0.221717 ns 3547478.000000 runs 16000000
ldq throughput: 0.221717 ns 3547478.000000 runs 16000000
stq throughput: 1.327807 ns 21244908.000000 runs 16000000
ldpq throughput: 0.442687 ns 5666398.000000 runs 12800000
lddx2 throughput: 0.442888 ns 7086206.000000 runs 16000000
ld1q throughput: 0.221316 ns 3541061.000000 runs 16000000
eor throughput: 0.221280 ns 3540478.000000 runs 16000000
fmla throughput: 0.221590 ns 3545436.000000 runs 16000000
fmlad throughput: 0.221480 ns 3543687.000000 runs 16000000
fmla_x2 throughput: 0.475682 ns 7610905.000000 runs 16000000
mla throughput: 0.884828 ns 14157246.000000 runs 16000000
fmul throughput: 0.221298 ns 3540769.000000 runs 16000000
mul throughput: 0.884700 ns 14155203.000000 runs 16000000
addp throughput: 0.221262 ns 3540187.000000 runs 16000000
sadalp throughput: 0.442833 ns 7085331.000000 runs 16000000
add throughput: 0.221262 ns 3540186.000000 runs 16000000
fadd throughput: 0.221590 ns 3545436.000000 runs 16000000
smull throughput: 0.442432 ns 7078915.000000 runs 16000000
smlal_4b throughput: 0.442724 ns 7083581.000000 runs 16000000
smlal_8b throughput: 0.442851 ns 7085622.000000 runs 16000000
dupd_lane_s8 throughput: 0.221280 ns 3540478.000000 runs 16000000
mlaq_lane_s16 throughput: 0.885192 ns 10622309.000000 runs 12000000
sshll throughput: 0.442706 ns 7083289.000000 runs 16000000
tbl throughput: 0.221262 ns 3540187.000000 runs 16000000
ins throughput: 0.442651 ns 7082415.000000 runs 16000000
sqrdmulh throughput: 0.884609 ns 14153745.000000 runs 16000000
usubl throughput: 0.221207 ns 3539311.000000 runs 16000000
abs throughput: 0.221553 ns 3544853.000000 runs 16000000
fcvtzs throughput: 0.885320 ns 14165121.000000 runs 16000000
scvtf throughput: 0.884828 ns 14157246.000000 runs 16000000
fcvtns throughput: 0.884810 ns 14156954.000000 runs 16000000
fcvtms throughput: 0.884773 ns 14156371.000000 runs 16000000
fcvtps throughput: 0.885265 ns 14164246.000000 runs 16000000
fcvtas throughput: 0.884427 ns 14150829.000000 runs 16000000
fcvtn throughput: 0.884554 ns 14152871.000000 runs 16000000
fcvtl throughput: 0.884974 ns 14159579.000000 runs 16000000
ins_ldd throughput: 0.442824 ns 5668148.000000 runs 12800000
ldq_fmlaq throughput: 0.232800 ns 3724808.000000 runs 16000000
ldd_fmlaq_sep throughput: 0.249211 ns 3189901.000000 runs 12800000
ldd_fmlaq_lane_sep throughput: 0.243519 ns 3896305.000000 runs 16000000
ldd_ldx_ins_fmlaq_lane_sep throughput: 0.364600 ns 4666874.000000 runs 12800000
ins_fmlaq_lane_1_4_sep throughput: 0.381871 ns 4887954.000000 runs 12800000
ldd_fmlaq_lane_1_4_sep throughput: 0.221891 ns 2840199.000000 runs 12800000
ins_fmlaq_lane_sep throughput: 1.089538 ns 17432604.000000 runs 16000000
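
To make the gap easier to see, the output above can be normalized against a load instruction. The following is only a minimal sketch: the helper names (`parse`, `relative_cost`) are hypothetical, the choice of `ldq` as baseline is an assumption, and the reading of the third column as total elapsed nanoseconds is inferred from the fact that it equals the reported ns value times the run count for these lines. Only three lines of the output are included for brevity.

```python
import re

# A few lines copied verbatim from the MegPeak output above.
SAMPLE = """\
ldq throughput: 0.221717 ns 3547478.000000 runs 16000000
stq throughput: 1.327807 ns 21244908.000000 runs 16000000
ldpq throughput: 0.442687 ns 5666398.000000 runs 12800000
"""

# Assumed line layout: <name> throughput: <ns per instr> ns <total ns> runs <count>
LINE_RE = re.compile(r"^(\S+) throughput: ([\d.]+) ns ([\d.]+) runs (\d+)$")


def parse(text):
    """Return {name: (ns_per_instruction, total_ns, runs)} for matching lines."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            name, ns, total, runs = m.groups()
            results[name] = (float(ns), float(total), int(runs))
    return results


def relative_cost(results, name, baseline="ldq"):
    """Per-instruction time of `name` divided by that of `baseline`."""
    return results[name][0] / results[baseline][0]


results = parse(SAMPLE)
for name, (ns, total, runs) in results.items():
    print(f"{name:6s} {ns:.6f} ns  {relative_cost(results, name):.2f}x ldq")
```

On these numbers the store line (stq) comes out at roughly six times the per-instruction time of ldq, while ldpq is about twice ldq.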