This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

st1为什么如此耗时?

我的代码模式是:

while(4096){

"ld1    { v8.4s,  v9.4s,v10.4s, v11.4s}, [%2], #64 \n" // output
"ld1 {v12.4s, v13.4s,v14.4s, v15.4s}, [%2], #64 \n" // output
"sub    %1, %1, #128                  \n"
"ld1    {v0.4s, v1.4s, v2.4s, v3.4s}, [%1], #16  \n"


计算部分;

 "st1    {v12.4s, v13.4s,v14.4s, v15.4s}, [%2], #64 \n" // output
"st1 { v8.4s, v9.4s, v10.4s, v11.4s},[%2], #64 \n" // output

}

如图所示,为什么存储st1部分所占耗时的比例将近42%?