memcpy slowness on Kirin 985

I used the following code to measure memory bandwidth on the Kirin 985:

   #include <chrono>
   #include <cstring>
   #include <iostream>
   #include <vector>

   int main() {
      // initialize: 1,000,000 floats (about 4 MB) per buffer
      std::vector<float> vecSrc(1000000, 2);
      std::vector<float> vecDst(1000000, 3);

      // memcpy profile: time a single copy
      auto tStart = std::chrono::high_resolution_clock::now();
      std::memcpy(vecDst.data(), vecSrc.data(), vecSrc.size() * sizeof(float));
      auto tEnd = std::chrono::high_resolution_clock::now();

      // calculate time (printing vecDst keeps the copy from being optimized away)
      std::cout << "vecDst[999999] = " << vecDst[999999] << std::endl;
      float tDif = std::chrono::duration_cast<std::chrono::microseconds>(tEnd - tStart).count() / 1000.f;
      std::cout << "tDif = " << tDif << "ms" << std::endl;
      return 0;
   }

The result is 1.302 ms.

So the measured bandwidth is:

1000000.0 * 4 / 1024 / 1024 / 1024 / (1.302 / 1000) ≈ 2.86 GiB/s
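For reference, the calculation above can be written as a small helper, shown here as a minimal sketch. `bandwidthGiBps` simply restates the formula. `bestMemcpyMs` is a hypothetical helper that is not part of my original test: it does one untimed warm-up copy and then keeps the fastest of several timed copies, on the assumption that a single cold measurement may include one-off costs. Whether that explains the number I measured is not established.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

// Convert a byte count and an elapsed time in milliseconds to GiB/s
// (1 GiB = 1024^3 bytes), the same formula as in the post.
double bandwidthGiBps(std::size_t bytes, double ms) {
    const double gib = static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
    return gib / (ms / 1000.0);
}

// Hypothetical helper: do one untimed warm-up copy, then time `runs`
// copies and return the fastest one in milliseconds.
// Assumes dst is at least as large as src.
double bestMemcpyMs(std::vector<float>& dst, const std::vector<float>& src, int runs) {
    assert(dst.size() >= src.size());
    const std::size_t bytes = src.size() * sizeof(float);
    std::memcpy(dst.data(), src.data(), bytes);  // untimed warm-up copy
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), bytes);
        const auto t1 = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

With the numbers from this post, `bandwidthGiBps(4000000, 1.302)` gives roughly 2.86. Note that this counts only the bytes copied; memcpy both reads and writes each byte, so the total memory traffic is about twice that figure.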

The maximum memory bandwidth should be much higher than this. Why is the memcpy throughput so low? Could anyone help? Thanks.
