I use the following code to profile memory bandwidth on a Kirin 985:
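#include <chrono>
#include <cstring>
#include <iostream>
#include <vector>

int main() {
    // initialize source and destination buffers (1,000,000 floats, about 4 MB each)
    std::vector<float> vecSrc(1000000, 2);
    std::vector<float> vecDst(1000000, 3);
    // memcpy profile: time a single copy
    auto tStart = std::chrono::high_resolution_clock::now();
    std::memcpy(vecDst.data(), vecSrc.data(), vecSrc.size() * sizeof(float));
    auto tEnd = std::chrono::high_resolution_clock::now();
    // read back one element so the copy is actually used
    std::cout << "vecDst[999999] = " << vecDst[999999] << std::endl;
    // calculate elapsed time in milliseconds
    float tDif = std::chrono::duration_cast<std::chrono::microseconds>(tEnd - tStart).count() / 1000.f;
    std::cout << "tDif = " << tDif << "ms" << std::endl;
    return 0;
}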
The result is 1.302 ms, so the measured bandwidth is
1000000.0 * 4 / 1024 / 1024 / 1024 / (1.302 / 1000) = 2.86 GiB/s
The maximum bandwidth should be much higher than this. Why is the memcpy result so low? Could anyone help? Thanks.
Thanks. Yes, it does not seem to be a good benchmark.
(BTW, the float memcpy and int memcpy results are the same, which should be no surprise.)
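
For reference, here is a minimal sketch of a somewhat more robust measurement than the single timed copy above. It does one untimed warm-up copy, repeats the timed copy several times, and reports the best run in GiB/s (bytes divided by 1024^3, as in the calculation above). The helper name measureMemcpyGiBps and its parameters are my own, not from any library, and the buffer size and repeat count are things to experiment with rather than recommended values.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (name and signature are assumptions for illustration).
// Returns the best observed memcpy copy rate in GiB/s for a buffer of
// `bytes` bytes over `repeats` timed runs, or 0.0 if nothing could be measured.
double measureMemcpyGiBps(std::size_t bytes, int repeats) {
    if (bytes == 0 || repeats <= 0) {
        return 0.0;
    }

    std::vector<char> src(bytes, 2);
    std::vector<char> dst(bytes, 3);

    // Untimed warm-up copy, so that one-off costs (cold caches, CPU frequency
    // ramp-up) are less likely to land inside the timed region.
    std::memcpy(dst.data(), src.data(), bytes);

    double bestSeconds = -1.0;
    for (int i = 0; i < repeats; ++i) {
        auto tStart = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), bytes);
        auto tEnd = std::chrono::steady_clock::now();

        // Read one copied byte through a volatile so the copy has a visible use.
        volatile char sink = dst[bytes - 1];
        (void)sink;

        double seconds = std::chrono::duration<double>(tEnd - tStart).count();
        if (seconds > 0.0 && (bestSeconds < 0.0 || seconds < bestSeconds)) {
            bestSeconds = seconds;
        }
    }

    if (bestSeconds <= 0.0) {
        return 0.0;  // clock resolution too coarse for this buffer size
    }
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0) / bestSeconds;
}
```

Calling it with something like measureMemcpyGiBps(64 * 1024 * 1024, 10) uses a buffer that is probably larger than the last-level cache, so the number is more likely to reflect DRAM than cache. Also keep in mind that memcpy both reads and writes every byte, so the actual memory traffic is at least about twice the copy rate this returns.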