I am using the following code to profile memory bandwidth on a Kirin 985:
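#include <chrono>
#include <cstring>
#include <iostream>
#include <vector>
int main() {
    // initialize: the constructors write every element of both buffers
    std::vector<float> vecSrc(1000000, 2.0f);
    std::vector<float> vecDst(1000000, 3.0f);
    // memcpy profile: a single timed copy of 1,000,000 floats
    auto tStart = std::chrono::high_resolution_clock::now();
    std::memcpy(vecDst.data(), vecSrc.data(), vecSrc.size() * sizeof(float));
    auto tEnd = std::chrono::high_resolution_clock::now();
    // read back one element so the copy has an observable effect
    std::cout << "vecDst[999999] = " << vecDst[999999] << std::endl;
    // calculate the elapsed time in milliseconds
    float tDif = std::chrono::duration_cast<std::chrono::microseconds>(tEnd - tStart).count() / 1000.f;
    std::cout << "tDif = " << tDif << "ms" << std::endl;
}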
The result is 1.302 ms.
So the measured bandwidth is:
1000000.0 * 4 / 1024 / 1024 / 1024 / (1.302 / 1000) = 2.86 GB/s
The maximum memory bandwidth should be much higher than this. Why is the memcpy throughput so low? Could anyone help? Thanks.
Thanks. Yes, it seems this is not a good benchmark.
(BTW, the float memcpy and int memcpy results are the same, which should be no surprise.)