Arm Development Platforms forum memcpy slowness on Kirin 985


memcpy slowness on Kirin 985

qdev over 5 years ago

I used the following code to profile memory bandwidth on the Kirin 985:

#include <chrono>
#include <cstring>
#include <iostream>
#include <vector>

int main() {
    // initialize: two buffers of 1,000,000 floats (4,000,000 bytes each)
    std::vector<float> vecSrc(1000000, 2);
    std::vector<float> vecDst(1000000, 3);

    // memcpy profile: time a single copy
    auto tStart = std::chrono::high_resolution_clock::now();
    std::memcpy(vecDst.data(), vecSrc.data(), vecSrc.size() * sizeof(float));
    auto tEnd = std::chrono::high_resolution_clock::now();

    // calculate time (vecDst is read so the copy cannot be optimized away)
    std::cout << "vecDst[999999] = " << vecDst[999999] << std::endl;
    float tDif = std::chrono::duration_cast<std::chrono::microseconds>(tEnd - tStart).count() / 1000.f;
    std::cout << "tDif = " << tDif << "ms" << std::endl;
    return 0;
}

The result is 1.302 ms.

The measured bandwidth is therefore

1000000.0 * 4 / 1024 / 1024 / 1024 / (1.302 / 1000) = 2.86 GiB/s

The maximum bandwidth should be much higher than this. Why is memcpy so slow? Could anyone help? Thanks.
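For reference, the arithmetic above can be written as a small helper. This is only a sketch: the names copyThroughput, kBytesPerGiB and kBytesPerGB are made up for illustration and are not part of the original test. It counts only the bytes passed to memcpy; since a copy both reads and writes every byte, the actual memory traffic would be roughly twice that figure.

```cpp
#include <cstddef>

// Bytes per GiB (2^30) and per GB (10^9). The formula in the post divides by
// 1024 three times, so it reports GiB/s.
constexpr double kBytesPerGiB = 1024.0 * 1024.0 * 1024.0;
constexpr double kBytesPerGB = 1e9;

// Copy throughput in units per second, where one unit is `bytesPerUnit` bytes.
// `bytesCopied` is the size passed to memcpy; `elapsedMs` is the measured
// wall-clock time in milliseconds.
double copyThroughput(std::size_t bytesCopied, double elapsedMs, double bytesPerUnit) {
    return static_cast<double>(bytesCopied) / bytesPerUnit / (elapsedMs / 1000.0);
}
```

Calling copyThroughput(4000000, 1.302, kBytesPerGiB) reproduces the 2.86 figure; with kBytesPerGB the same measurement comes out at about 3.07 GB/s.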


Zhifei Yang over 5 years ago +1 verified

Cache utilization may be poor.

1) While your benchmark is running, the userspace program is frequently interrupted by kernel-space activity.

2) Your benchmark code is not designed to be cache-line friendly. Your cache miss rate may be very high.

3) It's interesting that you run memcpy on float vectors. Why not benchmark integer or char arrays?

For a serious benchmark, please use bare-metal code for testing.
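Short of going bare-metal, one common way to reduce the effect of interruptions and cold caches in userspace is to do an untimed warm-up copy, repeat the timed copy several times, and keep the best run. The sketch below illustrates that idea; it is not the method used in the original post, and the name bestMemcpyMs and its parameters are hypothetical. It uses std::chrono::steady_clock, which is monotonic.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

// Returns the best (minimum) time in milliseconds over `repeats` timed copies
// of `count` floats, after one untimed warm-up copy.
// Returns +infinity if `repeats` is zero or negative.
double bestMemcpyMs(std::size_t count, int repeats) {
    std::vector<float> src(count, 2.0f);
    std::vector<float> dst(count, 3.0f);
    const std::size_t bytes = count * sizeof(float);

    // Warm-up: one untimed copy, so the first timed run is less likely to
    // include start-up effects such as cold caches or a low CPU clock.
    std::memcpy(dst.data(), src.data(), bytes);

    double bestMs = std::numeric_limits<double>::infinity();
    volatile float sink = 0.0f;  // written after each copy so the result is used
    for (int i = 0; i < repeats; ++i) {
        auto start = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), bytes);
        auto end = std::chrono::steady_clock::now();
        if (count > 0) {
            sink = dst[count - 1];
        }
        std::chrono::duration<double, std::milli> elapsed = end - start;
        // The minimum is the run least disturbed by interrupts and scheduling.
        bestMs = std::min(bestMs, elapsed.count());
    }
    return bestMs;
}
```

For the case in the question, bestMemcpyMs(1000000, 20) would time twenty 4,000,000-byte copies. Both buffers together are about 8 MB, so whether the copy runs from cache or from DRAM depends on the cache sizes of the device; larger buffers are needed to measure DRAM bandwidth specifically.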
0 qdev over 5 years ago in reply to Zhifei Yang

Thanks. Yes, it seems this is not a good benchmark test.

(BTW, the float memcpy and int memcpy results are the same, which should be no surprise.)
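The reason this is no surprise is that memcpy only sees a byte count, not an element type. A minimal sketch of that point, with a hypothetical helper copyBytes (not from the original test):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies `src` into `dst` with memcpy and returns the number of bytes copied.
// T must be trivially copyable (float, std::int32_t, char, ...).
template <typename T>
std::size_t copyBytes(const std::vector<T>& src, std::vector<T>& dst) {
    dst.resize(src.size());
    const std::size_t bytes = src.size() * sizeof(T);
    if (bytes > 0) {
        // memcpy receives only untyped pointers and a byte count.
        std::memcpy(dst.data(), src.data(), bytes);
    }
    return bytes;
}
```

On platforms where float and std::int32_t are both 4 bytes, a vector of 1,000,000 of either type results in the same 4,000,000-byte memcpy call, so the same timing is expected.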