Why do I get a clEnqueueMapBuffer() performance hit?

Mikael over 9 years ago

I'm currently porting vision algorithms to OpenCL, specifically targeting the Mali T800-series GPUs. For this particular problem I'm running on the T880.

I have several contiguous buffers of sizes 512x512 * (1, 2 and 4).

After the four clEnqueueNDRangeKernel() calls, I want to read the final result using clEnqueueMapBuffer().

The odd thing is that if I increase the size of the output buffer and map/unmap it using clEnqueueMapBuffer(), I get a severe performance hit.

Basically, I have a 1 ms total execution time that goes up to 6 ms if I increase the size of the output buffer by a factor of 10.

Note: nothing has changed between the two situations except the size of the output buffer.

The last kernel is pushing the results onto the output buffer using:

output[atom_inc(output_index)] = result;

I don't know how many elements will end up in the output buffer, only that there is an upper limit, which is why I want a relatively big output buffer. Say 512x512 * 8 bytes.
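For readers unfamiliar with the append pattern above, here is a minimal CPU-side sketch in plain C11. It is an analogy, not the actual kernel: `result_t`, `result_list` and the three functions are hypothetical names, `atomic_fetch_add` stands in for `atom_inc` (both return the old value), and the 8-byte element is an assumption read from the "512x512 * 8 bytes" sizing. The bounds check is an addition that the one-line kernel snippet does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical result record: two ints, 8 bytes on typical platforms. */
typedef struct {
    int x;
    int y;
} result_t;

/* Hypothetical output list: fixed-capacity storage plus an atomic write index. */
typedef struct {
    result_t   *items;     /* room for `capacity` results (the upper limit) */
    size_t      capacity;
    atomic_uint next;      /* plays the role of output_index */
} result_list;

void result_list_init(result_list *list, result_t *storage, size_t capacity)
{
    assert(storage != NULL && capacity > 0);
    list->items = storage;
    list->capacity = capacity;
    atomic_init(&list->next, 0u);
}

/* Same idea as output[atom_inc(output_index)] = result, plus a bounds check.
   Returns 1 if the result was stored, 0 if the list was already full. */
int result_list_push(result_list *list, result_t r)
{
    /* Reserve a unique slot; the old counter value is the slot index. */
    unsigned slot = atomic_fetch_add(&list->next, 1u);
    if (slot >= list->capacity) {
        return 0;
    }
    list->items[slot] = r;
    return 1;
}

/* Number of valid entries: the counter, clamped to the capacity. */
size_t result_list_count(result_list *list)
{
    size_t n = atomic_load(&list->next);
    return n < list->capacity ? n : list->capacity;
}
```

The point of the sketch is that the counter, not the buffer capacity, says how many entries are valid after the last kernel. clEnqueueMapBuffer() takes an offset and a size, so in principle only that many elements need to be mapped; whether doing so avoids the slowdown described here is not something this post establishes.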

Has anyone encountered something similar? And what could be the cause of this?