Hi,
Is there any way to speed up copying data from CPU buffers allocated with "malloc" to GPU-accessible memory? Currently I am using a plain memcpy to copy the data.
Thanks & Regards,
Narendra Kumar Chepuri.
Hi Narendra,
It's important to keep the data you're using contiguous, both to make good use of the cache and because GPUs load 128 bits of data at a time; if you access sparse data, you waste a lot of the bandwidth.
If possible, try switching from an array of structures to a structure of arrays for your data organisation; it should help.
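As a rough illustration of the array-of-structures to structure-of-arrays change, here is a minimal sketch in plain C. The names PointAoS, PointsSoA, aos_to_soa and sum_x are hypothetical and only serve the example; your real data layout will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
typedef struct {
    float x, y, z, w;
} PointAoS;

/* The same data as a structure of arrays: one contiguous array per field.
 * The caller owns the four arrays; each must hold at least n floats. */
typedef struct {
    float *x;
    float *y;
    float *z;
    float *w;
} PointsSoA;

/* Copy n records from the AoS layout into pre-allocated SoA arrays. */
void aos_to_soa(const PointAoS *src, PointsSoA *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst->x[i] = src[i].x;
        dst->y[i] = src[i].y;
        dst->z[i] = src[i].z;
        dst->w[i] = src[i].w;
    }
}

/* Reading a single field now walks one contiguous array instead of
 * skipping over the three unused fields of every record. */
float sum_x(const PointsSoA *p, size_t n)
{
    assert(p != NULL);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        total += p->x[i];
    }
    return total;
}
```

With the SoA layout, code that only needs one field touches only that field's array, so consecutive reads fall on consecutive addresses.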
Thanks,
Anthony
Hi Anthony,
Thanks for your response, but I am not using any structures to store the input data. Is there any other way to solve these cache issues?
Note: I just pass the source pointer as a kernel argument and use the global ID in the kernel to access the source data at a particular point.
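As an illustration only, the pattern described in the note can be sketched in plain C rather than kernel code, with a loop index standing in for the global ID. The names run_work_items, src, dst and stride are hypothetical. With a stride of 1, neighbouring work-items read neighbouring elements; a larger stride spreads the reads apart, which is the sparse access pattern mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates global_size work-items on the CPU. Each "work-item" gid reads
 * src[gid * stride] and writes the value to dst[gid].
 * src must hold at least (global_size - 1) * stride + 1 elements. */
void run_work_items(const float *src, float *dst,
                    size_t global_size, size_t stride)
{
    assert(src != NULL && dst != NULL && stride > 0);
    for (size_t gid = 0; gid < global_size; ++gid) {
        /* stride == 1: consecutive gids touch consecutive addresses.
         * stride  > 1: each read skips over elements that are not used. */
        dst[gid] = src[gid * stride];
    }
}
```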
Narendra Kumar.