CPU to GPU Copying Speed tuning.

Hi,

Is there any way to speed up copying data from CPU buffers allocated with "malloc" to GPU-accessible memory? Currently I am using a plain memcpy to copy the data.

Thanks & Regards,

Narendra Kumar Chepuri.

Parents
  • Hi Narendra,

    It's important to keep the data you're using contiguous, both to make good use of the cache and because GPUs load 128 bits of data at a time. If you access sparse data, you waste a lot of the bandwidth.
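
    To put rough numbers on the bandwidth point, here is a minimal sketch in plain C (not OpenCL). It takes the 128-bit figure above at face value and counts how many 16-byte units a group of work-items touches for a given stride. The function name, the FETCH_BYTES constant, and the assumption of a 16-byte-aligned buffer whose element size divides 16 are all illustrative, not taken from any GPU documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: memory is fetched in 128-bit (16-byte) units, as stated in the post. */
#define FETCH_BYTES 16u

/* Count the distinct 16-byte units touched when n_items work-items each read
 * one element of elem_bytes bytes, with consecutive work-items stride_elems
 * elements apart. Assumes a 16-byte-aligned buffer and that elem_bytes
 * divides FETCH_BYTES, so no element straddles two units. */
size_t fetch_units_touched(size_t n_items, size_t stride_elems, size_t elem_bytes)
{
    assert(n_items > 0 && stride_elems > 0);
    assert(elem_bytes > 0 && FETCH_BYTES % elem_bytes == 0);

    size_t units = 0;
    size_t last_unit = (size_t)-1;   /* sentinel: no unit seen yet */

    for (size_t gid = 0; gid < n_items; ++gid) {
        /* Byte offsets grow with gid, so a new unit always differs from the last one. */
        size_t unit = (gid * stride_elems * elem_bytes) / FETCH_BYTES;
        if (unit != last_unit) {
            ++units;
            last_unit = unit;
        }
    }
    return units;
}
```

    Under these assumptions, 16 work-items reading consecutive 4-byte floats (stride 1) touch 4 units, so all 64 fetched bytes are used. With a stride of 4 floats they touch 16 units, so 256 bytes are fetched for the same 64 useful bytes.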

    If possible, try to switch from an array of structures to a structure of arrays for your data organisation. It should help.
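
    As a rough illustration of that switch, here is a sketch in plain C. The ParticleAoS and ParticlesSoA types and the aos_to_soa helper are hypothetical names made up for this example. The idea is that each field ends up in its own contiguous array, so work-items reading only one field read neighbouring addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, for illustration only (array of structures). */
typedef struct {
    float x, y, z, mass;
} ParticleAoS;

/* The same data as a structure of arrays: each pointer refers to
 * n contiguous floats allocated by the caller. */
typedef struct {
    float *x;
    float *y;
    float *z;
    float *mass;
} ParticlesSoA;

/* Copy n AoS records into the caller-allocated SoA arrays. */
void aos_to_soa(const ParticleAoS *src, ParticlesSoA *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    assert(dst->x && dst->y && dst->z && dst->mass);

    for (size_t i = 0; i < n; ++i) {
        dst->x[i]    = src[i].x;
        dst->y[i]    = src[i].y;
        dst->z[i]    = src[i].z;
        dst->mass[i] = src[i].mass;
    }
}
```

    In the AoS layout, consecutive x values sit 16 bytes apart. In the SoA layout they sit 4 bytes apart, so a kernel that only reads x uses every byte it fetches.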

    Thanks,

    Anthony

Children
  • Hi Anthony,

    Thanks for your response, but I am not using any structures to store the input data. Is there any other way to address these cache issues?

    Note: I just pass the source pointer as a kernel argument, and in the kernel I use the global ID to access the source data at a particular point.

    Thanks & Regards,

    Narendra Kumar.