How best to maximize cache-write utilization for GPU compute?

What are the best practices for keeping data from being written out to RAM when structuring a GPU compute job that works on a small amount of data? For example, if I perform 10M read/write operations on a contiguous 1024 B array and finally output, say, 1024 B, will the array stay in cache automatically, or are there things I should do to make caching more likely?
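
To make the access pattern concrete, here is a minimal sketch, assuming CUDA purely for illustration (the question is not tied to a particular API). The kernel name `iterateInShared`, the constants `kWords` and `kIters`, and the neighbour-sum update are all hypothetical stand-ins for the real workload. It stages the 1024 B array in `__shared__` memory, updates it repeatedly there, and touches global memory only once on the way in and once on the way out. That is one way to keep the working set on-chip on hardware with dedicated shared memory; whether something equivalent is needed or helpful on other GPUs is part of what I am asking.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes based on the question: a 1024-byte working set.
constexpr int kWords = 256;     // 256 * 4 B = 1024 B
constexpr int kIters = 20000;   // illustrative iteration count, not a measured figure

__global__ void iterateInShared(const unsigned* in, unsigned* out, int iters) {
    __shared__ unsigned buf[kWords];   // on-chip working copy of the array
    const int i = threadIdx.x;

    buf[i] = in[i];                    // single global-memory read per element
    __syncthreads();

    for (int it = 0; it < iters; ++it) {
        // Placeholder update: sum of this element and its right neighbour.
        unsigned next = buf[i] + buf[(i + 1) % kWords];
        __syncthreads();               // all reads finish before any write
        buf[i] = next;
        __syncthreads();               // all writes finish before the next reads
    }

    out[i] = buf[i];                   // single global-memory write per element
}

int main() {
    unsigned host[kWords];
    for (int i = 0; i < kWords; ++i) host[i] = i;

    unsigned *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, sizeof(host));
    cudaMalloc(&dOut, sizeof(host));
    cudaMemcpy(dIn, host, sizeof(host), cudaMemcpyHostToDevice);

    // One block of kWords threads, so the whole array lives in one block's shared memory.
    iterateInShared<<<1, kWords>>>(dIn, dOut, kIters);

    cudaError_t err = cudaMemcpy(host, dOut, sizeof(host), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("out[0] = %u\n", host[0]);

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

The alternative I am comparing against is the same loop operating directly on the global-memory pointer and relying on the hardware caches to absorb the repeated reads and writes.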
