Hi,
I am facing a cache issue on a Mali GPU. Do you have any idea how to resolve it? I will explain the problem clearly.
We are working on a Samsung Exynos Octa 5420 board, and we have one algorithm to be ported to the GPU.
1. First we used separate GPU buffers (created with "clCreateBuffer" and "CL_MEM_ALLOC_HOST_PTR") and copied the input data into them from a CPU global buffer (created with malloc). Because the GPU buffer is separate, we arrange the data in it without any gaps. For example, if the 1st thread operates on the 1st block of data, the 2nd thread (or any other thread) works on the 2nd block, which is located immediately after the 1st block. With this design the GPU algorithm's timings are within the acceptable range.
2. In the design above we observed that the copy takes a very long time, so we decided to create the CPU global buffer itself with "clCreateBuffer" and "CL_MEM_ALLOC_HOST_PTR". By mapping it (using clEnqueueMapBuffer on the CPU buffer) we can use the same buffer on both the CPU and the GPU. However, the data in this buffer is laid out so that the blocks required by the GPU algorithm sit at scattered positions. For example, if the 1st thread operates on the 1st block of data, the 2nd thread (or any other thread) works on the 2nd block, which is not located directly beside the 1st block. We are observing a performance drop of nearly 95% compared to the earlier design (the 1st design takes 41 ms, the 2nd design takes 79 ms). Can you suggest any way to avoid the cache issue? A quick response would be very helpful.
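To make the difference between the two layouts concrete, here is a minimal sketch in plain C (no OpenCL calls) that counts how many distinct cache lines a group of threads touches when their blocks are packed back to back versus spread out by a stride. The function name, the 16-byte block size, the 256-byte stride and the 64-byte cache line are all assumptions for illustration only; the real values depend on the application and the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the distinct cache lines touched when
 * `num_items` threads each read one block of `block_bytes`, with
 * consecutive blocks starting `stride_bytes` apart.
 * Assumes stride_bytes >= block_bytes (blocks do not overlap) and a
 * buffer that starts on a cache-line boundary. */
size_t cache_lines_touched(size_t num_items, size_t block_bytes,
                           size_t stride_bytes, size_t line_bytes)
{
    assert(line_bytes > 0 && block_bytes > 0 && stride_bytes >= block_bytes);

    size_t lines = 0;
    size_t last_line = (size_t)-1;   /* sentinel: no line counted yet */

    for (size_t i = 0; i < num_items; ++i) {
        size_t first = (i * stride_bytes) / line_bytes;
        size_t last  = (i * stride_bytes + block_bytes - 1) / line_bytes;

        /* Blocks are visited in address order, so only the previous
         * block's final line can be shared with this block. */
        if (first == last_line)
            ++first;
        if (first <= last)
            lines += last - first + 1;
        last_line = last;
    }
    return lines;
}
```

With these assumed numbers, 64 threads reading packed 16-byte blocks (stride 16) touch 16 cache lines, while the same 64 threads with a 256-byte stride touch 64 lines, so four times as much memory traffic is needed for the same useful data. This only models the access pattern; it does not measure the actual Mali cache behaviour.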
Thanks & Regards,
Narendra Kumar
Hi Narendra Kumar,
I think I understand this more clearly now, thank you.
It sounds like it could be a cache issue, but without looking at the application it is very difficult to comment definitively. My suggestion is to look into using DS5 Streamline as I suggested before. This will allow you to clearly track performance counters from CPU and GPU and will show where the bottlenecks are.
Alternatively, you could experiment with the work-group size used for your kernels. This can be a useful way to influence memory access patterns, and if the timings change noticeably it would be a good indicator that cache maintenance is causing this problem.
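As a starting point for that experiment, here is a minimal sketch in plain C that lists candidate work-group sizes to try. It relies on the OpenCL 1.x rule that the global work size passed to clEnqueueNDRangeKernel must be evenly divisible by the local work size. The function name and the choice to test only power-of-two sizes are assumptions for illustration; the upper limit would normally come from clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill `out` with power-of-two work-group sizes
 * that evenly divide `global_size` and do not exceed `max_wg_size`.
 * Returns how many candidates were written (at most `out_cap`). */
size_t candidate_wg_sizes(size_t global_size, size_t max_wg_size,
                          size_t *out, size_t out_cap)
{
    assert(global_size > 0 && max_wg_size > 0 && out != NULL);

    size_t count = 0;
    for (size_t wg = 1; count < out_cap; wg *= 2) {
        if (global_size % wg == 0)
            out[count++] = wg;
        /* Stop before doubling would exceed the limit (or overflow). */
        if (wg > max_wg_size / 2)
            break;
    }
    return count;
}
```

Each candidate can then be passed as the local_work_size argument of clEnqueueNDRangeKernel and timed separately. If the kernel time varies strongly between candidates, that points towards the memory access pattern rather than the amount of computation.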
I'm sorry I can't be more definitive, but do post back any other observations and we'll see if that helps identify potential solutions.
HTH, Tim
Hi Tim,
I am not allowed to share the code, but I am working on sample source code with the same kind of functionality that shows the same behavior I described in the issue. I will share it within 2 days.
Narendra Kumar.
Hi Narendra,
Thank you. For IP reasons we also would not want to receive actual source code from your application, so that's good. A reproducer like the one you describe would be great though, and could be very helpful.
Regards, Tim