ARM Mali-T628 (Samsung Exynos Octa 5420 board): GPU cache issue in kernel

Hi,

I am facing a cache issue on the Mali GPU. Do you have any idea how to resolve it? Let me explain the problem.

We are working on a Samsung Exynos Octa 5420 board and have an algorithm that we are porting to the GPU.

1. First, we used separate GPU buffers (created with "clCreateBuffer" and "CL_MEM_ALLOC_HOST_PTR") and copied the input data into them from a CPU global buffer (allocated with malloc). Because these are separate GPU buffers, we arrange the data without any gaps: if the 1st thread operates on the 1st block of data, the 2nd thread (or any other) may work on the 2nd block, which is located immediately after the 1st block. With this design, the GPU algorithm's numbers are within the expected range.

2. With the above design, we observed that the copy takes a lot of time, so we decided to create the CPU global buffer itself with "clCreateBuffer" and "CL_MEM_ALLOC_HOST_PTR". By mapping it (with clEnqueueMapBuffer), we can use the same buffer on both the CPU and the GPU. However, this buffer is laid out so that the data required by the GPU algorithm sits at scattered positions: if the 1st thread operates on the 1st block of data, the 2nd thread (or any other) may work on the 2nd block, which is not located directly next to the 1st block. We observe that the execution time nearly doubles compared to the first design (41 ms for the first design vs. 79 ms for the second, i.e. about 93% slower). Can you suggest any way to avoid this cache issue? A quick response would be very helpful.
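To make the difference between the two layouts concrete, here is a minimal model in Python (not OpenCL). It assumes each thread reads one fixed-size block of floats and that the cache line is 64 bytes. BLOCK_FLOATS, STRIDE_FLOATS, CACHE_LINE and all helper names are hypothetical values chosen for illustration, not measurements from our board or documented Mali-T628 parameters.

```python
from typing import Callable, List, Sequence

# Hypothetical parameters, chosen only to illustrate the two layouts described above.
FLOAT_BYTES = 4      # sizeof(float)
BLOCK_FLOATS = 6     # floats processed by one thread (assumption)
STRIDE_FLOATS = 40   # distance between block starts in the shared buffer (assumption)
CACHE_LINE = 64      # cache-line size in bytes (assumption, not confirmed for Mali-T628)


def packed_offset(gid: int) -> int:
    """Design 1: block `gid` starts right after block `gid - 1` (no gaps)."""
    return gid * BLOCK_FLOATS * FLOAT_BYTES


def scattered_offset(gid: int) -> int:
    """Design 2: block `gid` starts STRIDE_FLOATS floats after block `gid - 1`."""
    return gid * STRIDE_FLOATS * FLOAT_BYTES


def lines_touched(offset: Callable[[int], int], n_items: int) -> int:
    """Count distinct cache lines fetched when every thread reads its whole block."""
    lines = set()
    for gid in range(n_items):
        start = offset(gid)
        end = start + BLOCK_FLOATS * FLOAT_BYTES - 1  # last byte of this block
        lines.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(lines)


def pack_blocks(src: Sequence[float], n_items: int) -> List[float]:
    """Model of the copy step in design 1: gather scattered blocks into a dense array."""
    dst: List[float] = []
    for gid in range(n_items):
        base = gid * STRIDE_FLOATS
        dst.extend(src[base:base + BLOCK_FLOATS])
    return dst


if __name__ == "__main__":
    n = 256
    useful = n * BLOCK_FLOATS * FLOAT_BYTES  # bytes the threads actually need
    for name, fn in (("packed", packed_offset), ("scattered", scattered_offset)):
        lines = lines_touched(fn, n)
        used = useful / (lines * CACHE_LINE)
        print(f"{name:9s}: {lines:3d} lines, {used:.1%} of fetched bytes used")
```

With these made-up parameters, 256 threads touch 96 cache lines in the packed layout and 256 in the scattered one, so only 37.5% of the fetched bytes are useful in the second case. This is a simplified model that ignores cache capacity, replacement policy and the GPU's real memory system, so it does not by itself account for the 41 ms vs. 79 ms difference.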

Thanks & Regards,

Narendra Kumar