Memory Access Optimization for OpenCL Programs Running on Mali GPU

ZhuangJt over 1 year ago

What is the most efficient memory access pattern when running an OpenCL program on a Mali GPU? In what order should the different cores and threads access memory? Is there any documentation that explains this?

For example, the Mali G710 GPU has 10 cores, with a maximum thread count of 2048 or 1024 per core. When I set the local work size in OpenCL to {16,8}, I take it to mean that each core uses only 128 threads. When I change the local work size to {32,8}, each core should use 256 threads and therefore achieve higher throughput, but the measured results show the opposite. Can anyone explain this?

Top replies

Peter Harris over 1 year ago +1 suggested

Old thread, but to answer this one ... The workgroup size does not determine thread occupancy. Shader cores can run multiple work groups, so in both cases you should be able to use all thread slots based...
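
To put rough numbers on the occupancy point, here is a minimal sketch using the figures from the question. It assumes 2048 thread slots per core (one of the two values quoted above) and that nothing else, such as register or local memory usage, limits how many work groups can be resident at once. The helper `resident_workgroups` is a hypothetical name for illustration, not part of OpenCL or any Mali API.

```python
def resident_workgroups(thread_slots_per_core, local_size):
    """Return (work groups resident per core, thread slots they occupy).

    Assumption: thread slots are the only limit on residency.
    """
    threads_per_group = 1
    for dim in local_size:
        threads_per_group *= dim
    groups = thread_slots_per_core // threads_per_group
    return groups, groups * threads_per_group


# Compare the two local work sizes from the question.
for local_size in [(16, 8), (32, 8)]:
    groups, used = resident_workgroups(2048, local_size)
    print(local_size, "->", groups, "work groups,", used, "thread slots in use")
```

Under these assumptions both local work sizes can occupy the same number of thread slots per core; the smaller size simply needs more resident work groups to get there. The throughput difference seen in the question would therefore have to come from something other than raw thread occupancy.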