I'm trying to use OpenCL to replace some matrix multiplication and vector computation, but the GPU is always 2 to 4 times slower than the CPU.
On the CPU we use NEON SIMD; on the GPU I also use vector types such as float4 and float16.
I ran these tests on an MT6753 (ARM Cortex-A53 @ 1.5GHz, Mali-T720).
All the data types are float.
Has anyone done similar work on a Mali GPU? I wonder whether any optimization is possible.
I would like the GPU to be faster than the CPU. If you have done work that runs faster on the GPU than on the CPU, please help me.
I need some baseline figures comparing the compute performance of the CPU and the Mali GPU, e.g. GFLOPS and so on.
First I map the memobj to the CPU and write data into it, then unmap the memobj and run the OpenCL kernel. After the kernel finishes, I map the memobj again and read the result.
The memobj is created with CL_MEM_ALLOC_HOST_PTR. I also think I could maybe use fill buffer, read buffer and write buffer instead.
I know there are some optimizations I can do:
1. Choose a better way to create the memobj and to read/write or copy it
2. Rewrite the kernel function
Please give me some basic performance data; I really don't know how fast I can get on the GPU. If you have some experience with this, please help me.
Without knowing precisely what you are trying to do, it is hard to help here.
Your chipset has an 8-core Cortex-A53 at 1.5GHz, but only a 4-core Mali-T720 at 500MHz. That gives you 12GHz of total CPU performance but only 2GHz of total GPU performance, so it wouldn't surprise me if this system is faster on the CPU for multi-threaded software. Note that the Mali-T720 is designed for casual gaming and efficient UI rendering, not peak arithmetic performance; its bigger brother, the Mali-T760, has ~2x the arithmetic performance per clock.
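The rough estimate above is just core count times clock frequency. A minimal sketch of that arithmetic in C, assuming a hypothetical helper named aggregate_ghz (not part of any library), and ignoring how much work each core does per cycle:

```c
#include <assert.h>

/* Hypothetical helper: aggregate clock = cores x per-core clock (GHz).
 * This is only a crude proxy; it ignores per-cycle throughput (SIMD
 * width, fp16 vs fp32) and memory bandwidth. */
double aggregate_ghz(int cores, double ghz_per_core) {
    assert(cores > 0 && ghz_per_core > 0.0);
    return cores * ghz_per_core;
}
```

With the figures quoted above, aggregate_ghz(8, 1.5) is 12 for the CPU and aggregate_ghz(4, 0.5) is 2 for the GPU, a ratio of 6 in favour of the CPU before per-cycle throughput is taken into account.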
The GPU will benefit if you can use narrower data inputs such as fp16 inputs, as the GPU supports double throughput for fp16 compared to fp32.
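As for the baseline GFLOPS numbers asked for in the question, one way to get comparable figures is to time the same multiplication on both devices and convert the elapsed time to GFLOPS yourself. Below is a minimal sketch in C, assuming row-major matrices and the usual count of 2*M*N*K floating-point operations for an M x K by K x N multiply. The names matmul_flops, gflops and matmul_ref are hypothetical helpers for illustration, and the timing itself (CPU timer or OpenCL event profiling) is left to you.

```c
#include <assert.h>
#include <stddef.h>

/* FLOP count for C = A * B with A (m x k) and B (k x n):
 * one multiply and one add per inner-loop step. */
double matmul_flops(size_t m, size_t n, size_t k) {
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Convert a FLOP count and a measured wall-clock time to GFLOPS. */
double gflops(double flops, double seconds) {
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}

/* Plain scalar reference multiply (row-major), useful for checking
 * that the NEON and OpenCL versions produce the same result. */
void matmul_ref(const float *a, const float *b, float *c,
                size_t m, size_t n, size_t k) {
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

As a purely hypothetical illustration, a 512 x 512 x 512 multiply that took 0.1 s would give gflops(matmul_flops(512, 512, 512), 0.1), which is about 2.7 GFLOPS. Measuring the fp32 and fp16 variants of a kernel the same way shows directly whether the narrower type pays off on your device.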