I am trying to use OpenCL to replace some matrix multiplication and vector computations, but the GPU is always 2 to 4 times slower than the CPU.
On the CPU we use NEON SIMD; on the GPU I also use vector types such as float4 and float16.
I ran these tests on an MT6753 (ARM Cortex-A53 @ 1.5 GHz, Mali-T720).
All data types are float.
Has anyone done similar work on a Mali GPU? I wonder whether any optimization is possible.
I would like the GPU to be faster than the CPU. If you have made a workload run faster on the GPU than on the CPU, please help me.
I need some baseline figures comparing CPU and Mali GPU compute performance, e.g. GFLOPS and so on.
First I map the memory object to the CPU and write the data into it, then I unmap it and run the OpenCL kernel. After the kernel finishes, I map the memory object again and read the result.
The memory object is created with CL_MEM_ALLOC_HOST_PTR. I also think I could maybe use clEnqueueFillBuffer, clEnqueueReadBuffer and clEnqueueWriteBuffer instead.
I know there are some optimizations I can try:
1. Choose a better way to create the memory object and to read, write or copy it
2. Rewrite the kernel function
Please share some basic performance data; I really don't know how fast I can expect the GPU to be. If you have experience with this, please help me.
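To make the numbers comparable, GFLOPS for an N x N matrix multiply can be estimated as 2*N^3 floating-point operations (one multiply and one add per inner-loop step) divided by the elapsed time. Below is a minimal sketch in plain C, assuming square row-major float matrices and a naive triple loop. It is not my NEON code, and the names `matmul_naive` and `matmul_gflops` are only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Naive C = A * B for square n x n row-major float matrices.
 * Reference only: no NEON, no blocking, no threading. */
void matmul_naive(const float *a, const float *b, float *c, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Convert a measured time for one n x n multiply into GFLOPS.
 * The multiply performs n*n*n multiply-add pairs, i.e. 2*n^3 flops. */
double matmul_gflops(size_t n, double seconds)
{
    assert(seconds > 0.0);
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / seconds / 1e9;
}
```

The same `matmul_gflops` formula can be applied to the GPU run if the measured time covers everything that is actually paid per call (map, kernel, unmap), so that both sides are measured the same way.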
github.com/.../ComputeLibrary