I am running a simple program on my Mali T628 to compare GPU performance against the CPU.
I wrote code that adds two integer arrays, and it works fine. However, when I increase the array size up to 16384, my kernel doesn't execute correctly, and I get an error when I call clWaitForEvents() and clGetEventInfo(), which says CL_INVALID_VALUE.
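To narrow down what "doesn't execute correctly" means, the GPU output can be compared against a CPU reference. The following is only a minimal sketch in plain C with no OpenCL calls; the names add_arrays_cpu and first_mismatch are hypothetical helpers, not part of my actual program, and I am assuming the arrays hold 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* CPU reference for the kernel: out[i] = a[i] + b[i] for every element. */
void add_arrays_cpu(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Return the index of the first element where got differs from expected,
   or n if the two arrays match everywhere. */
size_t first_mismatch(const int32_t *expected, const int32_t *got, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (expected[i] != got[i])
            return i;
    return n;
}
```

After reading the result buffer back from the device, calling first_mismatch on the CPU result and the GPU result would show whether the output is wrong from the start or only from some index onward.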
My global work size is 16384.
My local work-group size is 256 (obtained by calling clGetKernelWorkGroupInfo()).
I checked other specifications, such as memory and cache sizes, and it should work. As far as I understand, the OpenCL runtime is supposed to split the global work size into work-groups and execute them in several steps.
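To make the numbers concrete, here is a minimal sketch in plain C (no OpenCL calls) of the arithmetic I am assuming. The helper names are hypothetical, and I am assuming three buffers of 32-bit integers (two inputs and one output). OpenCL 1.x requires the global work size to be a multiple of the local work size, which the first helper checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of work-groups needed to cover global_size work-items.
   Returns 0 if local_size is 0 or does not divide global_size evenly. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}

/* Round n up to the next multiple of local_size, for array lengths
   that are not already a multiple. Returns 0 if local_size is 0. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    if (local_size == 0)
        return 0;
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Total bytes for two input buffers and one output buffer of n 32-bit ints. */
size_t total_buffer_bytes(size_t n)
{
    return 3 * n * sizeof(int32_t);
}
```

With my values, work_group_count(16384, 256) gives 64 work-groups and total_buffer_bytes(16384) gives 196608 bytes (192 KiB), so the sizes themselves look consistent.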
Any idea what the problem could be?
Thanks