Hi,
This is a question I asked on the Khronos forum, but I got no answer, so I am asking it here.
When I use the 64-bit libraries to build my APK, I always get the same error on both of my phones, a Huawei Honor Play and a Xiaomi 13T Pro.
Could someone explain why?
int w, int h, int ligth
I avoid using long as much as possible.
I did not understand this:
Given that this MUST be a 32-bit type for the template to match the OpenCL kernel type size, I suggest using the int32_t type from the stdint header, to avoid the variable type size problems.
Hi,
I replaced int with int32_t for the w, h and ligth variables, but I still have the same problem.
I tried to replace int with int32_t in the kernel, but the kernel compilation failed with (error: unknown type name 'int32_t').
I cannot include stdint.h in the .cl kernel file; I get "file not found". So I copied stdint.h, stddef.h and compiler.h into the kernel and used int32_t in place of int. But I still get the same error, CL_INVALID_ARG_SIZE.
So the problem must come from the cl::Buffer having the wrong size?
I found the problem after a few weeks of investigation.
The problem is that I use kernel.setArg(x,0).
It looks like a 64-bit build does not accept 0 as the value in setArg.
But I have no idea why this happens, so it would be nice if someone could explain the cause of this bug. Does it come from the compiler, the driver or something else?
I tried replacing 0 with NULL but got the same error. What can I do if I want to keep my kernel unchanged?
And let me know if there is no solution other than modifying the kernel to remove the arguments that are set to 0 with setArg.
Regards.
PS: I got no improvement in speed, and some kernels that used range(16,16) need to be set to (2,2). Strangely, I have the same problem with the Xiaomi 13T Pro. The Huawei seems to be the best even though it is 7 years old. How can things get worse with time?
I can get between 39 and 45 ms with the Huawei by setting the battery mode to performance, so a 30% improvement and 22 frames per second for heavy GPU and CPU processing.
The conclusion is that 32-bit gives much better results than 64-bit. Not surprising. That is for Android phones; I do not know about other platforms. So I will keep working in 32-bit; it is more efficient.
New code:
//gNV21Kernel.setArg(0,0); // debug
//gNV21Kernel.setArg(1,0); // debugIdx
//gNV21Kernel.setArg(0,0); // GRAY
gNV21Kernel.setArg(0,bufferRGBA); // output RGBA
//gNV21Kernel.setArg(2,0); // HSV
//gNV21Kernel.setArg(4,0); // HSL
//gNV21Kernel.setArg(5,0); // HSI
//gNV21Kernel.setArg(6,0); // MMM
gNV21Kernel.setArg(1,bufferNV21); // input NV21
gNV21Kernel.setArg(2,w);
gNV21Kernel.setArg(3,h);
gNV21Kernel.setArg(4,ligth);
gQueue.enqueueNDRangeKernel(gNV21Kernel, cl::NDRange(448,28), cl::NDRange(1024,1024), cl::NDRange(2,2),0,&arraySecondEvent);
New kernel:
__kernel void nv21torgba(
    //__global int* debug,
    //__global int* debugIdx,
    //__global uchar4* Gray,
    __global uchar4* RGBA,
    //__global uchar4* HSV,
    //__global uchar4* HSL,
    //__global uchar4* HSI,
    //__global uchar4* MMM,
    __global uchar* in,
    int im_width,
    int im_height,
    int im_ligth)
Thank you for coming back to share the solution you found :)
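On the open question of why setArg(x,0) fails only in the 64-bit build, here is one likely explanation, offered as a sketch and not as a confirmed diagnosis. The templated setArg(index, value) in the OpenCL C++ bindings passes sizeof(value) to clSetKernelArg, and for a __global pointer argument the runtime expects exactly sizeof(cl_mem). cl_mem is a pointer type, so it is 4 bytes in a 32-bit build and 8 bytes in a 64-bit build, while the literal 0 is an int, typically 4 bytes in both. The size would then match by accident in 32-bit and produce CL_INVALID_ARG_SIZE in 64-bit. The snippet below only illustrates that size comparison. The names fake_cl_mem, arg_size_sent, arg_size_expected_for_buffer, size_matches_buffer_arg and check_null_handle are hypothetical stand-ins, not part of OpenCL, chosen so that it compiles without the OpenCL headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cl_mem: the real OpenCL header declares cl_mem
// as a pointer to an opaque struct, so it has the size of a pointer.
struct fake_mem_object;
typedef fake_mem_object* fake_cl_mem;

// Mirrors what a templated setArg(index, value) forwards as the byte count:
// the size of the deduced C++ type of the value.
template <typename T>
constexpr std::size_t arg_size_sent(const T&) {
    return sizeof(T);
}

// Size the runtime expects for a __global pointer (buffer) argument.
constexpr std::size_t arg_size_expected_for_buffer() {
    return sizeof(fake_cl_mem);
}

// True when a value of type T would pass the size check for a buffer argument.
template <typename T>
constexpr bool size_matches_buffer_arg(const T& value) {
    return arg_size_sent(value) == arg_size_expected_for_buffer();
}

// A null handle has the expected size on both 32-bit and 64-bit builds,
// whereas the int literal 0 only matches when int and pointers share a size.
inline void check_null_handle() {
    assert(size_matches_buffer_arg(static_cast<fake_cl_mem>(nullptr)));
}
```

If that is the cause, passing a default-constructed cl::Buffer() or using the three-argument overload, setArg(index, sizeof(cl_mem), nullptr), should send a null buffer of the right size and let the kernel stay unchanged. This is untested on the phones mentioned in this thread.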