Hi,
This is a question I asked on the Khronos forum, but I got no answer, so I am asking it here.
When I use 64-bit libraries to build my APK, I always get the same error on both of my phones, a Huawei Honor Play and a Xiaomi 13T Pro.
Could someone explain why?
This is what I do:
Declare the buffers:
static cl::Buffer bufferNV21;
static cl::Buffer bufferRGBA;
Init the buffers:
bufferligne = cl::Buffer(gContext, CL_MEM_USE_HOST_PTR, (1024*1024)*sizeof(cl_uchar4), buf, NULL);
bufferRGBA = cl::Buffer(gContext, CL_MEM_READ_WRITE, (1080*1920)*sizeof(cl_uchar4));
Feed the buffer (this is working):
// in => uint8_t* in
bufferNV21 = cl::Buffer(gContext, CL_MEM_READ_ONLY|CL_MEM_USE_HOST_PTR, isize*sizeof(cl_uchar), in, NULL);
Kernel:
__kernel void nv21torgba(
    __global int* debug,
    __global int* debugIdx,
    __global uchar4* Gray,
    __global uchar4* RGBA,
    __global uchar4* HSV,
    __global uchar4* HSL,
    __global uchar4* HSI,
    __global uchar4* MMM,
    __global uchar* in,
    int im_width,
    int im_height,
    int im_ligth)
And the enqueueNDRangeKernel call that causes the problem (they all cause the problem):
gNV21Kernel.setArg(0,0); // debug
gNV21Kernel.setArg(1,0); // debugIdx
gNV21Kernel.setArg(2,0); // GRAY
gNV21Kernel.setArg(3,bufferRGBA); // output RGBA
gNV21Kernel.setArg(4,0); // HSV
gNV21Kernel.setArg(5,0); // HSL
gNV21Kernel.setArg(6,0); // HSI
gNV21Kernel.setArg(7,0); // MMM
gNV21Kernel.setArg(8,bufferNV21); // input NV21
gNV21Kernel.setArg(9,w);
gNV21Kernel.setArg(10,h);
gNV21Kernel.setArg(11,ligth);
gQueue.enqueueNDRangeKernel(gNV21Kernel, cl::NDRange(448,28), cl::NDRange(1024,1024), cl::NDRange(2,2), 0, &arraySecondEvent);
As you can see, I use the C++ version and not C. The input data are uchar and the output is uchar4. Everything works well in 32-bit. So it is a type problem, but I have never found or understood where this error comes from.
I am using the latest CL includes from Khronos. I think the problem could come from the kernel compilation, since this is C code that could be compiled with a C++ compiler. It is just an idea that I cannot confirm, because on Android I do not know which compiler is used.
What CPU-side data type are you using for "w", "h", and "ligth" in the code below?
gNV21Kernel.setArg(9,w);
gNV21Kernel.setArg(10,h);
gNV21Kernel.setArg(11,ligth);
... my best guess is that you have these as a "long", so the C++ template in the header is matching the type size as 64 bits on the 64-bit system, but 32 bits on the 32-bit system.
Given that this MUST be a 32-bit type for the template to match the OpenCL kernel type size, I suggest using the int32_t type from the stdint header, to avoid the variable type size problems.
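To illustrate the size-matching point above, here is a minimal sketch; it is not the actual OpenCL header code. It assumes that the templated setArg forwards sizeof of the deduced host type as the argument size. The names `forwarded_size` and `matches_kernel_int` are hypothetical helpers used only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the size a templated setArg(index, value)
// would forward: the size of the deduced host type (assumption).
template <typename T>
constexpr std::size_t forwarded_size(const T&) { return sizeof(T); }

// An OpenCL C `int` kernel parameter is 32 bits wide, i.e. 4 bytes.
constexpr std::size_t kKernelIntSize = 4;

// True when the host value has the size the kernel's `int` parameter expects.
template <typename T>
constexpr bool matches_kernel_int(const T& value) {
    return forwarded_size(value) == kKernelIntSize;
}

// A fixed-width 32-bit type matches on every platform.
static_assert(matches_kernel_int(std::int32_t{1920}), "int32_t is 4 bytes");
// An 8-byte host type would not match; `long` is typically 8 bytes on
// 64-bit Android and 4 bytes on 32-bit Android.
static_assert(!matches_kernel_int(std::int64_t{1920}), "int64_t is 8 bytes");
```

With this model, `matches_kernel_int(w)` holds for an `int` or `int32_t` host variable on both builds, while a `long` host variable would only match on the 32-bit build.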
int w, int h, int ligth
I avoid using long as much as possible.
I did not understand this part:
"Given that this MUST be a 32-bit type for the template to match the OpenCL kernel type size, I suggest using the int32_t type from the stdint header, to avoid the variable type size problems."
Hi,
I replaced int with int32_t for the w, h and ligth variables, but I still get the same problem.
I tried to replace int with int32_t in the kernel, but the kernel compilation failed (error: unknown type name 'int32_t').
I cannot include stdint.h in the .cl kernel file; I get "file not found". So I copied stdint.h, stddef.h and compiler.h next to the kernel and used int32_t in place of int. But I still got the same error, CL_INVALID_ARG_SIZE.
So the problem must come from the cl::Buffer having the wrong size?
I found the problem after a few weeks of investigation.
The problem is that I use kernel.setArg(x,0).
It looks like the 64-bit build does not accept an argument set to 0 in setArg.
But I have no idea why this happens. So it would be nice if someone could explain why there is such a bug. Does it come from the compiler, the driver or something else?
I tried to replace 0 with NULL, but I get the same error. What can I do if I want to keep my kernel unchanged?
And let me know if there is no solution other than modifying the kernel to remove the arguments that are set to 0 with setArg.
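One plausible explanation for the setArg(x,0) failure, sketched under assumptions and not confirmed on these devices: the templated setArg deduces the literal 0 as an int and forwards sizeof(int) as the argument size, while a __global pointer parameter expects a cl_mem handle, which is pointer-sized on the host. The helper names below are hypothetical, and the block deliberately avoids the OpenCL headers so it can be compiled anywhere.

```cpp
#include <cstddef>

// Hypothetical model of the size a templated setArg(index, value) forwards:
// the size of the deduced type of `value` (assumption about the wrapper).
template <typename T>
constexpr std::size_t deduced_size(const T&) { return sizeof(T); }

// cl_mem is a pointer typedef, so a buffer argument is pointer-sized:
// 4 bytes on a 32-bit build, 8 bytes on a 64-bit build.
constexpr std::size_t buffer_handle_size() { return sizeof(void*); }

// The literal 0 is deduced as int. Its size equals the handle size only
// when pointers happen to be as wide as int, as on a 32-bit build.
constexpr bool literal_zero_matches_buffer_arg() {
    return deduced_size(0) == buffer_handle_size();
}
```

On a 32-bit build `literal_zero_matches_buffer_arg()` is true, so setArg(x,0) would pass the size check by accident; on a 64-bit build it is false, which would be consistent with CL_INVALID_ARG_SIZE. If the kernel signature has to stay unchanged, one option that may be worth trying is passing a default-constructed `cl::Buffer()` for the unused pointer arguments, or using the explicit overload `setArg(index, sizeof(cl_mem), nullptr)`. Neither has been tested on the phones discussed here.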
Regards.
PS: I got no improvement in speed, and some kernels that used range (16,16) need to be set to (2,2). Strangely, I have the same problem with the Xiaomi 13T Pro. The Huawei seems to be the best, even though it is 7 years old. How can things get worse with time?
I can get between 39 and 45 ms with the Huawei by setting the battery mode to performance, so a 30% improvement and 22 frames per second for heavy GPU and CPU processing.
My conclusion is that 32-bit gives much better results than 64-bit. Not surprising. That is for Android phones; I do not know about other platforms. So I will keep working in 32-bit, as it is more efficient.
New code:
//gNV21Kernel.setArg(0,0); // debug
//gNV21Kernel.setArg(1,0); // debugIdx
//gNV21Kernel.setArg(0,0); // GRAY
gNV21Kernel.setArg(0,bufferRGBA); // output RGBA
//gNV21Kernel.setArg(2,0); // HSV
//gNV21Kernel.setArg(4,0); // HSL
//gNV21Kernel.setArg(5,0); // HSI
//gNV21Kernel.setArg(6,0); // MMM
gNV21Kernel.setArg(1,bufferNV21); // input NV21
gNV21Kernel.setArg(2,w);
gNV21Kernel.setArg(3,h);
gNV21Kernel.setArg(4,ligth);
gQueue.enqueueNDRangeKernel(gNV21Kernel, cl::NDRange(448,28), cl::NDRange(1024,1024), cl::NDRange(2,2), 0, &arraySecondEvent);
New kernel:
__kernel void nv21torgba(
    //__global int* debug,
    //__global int* debugIdx,
    //__global uchar4* Gray,
    __global uchar4* RGBA,
    //__global uchar4* HSV,
    //__global uchar4* HSL,
    //__global uchar4* HSI,
    //__global uchar4* MMM,
    __global uchar* in,
    int im_width,
    int im_height,
    int im_ligth)
Thank you for coming back to share the solution you found :)