Hi,
I'm trying to convert code written in CUDA to OpenCL and have run into some trouble. My final goal is to run the code on an Odroid XU3 board with a Mali T628 GPU.
To simplify the transition and save time debugging OpenCL kernels, I've taken the following steps:
I know that different architectures may need different optimizations, but that isn't my main concern for now. I managed to run the OpenCL code on my Nvidia GPU with no apparent issues, but I keep getting strange errors when I run it on the Odroid board. I know that different architectures handle exceptions and similar conditions differently, but I'm not sure how to solve those issues.
Since the OpenCL code works well on my Nvidia GPU, I assume I made the transition from threads/blocks to work-items/work-groups correctly. I already fixed several issues related to CL_DEVICE_MAX_WORK_GROUP_SIZE, so that can't be the cause. When running the code on the Odroid I get a "CL_OUT_OF_RESOURCES" error.
I've narrowed the cause of the error down to 2 lines in the code, but I'm not sure how to fix them.
The error is caused by the following lines in the attached kernel code:
Is there any tool that can help debug these issues on the Odroid? I saw that using "printf" inside the kernel isn't possible. Is there another command available?
Thanks
Yuval
GPUVerify is a static analysis tool, so you don't need to run it on the device; you can run it on your PC.
Regarding the work-group size: even if the hardware supports a local work-group size of up to 256, we strongly recommend limiting it to 128.
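As a rough host-side illustration of that recommendation, here is a minimal sketch in C. The helper names are hypothetical, and it assumes a 1-D NDRange where the per-kernel limit has already been queried (for example with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE). It caps the local size at 128 and pads the global size to a multiple of it, since OpenCL 1.x requires the global size to be divisible by the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cap, taken from the recommendation above. */
#define RECOMMENDED_MAX_LOCAL_SIZE 128

/* Hypothetical helper: pick a 1-D local size that respects both the
 * limit reported for the kernel and the recommended cap of 128. */
size_t pick_local_size(size_t kernel_max_wg_size)
{
    assert(kernel_max_wg_size > 0);
    return kernel_max_wg_size < RECOMMENDED_MAX_LOCAL_SIZE
               ? kernel_max_wg_size
               : RECOMMENDED_MAX_LOCAL_SIZE;
}

/* Hypothetical helper: round the number of items up to the next
 * multiple of the local size, for use as the global work size. */
size_t round_up_global_size(size_t n_items, size_t local_size)
{
    assert(local_size > 0);
    return ((n_items + local_size - 1) / local_size) * local_size;
}
```

Because the global size is padded, the kernel needs a bounds check (something like returning early when get_global_id(0) is not below the real item count) so the extra work-items don't touch memory out of range.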
If your kernel is fairly complex, you will have to set an attribute in front of your kernel to give the compiler a hint: see http://community.arm.com/message/28323#28323
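The linked post isn't reproduced here, so the following is only a sketch under an assumption: that the hint in question is the standard OpenCL reqd_work_group_size attribute, which tells the compiler the exact work-group size the kernel will be launched with. The kernel "scale", the macro WG_SIZE, and the helper make_build_options are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical kernel source. WG_SIZE is not defined here; it is
 * supplied at build time through the options string built below. */
const char *kernel_src =
    "__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))\n"
    "void scale(__global float *data, const float factor, const uint n)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    if (i < n) data[i] *= factor;\n"
    "}\n";

/* Hypothetical helper: write the options string intended for
 * clBuildProgram into buf. Returns 0 on success, -1 if buf is too small. */
int make_build_options(char *buf, size_t buf_len, size_t wg_size)
{
    assert(buf != NULL && wg_size > 0);
    int written = snprintf(buf, buf_len, "-D WG_SIZE=%zu", wg_size);
    return (written > 0 && (size_t)written < buf_len) ? 0 : -1;
}
```

With this attribute in place, the local size passed to clEnqueueNDRangeKernel has to match the declared value exactly, otherwise the enqueue fails with CL_INVALID_WORK_GROUP_SIZE.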
Hope this helps,
Anthony
Thanks for all the help. I added the __attribute__ and reviewed the kernel with GPUVerify and the issue is solved.