Hi,
I'm Trying to convert a code written in Cuda to openCL and run into some trouble. My final goal is to implement the code on an Odroid XU3 board with a Mali T628 GPU.
In order to simplify the transition and save time trying to debug openCL kernels I've taken the following steps:
I know that different architectures may have different optimizations but that isn't my main concern for now. I manged to run the openCL code on my Nvidia GPU with no apparent issues but keep getting strange errors when trying to run the code on the Odroid board. I know that different architectures have different handling of exceptions etc. but I'm not sure how to solve those issues.
Since the openCL code works great on my Nvidia I assume that I managed to do the correct transition between thread/blocks -> workItems/workGroups etc. I already fixed several issues that relate to the cl_device_max_work_group_size issue so that can't be the cause.When running the code i'm getting a "CL_OUT_OF_RESOURCES" error.
I've narrowed the cause of the error to 2 lines in the code but not sure to fix those issues.
the error is caused by the following lines in the kernel code attached :
Is there any tool that can help debugging those issues on the Odroid ? I saw that using "printf" inside the kernel isn't possible. Is there another available command ?
Thanks
Yuval
Thanks for all the help. I added the __attribute__ and reviewed the kernel with GPUVerify and the issue is solved.