This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

openCL CL_OUT_OF_RESOURCES issue

Hi,

I'm Trying to convert a code written in Cuda to openCL and run into some trouble. My final goal is to implement the code on an Odroid XU3 board with a Mali T628 GPU.

In order to simplify the transition and save time trying to debug openCL kernels I've taken the following steps:

  1. Implement the code in Cuda and test it on a Nvidia GeForce 760
  2. Implement the code in openCL and test it on a Nvidia GeForce 760
  3. test the openCL code on an Odroid XU3 board with a Mali T628 GPU.

I know that different architectures may have different optimizations but that isn't my main concern for now. I manged to run the openCL code on my Nvidia GPU with no apparent issues but keep getting strange errors when trying to run the code on the Odroid board. I know that different architectures have different handling of exceptions etc. but I'm not sure how to solve those issues.


Since the openCL code works great on my Nvidia I assume that I managed to do the correct transition between thread/blocks -> workItems/workGroups etc. I already fixed several issues that relate to the cl_device_max_work_group_size issue so that can't be the cause.When running the code i'm getting a "CL_OUT_OF_RESOURCES" error.


I've narrowed the cause of the error to 2 lines in the code but not sure to fix those issues.

the error is caused by the following lines in the kernel code attached :

  1. lowestDist[pixelNum] = partialDiffSumTemp; both variables are private variables of the kernel and therefor I don't see any potential issue.
  2. d_disparityLeft[globalMemIdx + TILE_BOUNDARY_WIDTH - WINDOW_RADIUS + 0] = bestDisparity[0]; Here I guess the cause is "OUT_OF_BOUND" but not sure how to debug it since the original code doesn't have any issue.

Is there any tool that can help debugging those issues on the Odroid ? I saw that using "printf" inside the kernel isn't possible. Is there another available command ?


Thanks

Yuval

stereoKernel.cl.zip
Parents
  • Hi goldfracht,

    You can use printf inside your kernel but you need to enable this extension : https://www.khronos.org/registry/cl/extensions/arm/cl_arm_printf.txt

    /* Define a printf callback function. */
    static void printf_callback( const char *buffer, size_t len, size_t complete, void *user_data )
    {
      printf( "%.*s", (int)len, buffer );
    }
    
    void cl_init()
    {
    [...]
     cl_context_properties props[] = 
      {
      /* Enable a printf callback function for this context.  
      * */
      CL_PRINTF_CALLBACK_ARM,   (cl_context_properties) printf_callback,
    
      /* Request a minimum printf buffer size of 4MiB for devices in the
      context that support this extension. */
      CL_PRINTF_BUFFERSIZE_ARM, (cl_context_properties) 0x100000,
      CL_CONTEXT_PLATFORM,      (cl_context_properties) platform,
      0
      };
    [...]
    }
    

    And in your kernel you need to add:

    #pragma OPENCL EXTENSION cl_arm_printf : enable

    For out of bounds checks and race conditions you can try to use GPUVerify - Download

    Note: The online version doesn't currently support out of bounds checking.

    Here is how to use it:

    test.cl:

    __kernel void foo (__global const float* src_a, int d,
                         __global const int* src_b, __global float *c)
    
    
    {
            uint idx = get_global_id(0);
    
    
            float4 a, b;
    
    
            a = (float4)(src_a[d]);
            b = convert_float4(vload4(idx, src_b));
            vstore4(a + b, idx, c);
    }
    

    ./gpuverify --global_size=[256] --local_size=[256] --kernel-arrays=foo,8,8,8 test.cl --check-array-bounds

    test.opt.bc: warning: Assuming the arguments 'src_a', 'src_b', 'c' of 'foo' on line 1 of test.cl to be non-aliased; please consider adding a restrict qualifier to these arguments

    test.cl:9:2: error: possible array out-of-bounds access on array src_a[-1610612736] by work item 2 in work group 0:

      a = (float4)(src_a[d]);

    Bitwise values of parameters of 'foo':

      d = 2684354560

    test.cl:9:2: error: possible array out-of-bounds access on array src_a[536870912] by work item 2 in work group 0:

      a = (float4)(src_a[d]);

    Bitwise values of parameters of 'foo':

      d = 536870912

    test.cl:10:21: error: possible array out-of-bounds access on array src_b[512] by work item 128 in work group 0:

      b = convert_float4(vload4(idx, src_b));

    Bitwise values of parameters of 'foo':

      d = 0

    test.cl:10:21: error: possible array out-of-bounds access on array src_b[2] by work item 0 in work group 0:

      b = convert_float4(vload4(idx, src_b));

    Bitwise values of parameters of 'foo':

      d = 0

    GPUVerify kernel analyser finished with 0 verified, 4 errors

    What calls returns CL_OUT_OF_RESOURCES ? If it's clEnqueueNDRangeKernel: what is the local workgroup size parameter ?

    Hope this helps,

    Anthony

Reply
  • Hi goldfracht,

    You can use printf inside your kernel but you need to enable this extension : https://www.khronos.org/registry/cl/extensions/arm/cl_arm_printf.txt

    /* Define a printf callback function. */
    static void printf_callback( const char *buffer, size_t len, size_t complete, void *user_data )
    {
      printf( "%.*s", (int)len, buffer );
    }
    
    void cl_init()
    {
    [...]
     cl_context_properties props[] = 
      {
      /* Enable a printf callback function for this context.  
      * */
      CL_PRINTF_CALLBACK_ARM,   (cl_context_properties) printf_callback,
    
      /* Request a minimum printf buffer size of 4MiB for devices in the
      context that support this extension. */
      CL_PRINTF_BUFFERSIZE_ARM, (cl_context_properties) 0x100000,
      CL_CONTEXT_PLATFORM,      (cl_context_properties) platform,
      0
      };
    [...]
    }
    

    And in your kernel you need to add:

    #pragma OPENCL EXTENSION cl_arm_printf : enable

    For out of bounds checks and race conditions you can try to use GPUVerify - Download

    Note: The online version doesn't currently support out of bounds checking.

    Here is how to use it:

    test.cl:

    __kernel void foo (__global const float* src_a, int d,
                         __global const int* src_b, __global float *c)
    
    
    {
            uint idx = get_global_id(0);
    
    
            float4 a, b;
    
    
            a = (float4)(src_a[d]);
            b = convert_float4(vload4(idx, src_b));
            vstore4(a + b, idx, c);
    }
    

    ./gpuverify --global_size=[256] --local_size=[256] --kernel-arrays=foo,8,8,8 test.cl --check-array-bounds

    test.opt.bc: warning: Assuming the arguments 'src_a', 'src_b', 'c' of 'foo' on line 1 of test.cl to be non-aliased; please consider adding a restrict qualifier to these arguments

    test.cl:9:2: error: possible array out-of-bounds access on array src_a[-1610612736] by work item 2 in work group 0:

      a = (float4)(src_a[d]);

    Bitwise values of parameters of 'foo':

      d = 2684354560

    test.cl:9:2: error: possible array out-of-bounds access on array src_a[536870912] by work item 2 in work group 0:

      a = (float4)(src_a[d]);

    Bitwise values of parameters of 'foo':

      d = 536870912

    test.cl:10:21: error: possible array out-of-bounds access on array src_b[512] by work item 128 in work group 0:

      b = convert_float4(vload4(idx, src_b));

    Bitwise values of parameters of 'foo':

      d = 0

    test.cl:10:21: error: possible array out-of-bounds access on array src_b[2] by work item 0 in work group 0:

      b = convert_float4(vload4(idx, src_b));

    Bitwise values of parameters of 'foo':

      d = 0

    GPUVerify kernel analyser finished with 0 verified, 4 errors

    What calls returns CL_OUT_OF_RESOURCES ? If it's clEnqueueNDRangeKernel: what is the local workgroup size parameter ?

    Hope this helps,

    Anthony

Children