Mali OpenCL 2.x SDK - when is it expected from ARM?

Hi All,

Do we have any idea when ARM will release an SDK for later versions of OpenCL (2.x)?

Currently we see only the OpenCL 1.1 SDK available.

Regards

Paul

Parents
  • Hi Anthony,

    Thanks for your response.

    While working on an OpenCL acceleration project for a video decoder on the Firefly RK3288 development board (Cortex-A17 / Mali-T764) running Android, we found that the OpenCL kernel launch API clEnqueueNDRangeKernel appears to have a huge overhead. Our test kernel performs an indexed integer addition of two buffers (512 bytes) and stores the results in a third buffer. We launched the same kernel twice to see whether the second call was any faster.

    OpenCL ARM profiling:

                      Queued Time (ns)    Submit Time (ns)    Execution Start Time (ns)    Execution End Time (ns)

    First Launch:     30,12,55,724        3,65,38,31,696      30,12,55,724                 3,65,39,77,696

    Second Launch:    30,12,55,724        3,65,59,68,696      30,12,55,724                 3,65,60,81,696

    The board uses the ARM OpenCL 1.1 drivers. What seems surprising, or erratic, in the figures above is that 'Queued Time' and 'Execution Start Time' return the same value, in both launches.

    There is also a large gap between 'Queued Time' and 'Submit Time', which looks like a huge overhead.
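For reference, the gaps in the table can be worked out mechanically. The sketch below assumes the figures are plain nanosecond counts and that the commas only group digits; the helper names `parse_grouped_ns` and `gap_ns` are made up for this illustration and are not part of any OpenCL API.

```c
#include <stdint.h>

/* Strip the comma separators from a grouped figure such as "3,65,38,31,696"
 * and return its value. Assumption: the table entries are nanosecond counts
 * and the commas carry no meaning beyond digit grouping. */
static uint64_t parse_grouped_ns(const char *text)
{
    uint64_t value = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p >= '0' && *p <= '9') {
            value = value * 10u + (uint64_t)(*p - '0');
        }
    }
    return value;
}

/* Signed gap, in nanoseconds, from one table entry to another.
 * A negative result means `later` was actually recorded as the earlier time. */
static int64_t gap_ns(const char *earlier, const char *later)
{
    return (int64_t)parse_grouped_ns(later) - (int64_t)parse_grouped_ns(earlier);
}
```

Under that reading, the first ARM launch gives `gap_ns("30,12,55,724", "3,65,38,31,696")` = 3352575972 ns (about 3.35 s) from Queued to Submit and 146000 ns from Submit to End, while Submit to Start comes out negative because Start equals Queued.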

    The same kernel, launched by the same OpenCL host code on an Intel Xeon desktop, gave the following results:

    OpenCL Intel Xeon profiling:

                      Queued Time (ns)    Submit Time (ns)    Execution Start Time (ns)    Execution End Time (ns)

    First Launch:     3,80,34,22,080      3,80,34,38,720      3,80,36,53,504               3,80,36,66,432

    Second Launch:    3,80,46,77,760      3,80,47,02,336      3,80,47,23,936               3,80,47,27,936

    In this case, all the timings seem normal.

    Our questions:

    1. What are we missing? Are the OpenCL drivers on our board too old, or are such values normal for Mali OpenCL in general?

    2. OpenCL 2.x has a new feature, device-side kernel enqueue (nested parallelism). Could it serve as a workaround to avoid the host kernel launch overhead?

    3. Any other suggestions?

    Can you please help?

    Regards

    Paul

Children
  • Hi Paul,

    I'm sorry, I don't understand what units the numbers in your tables are in. Why are they packed in groups of 2 digits?

    Also, I assume the times for each column come from clGetEventProfilingInfo and therefore correspond to CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START, CL_PROFILING_COMMAND_END.

    If that's correct, then command_start and command_end are the timestamps for when the job started and finished executing on the GPU, queued is when clEnqueue was called, and submit is when the queue was flushed by the application.
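A minimal sketch of how the four values might be read and sanity-checked follows. The `event_times` struct and both helper names are hypothetical; only `clGetEventProfilingInfo` and the `CL_PROFILING_COMMAND_*` constants come from OpenCL. The OpenCL part is compiled only when `USE_OPENCL` is defined (an opt-in macro invented for this example), and it assumes the queue was created with `CL_QUEUE_PROFILING_ENABLE` and the event has completed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef USE_OPENCL
#include <CL/cl.h>   /* on macOS the header is <OpenCL/opencl.h> */
#endif

/* The four profiling timestamps of one command, in nanoseconds. */
typedef struct {
    uint64_t queued;
    uint64_t submit;
    uint64_t start;
    uint64_t end;
} event_times;

/* True when the timestamps follow the life of a command:
 * queued, then submitted, then started, then ended. */
static bool times_in_order(const event_times *t)
{
    return t->queued <= t->submit &&
           t->submit <= t->start &&
           t->start  <= t->end;
}

#ifdef USE_OPENCL
/* Fill `out` from a completed event.
 * Returns CL_SUCCESS, or the first error code reported by the query. */
static cl_int read_event_times(cl_event ev, event_times *out)
{
    const cl_profiling_info names[4] = {
        CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT,
        CL_PROFILING_COMMAND_START,  CL_PROFILING_COMMAND_END
    };
    cl_ulong values[4];

    for (int i = 0; i < 4; ++i) {
        cl_int err = clGetEventProfilingInfo(ev, names[i], sizeof(cl_ulong),
                                             &values[i], NULL);
        if (err != CL_SUCCESS) {
            return err;
        }
    }
    out->queued = values[0];
    out->submit = values[1];
    out->start  = values[2];
    out->end    = values[3];
    return CL_SUCCESS;
}
#endif
```

Applied to the tables in the question, read as nanosecond counts, the Xeon figures pass `times_in_order` while the ARM figures fail it, because the reported start time lies before the submit time.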

    None of those allow you to draw any conclusion about the time it takes to enqueue a kernel.
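One possible way to get at the cost of the enqueue call itself, which the event timestamps do not cover, is to time the call on the host with a monotonic clock. This is a sketch under assumptions: it relies on POSIX `clock_gettime`, and `time_call_ns` and `example_work` are hypothetical names. In real use the callback would wrap the `clEnqueueNDRangeKernel` call (and, if submission should be included, a `clFlush`).

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Signature of the piece of host code to be timed. */
typedef void (*timed_fn)(void *arg);

/* Current value of the monotonic clock, in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run fn(arg) once and return how long the call took on the host. */
static uint64_t time_call_ns(timed_fn fn, void *arg)
{
    uint64_t before = now_ns();
    fn(arg);
    return now_ns() - before;
}

/* Stand-in for the real work: counts how often it was called.
 * A real callback would issue the enqueue call here instead. */
static void example_work(void *arg)
{
    int *calls = (int *)arg;
    *calls += 1;
}
```

Calling `time_call_ns(example_work, &calls)` runs the callback once and returns the elapsed host time in nanoseconds. Repeating the measurement over many launches and looking at the spread would likely give a steadier picture than a single pair of calls.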

    Regards,

    Anthony