Mali OpenCL 2.x SDK: when is it expected from ARM?

Paul over 8 years ago

Hi All,

Do we have any idea when ARM will release an SDK for later versions of OpenCL (2.x)?

Currently we see only the OpenCL 1.1 SDK available.

Regards

Paul

Paul over 8 years ago in reply to Anthony Barbier

Hi Anthony,
Thanks for your response.
While working on an OpenCL acceleration project for a video decoder on a Firefly RK3288 development board (Cortex-A17 / Mali-T764) running Android, we discovered that OpenCL's kernel-launch API, clEnqueueNDRangeKernel, has a huge overhead. Our test kernel performs an indexed integer addition of two buffers (512 bytes) and stores the result in a third buffer. The same kernel was launched twice, to see whether the second call had any advantage.
OpenCL ARM profiling:
                Queued Time (ns)   Submit Time (ns)   Execution Start Time (ns)   Execution End Time (ns)
First launch:   30,12,55,724       3,65,38,31,696     30,12,55,724                3,65,39,77,696
Second launch:  30,12,55,724       3,65,59,68,696     30,12,55,724                3,65,60,81,696
The board uses the ARM OpenCL 1.1 drivers. What seems surprising or erratic in the figures above is that 'Queued Time' and 'Execution Start Time' returned the same value.
There is also a big gap between 'Queued Time' and 'Submit Time' (submission to the device), which is a huge overhead.
By contrast, the same kernel launched by the same OpenCL host code on an Intel Xeon desktop gave the following results:
OpenCL Intel Xeon profiling:
                Queued Time (ns)   Submit Time (ns)   Execution Start Time (ns)   Execution End Time (ns)
First launch:   3,80,34,22,080     3,80,34,38,720     3,80,36,53,504              3,80,36,66,432
Second launch:  3,80,46,77,760     3,80,47,02,336     3,80,47,23,936              3,80,47,27,936
In this case, all the timings seem normal.
Our questions:
1. What are we missing? Are the OpenCL drivers on our board old, or are such values normal with Mali OpenCL in general?
2. OpenCL 2.x adds a new feature, device-side kernel enqueue (nested parallelism). Could this be a workaround to avoid the host kernel-launch overhead?
3. Any other suggestions?
Can you please help?
Regards
Paul

Anthony Barbier over 8 years ago in reply to Paul

Hi Paul,
I'm sorry, I don't understand what units the numbers in your tables are in. (Why are they grouped in pairs of digits?)
Also, I assume the times for each column come from clGetEventProfilingInfo and therefore correspond to CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START, CL_PROFILING_COMMAND_END.
If that's correct, then command_start and command_end are the timestamps for when the job was executed on the GPU, queued is when clEnqueue was called, and submit is when the queue was flushed by the application.
None of those allow you to draw any conclusion about the time it takes to enqueue a kernel.
Regards,
Anthony