Initial Look at OpenCL Accelerated SQLite Performance numbers on Mali

Here's a link to a blog post from today about my work on accelerating SQLite with OpenCL on the ARM based Samsung Chromebook with a Mali T604.

Details & Early Benchmarks of OpenCL accelerated SQLite on ARM Mali | Tom Gall

Comments, questions and suggestions most welcome.

  • Hi Tom,

    Thanks for the link... it is a fascinating use of GPU Compute and the results are quite encouraging.  I'm presuming your CPU version is running on a single core - is that correct?  The Chromebook has dual Cortex-A15 so you could presumably double the performance there.  And also, adding NEON acceleration on the CPU side would be an interesting comparison with the GPU.

    Regarding RenderScript, do let us know how that goes.  It will be interesting to see how it compares to your OpenCL version.

    And seeing how the same code performs on a Mali-T628 platform will also be interesting.  Bear in mind that once an implementation of T628 goes above 4 GPU cores, they are split into 2 core groups... and these appear as separate devices, so the same OpenCL application won't automatically spread the load across both.  I would suspect - though it would be interesting to check - that you would see similar performance with the Mali-T604 you are currently using.

    As Pete has said, there may be a number of ways to optimise what you have done further, and tuning memory access and vector operations is likely the key.

    Regards,

    Tim

  • Hi Tim,

    My system is the ARM-based, dual-core Cortex-A15 Samsung Chromebook.

    You're right that running across multiple cores, as well as with NEON acceleration, would also be a worthwhile comparison. It'll come down to how much time I have to devote to it.

    On RenderScript, yes, this is an interesting data point that I want to follow up on. I've just an original Nexus 7 right now, though, and I'm not sure that's a good choice. I do have an Arndale board also and will have to see if there's a version of KitKat with accelerated RenderScript drivers. KitKat includes the C APIs for RenderScript, and obviously that's critical.

    Thanks for the details on the T628; that's also good to be aware of. IIRC there's an MP6 and an MP8, which would be a 6-core and an 8-core Mali? Does that mean there would be 3 cores in 2 groups and 4 cores in 2 groups respectively?  Do the groups get reported as 2 platforms from an OpenCL perspective?

  • Hi Tom,

    The actual configuration of a T628 can vary.  MP6 and MP8 do indeed refer to the number of cores, but an MP6 doesn't necessarily mean 3+3... the T628-MP6 configurations out there at the moment are 4+2.  The CL driver will by default run on the 4-core group... and though it is the intention for both groups to appear as separate devices I'm not sure the current driver supports it - but I'll check that for you.  A T628-MP8 would indeed be configured 4+4.

    HTH, Tim
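
Since the two core groups would not share work automatically, an application that wants to use both has to divide the work itself. Below is a minimal sketch in plain C of one way to do that: split a row count across groups in proportion to their core counts (4+2 for the MP6 described above, 4+4 for an MP8). The helper name `split_rows` and the idea of weighting by core count are assumptions for illustration only. Nothing here comes from Tom's code or from the Mali driver, and a real application would still need to enqueue each share on its own OpenCL device.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any OpenCL or Mali API).
 * Divides total_rows across ngroups core groups in proportion to each
 * group's core count. Shares are rounded down; the last group takes
 * whatever is left so that the shares always sum to total_rows.
 * Note: total_rows * cores[i] may overflow for extremely large inputs.
 */
static void split_rows(size_t total_rows, const unsigned *cores,
                       size_t ngroups, size_t *rows_out)
{
    assert(ngroups > 0);

    size_t total_cores = 0;
    for (size_t i = 0; i < ngroups; i++)
        total_cores += cores[i];
    assert(total_cores > 0);

    size_t assigned = 0;
    for (size_t i = 0; i + 1 < ngroups; i++) {
        rows_out[i] = total_rows * cores[i] / total_cores;
        assigned += rows_out[i];
    }
    rows_out[ngroups - 1] = total_rows - assigned;
}
```

For example, calling `split_rows(600, cores, 2, rows)` with `cores = {4, 2}` gives the 4-core group 400 rows and the 2-core group 200. Weighting purely by core count is only a starting point; measured throughput per group would likely be a better weight once benchmarks are available.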