Initial Look at OpenCL Accelerated SQLite Performance numbers on Mali

Here's a link to a blog post from today about my work on accelerating SQLite with OpenCL on the ARM-based Samsung Chromebook with a Mali-T604.

Details & Early Benchmarks of OpenCL accelerated SQLite on ARM Mali | Tom Gall

Comments, questions and suggestions most welcome.

  • Hi Tom,

    Thanks for the link... it is a fascinating use of GPU Compute and the results are quite encouraging.  I'm presuming your CPU version is running on a single core - is that correct?  The Chromebook has dual Cortex-A15 cores, so you could presumably double the performance there.  Adding NEON acceleration on the CPU side would also make for an interesting comparison with the GPU.

    Regarding RenderScript, do let us know how that goes.  It will be interesting to see how it compares to your OpenCL version.

    Seeing how the same code performs on a Mali-T628 platform will also be interesting.  Bear in mind that once an implementation of T628 goes above 4 GPU cores, the cores are split into 2 core groups... and these appear as separate OpenCL devices, so the same OpenCL application won't automatically spread the load across both.  I would suspect - though it would be interesting to check - that you would see similar performance to the Mali-T604 you are currently using.

    As Pete has said, there may be a number of ways to optimise what you have done further, and tuning memory access and vector operations is likely the key.

    Regards,

    Tim
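
To illustrate the point about core groups showing up as separate devices: an application that wants to use both has to divide the work itself. The following is only a minimal sketch of that division, not code from the blog post. The names `work_slice` and `split_work` are hypothetical, and it assumes the device count has already been obtained (for example from `clGetDeviceIDs`) and that each slice would then be enqueued on that device's own command queue.

```c
#include <stddef.h>

/* One contiguous slice of the global work range, intended for one device. */
typedef struct {
    size_t offset; /* index of the first work-item in the slice */
    size_t count;  /* number of work-items in the slice */
} work_slice;

/*
 * Split `total` work-items into `num_devices` contiguous slices.
 * The first (total % num_devices) slices receive one extra item, so the
 * whole range is covered exactly once with no overlap.
 * Returns 0 on success, -1 if num_devices is zero or slices is NULL.
 *
 * Assumption: in a real OpenCL host program, each slice would be passed
 * as the global work offset and global work size of a separate
 * clEnqueueNDRangeKernel call, one per device queue.
 */
int split_work(size_t total, size_t num_devices, work_slice *slices)
{
    if (num_devices == 0 || slices == NULL)
        return -1;

    size_t base = total / num_devices;
    size_t extra = total % num_devices;
    size_t offset = 0;

    for (size_t i = 0; i < num_devices; ++i) {
        slices[i].offset = offset;
        slices[i].count = base + (i < extra ? 1 : 0);
        offset += slices[i].count;
    }
    return 0;
}
```

An even split like this is the simplest choice. If the core groups differ in size, weighting each slice by the number of cores in its group would likely balance the load better, but that is something to measure rather than assume.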
