Summary
Is OpenCL support for the Mali-T628 (for example as found in the Exynos 5420 SoC on the Arndale Octa board) available? If so, how to set it up?
More details
According to the vendor, OpenCL should be supported, but the Arndale Octa Wiki does not state how this can be achieved.
I am using the latest Linaro developer build and installed Mali drivers that contain OpenCL libraries for Mali T604. According to this guide, the driver actually contains references to the Mali T628. So I tried to create the udev rule as specified, which is supposed to solve a permission problem with /dev/mali0, but I found that there is no /dev/mali0 on my installation at all. So my conclusion is that the driver indeed does not support T628.
When I execute a clinfo utility, clGetDeviceInfo returns CL_OUT_OF_HOST_MEMORY for some device properties. Why can I query the GPU for some characteristics, but does this fail for some others? When running a normal application, the same error appears when trying to create an OpenCL Context.
I was surprised to find this topic, where yoshi seems to have OpenCL working and can run benchmarks on his Arndale Octa board. How is this possible if there is no driver available? Or am I just missing something? I hope that you can help me to also establish a working OpenCL development environment.
Hi bramv,
The Mali-T628 GPU supports OpenCL, absolutely, and we support this in the drivers for that GPU which we ship to our silicon customers when they licence the GPU design. The userspace drivers that you have downloaded from malideveloper.arm.com require a kernel driver integration in order to work.
If /dev/mali0 does not exist on your platform then the mali kernel driver that exposes this device has not been integrated with the kernel you are using. I believe Linaro kernels do not currently support Mali so this is expected to not be available on those kernels. You either need to integrate the kernel module with the Linaro kernel, lobby Linaro to support Mali in their kernel (would be a long term solution, don't know why they don't do this currently), or use the Insignal kernel which should support it. Yoshi is almost certainly not using a Linaro kernel, and is using one probably from Insignal which has the mali kernel module integrated.
Hope this helps,
Chris
There is indeed no /dev/mali0 on my platform. I tried compiling and installing this kernel driver to fix it, but this hasn't been successful so far. The same holds for the Insignal kernel, I didn't manage to replace the Linaro kernel with that one. The main problem seems to be that I cannot get uboot to actually boot the freshly compiled kernel.
I have tried the boot files and can confirm that OpenGL is indeed working. OpenCL however doesn't and results in errors:
[ 190.070000] Mali<ERROR, BASE_MMU>: kbase_mmu_report_fault_and_kill Unhandled Page fault in AS0 at VA 0x00000000B6ED2100
[ 190.070000] raw fault status 0x820003C3
[ 190.070000] decoded fault status: SLAVE FAULT
[ 190.070000] exception type 0xC3: TRANSLATION_FAULT
[ 190.070000] access type 0x3: WRITE
[ 190.070000] source id 0x8200
[ 190.090000] Mali<ERROR, BASE_JM>: kbase_job_done_slot t6xx: GPU fault 0x43 from job slot 1
Can I conclude that it is just a matter of waiting for the r4p0 userspace binaries to become available?
I don't think that's userspace related, it's more likely to be a problem with the kernel integration. Interesting that it only happens with CL. Can you reproduce that with one of the simple samples from our CL SDK? https://developer.arm.com/products/software/mali-sdks Thanks,
Hi Chris,
We also want to use T628 for opencl computations on Exynos5420, But our SOC has Android as OS. Can you please help us to find the opencl driver for android? OR we have to get the source and compile?
PS: If you feel my post will divert the original post's intention I can raise new post.
Thanks,
Veeranna
Hi Veeranna,
The driver is composed of 2 parts, kernel, and userspace binaries. The kernel space source code is open source and available from the link you posted, and that will need integrating into the kernel for your SoC, which is more than likely already done by your specific SoC/board/device vendor. What board/device are you using? If you're using a kernel other than the ones provided by the device/SoC/board vendor, then you will likely need to do this yourself.
Once that's done, you will need the matching userspace binary. We currently provide r3p0-02rel0 binaries for Linux (provided as part of the Chromebook guide), and an r4p0 release is coming very soon (currently stalled in legal review). We do not however currently offer any Android userspace binaries for any devices/boards, nor do we currently have any plans to, so these would need to come from your SoC/device/board vendor.
Thanks for the reply Chris,
We are using Exynos 5420 board from Insignal. Yes it has kernel level binary inside the device. Getting a driver at Android usespace will be very helpful, because we run our application on different vendor SOCs to compare the performance. The other SOCs we have has Android as the OS.
I will check in Insignal forums.
Thanks for your help.
I got the driver from Samsung/Insignal, and able to build the Andorid Image and Run. But we are observing random behavior in OpenCL APIs. Some time clcreateKernel fails and if re-run the app clcreateKernel returns success but clcreatebuffer fails with error code as zero and returned memory pointer as NULL.
Can you give some to hint where to look.
Can you provide us with a simple reproducer? Also please let me know the driver version you're using, with adb pull /vendor/lib/egl/libGLES_mali.so && strings libGLES_mali.so | grep r[0-9]p[0-9] assuming it's in vendor and not system.
With above said command, I got version as 1.4 Midgard-"r3p0-01bet0. I will try to give simple app to run.
Thank you for help.
Interestingly simple median filter example runs fine(openCL on T628), but our application fails. Our application has many kernels and huge memory bandwidth will it be the reason?
Any suggestion for debugging will be helpful.
Hi veerannah,
I doubt simply having a lot of kernels would in itself be a problem, nor would I expect memory bandwidth to be an issue. Can you confirm whether you are creating all of your kernels up-front before any clEnqueueNDRange commands take place, or do you somewhat interleave kernel compilation and execution? Can you also confirm whether all program objects compile without error prior to the calls to clCreateKernel that consume them?
We create all our kernels first, then we allocate required buffers and then call clEnqueueNDRange. Sometime it fails to create some kernels(some are got created). And sometimes it fails in buffer allocation. It didnt hit clEnqueueNDRange yet.
It's interesting that its not deterministic where it fails. Without a reproducer I'm afraid I'm just guessing at possible causes. Also the driver itself is quite old now, we're currently at r4p0, so it is worth asking them when they plan to provide an up to date version as this could fix it.
I tried the SGEMM sample from the SDK:
root@arndale-octa:~/Mali_OpenCL_SDK_v1.1.0/samples/sgemm# ./sgemm [PLUGIN INFO] Plugin initializing [PLUGIN DEBUG] './override.instr_config' not found, trying to open the process config file [PLUGIN DEBUG] './sgemm.instr_config' not found, trying to open the default config file [PLUGIN ERROR] Couldn't open default config file './default.instr_config'. [PLUGIN INFO] No configuration file found, attempting to use environment [PLUGIN INFO] CINSTR GENERAL: Output directory set to: . [PLUGIN INFO] No instrumentation features requested. ^C[ 384.380000] Mali<ERROR, BASE_MMU>: kbase_mmu_report_fault_and_kill Unhandled Page fault in AS0 at VA 0x00000000B6F0E100 [ 384.380000] raw fault status 0x820003C3 [ 384.380000] decoded fault status: SLAVE FAULT [ 384.380000] exception type 0xC3: TRANSLATION_FAULT [ 384.380000] access type 0x3: WRITE [ 384.380000] source id 0x8200 [ 384.400000] Mali<ERROR, BASE_JM>: kbase_job_done_slot t6xx: GPU fault 0x43 from job slot 1
root@arndale-octa:~/Mali_OpenCL_SDK_v1.1.0/samples/sgemm# ./sgemm
[PLUGIN INFO] Plugin initializing
[PLUGIN DEBUG] './override.instr_config' not found, trying to open the process config file
[PLUGIN DEBUG] './sgemm.instr_config' not found, trying to open the default config file
[PLUGIN ERROR] Couldn't open default config file './default.instr_config'.
[PLUGIN INFO] No configuration file found, attempting to use environment
[PLUGIN INFO] CINSTR GENERAL: Output directory set to: .
[PLUGIN INFO] No instrumentation features requested.
^C[ 384.380000] Mali<ERROR, BASE_MMU>: kbase_mmu_report_fault_and_kill Unhandled Page fault in AS0 at VA 0x00000000B6F0E100
[ 384.380000] raw fault status 0x820003C3
[ 384.380000] decoded fault status: SLAVE FAULT
[ 384.380000] exception type 0xC3: TRANSLATION_FAULT
[ 384.380000] access type 0x3: WRITE
[ 384.380000] source id 0x8200
[ 384.400000] Mali<ERROR, BASE_JM>: kbase_job_done_slot t6xx: GPU fault 0x43 from job slot 1
The program hangs after the [PLUGIN INFO] messages are printed. When I press ctrl+c, the kernel messages appear, indicating that the GPU run into trouble?
That looks like an integration problem to me, it's worth reporting to Linaro, as I believe that's the kernel you're using?
Finally we are able to run our application on T628. But performance numbers are not good. Do we get any improvement if we move r4p0 driver?
Any other suggestions to improve the GPU performance will be helpful.
There are a number of performance improvements present in the r4p0 driver not present in previous releases. Keep in mind that OpenCL is not performance portable, so an application optimized for another platform, or otherwise written with another architecture in mind, may not be performant when run on another platform/architecture. The below materials contain advice and detail some of the differences and considerations when moving from desktop to Mali, so let us know if they helps or if you have any further queries and we'll be happy to help.
There is the Developer Guide: Mali-T600 Series GPU OpenCL Developer Guide « Mali Developer Center
There is also the OpenCL faq: http://malideveloper.arm.com/downloads/OpenCL_FAQ.pdf
And the Laplace case study by timhar01 Technical presentation about ARM Mali-T600 GPU and ARM Mali-T700 GPU Compute - YouTube (although I recommend you watch the whole video)
Where can we find benchmark numbers for examples kernels provided with SDK?. We ran "sobel" filter example for 1024x1024 image, profile number measured across kernel trigger and wait measures as 5msec. And Laplace transform for 4096x2048 image, number measures as 10msec. Is these numbers expected for for Mali T628 @ 533Mhz GPU? We measured the numbered across clEnqueueNDRange() and immediate clFinish().
Hi Veerannah,
We don't have stock performance numbers for the SDK samples to hand. However what I will do later this week is run Sobel and Laplace on a platform here and get some numbers back to you.
HTH, Tim
Hi Tim,
We really appreciate your quick reply. Looking forward to see the performance numbers.
I ran the Sobel filter as provided by the Mali OpenCL SDK, modifying it to run a 1024x1024 image and I see the following...
Queued time: 0.11ms
Wait time: 0.222292ms
Run time: 0.680683ms
The Mali OpenCL SDK doesn't include a Laplace transform... which one are you using?
Best regards, Tim
I realise I didn't include the platform details. I'm running on a Mali-T604 platform with the GPU clocked @ 533MHz. I'm using a later version of the drivers than you, I believe, and that would likely mean the performance you would expect on your platform is slightly worse. On balance though I would expect the performance on your Mali-T628 system to be roughly equivalent to mine.
Thank you for providing information on performance number. With "cl_event" profiling we are getting numbers as
Wait time: 0.2725ms
Run time: 0.7586ms
But if we measure numbers across clEnqueueNDRangeKernel() and clfinish() using gettimeofday we are seeing number is 1.5msec. Is this API/driver overhead?
Is this API/driver overhead?
Probably, yes. When I try this I'm seeing the same sort of additional time using gettimeofday vs the profiling values output. With the Sobel sample I looked at the difference in this additional time when I increased the size of the image being processed. It appeared to stay roughly the same, which would back up the theory that this is due to additional overhead in the API.
Hope that helps,
Tim
Thank you for the information.
View all questions in Graphics and Gaming forum