Summary
Is OpenCL support for the Mali-T628 (for example, as found in the Exynos 5420 SoC on the Arndale Octa board) available? If so, how do I set it up?
More details
According to the vendor, OpenCL should be supported, but the Arndale Octa Wiki does not state how this can be achieved.
I am using the latest Linaro developer build and have installed the Mali drivers that contain OpenCL libraries for the Mali-T604. According to this guide, the driver actually contains references to the Mali-T628, so I tried to create the udev rule as specified, which is supposed to solve a permission problem with /dev/mali0. However, there is no /dev/mali0 on my installation at all, so I concluded that the driver indeed does not support the T628.
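There are two different situations here. In the first, the kernel driver never created /dev/mali0. In the second, the node exists but the current user cannot open it, which is the case a udev rule addresses. A small check can tell them apart. This is a minimal sketch that assumes a POSIX system and uses access(). The names check_device_node and NodeStatus are hypothetical and not part of the Mali driver package.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Possible outcomes when probing a GPU device node such as /dev/mali0.
enum class NodeStatus { Ok, Missing, NoPermission, OtherError };

// Hypothetical helper: separates "the kernel driver never created the node"
// (Missing) from "the node exists but this user may not open it"
// (NoPermission), which is the problem a udev rule is meant to solve.
NodeStatus check_device_node(const std::string& path) {
    assert(!path.empty());
    if (access(path.c_str(), R_OK | W_OK) == 0) {
        return NodeStatus::Ok;
    }
    switch (errno) {
        case ENOENT:   // node (or a parent directory) does not exist
        case ENOTDIR:
            return NodeStatus::Missing;
        case EACCES:   // exists, but read/write access is denied
        case EPERM:
            return NodeStatus::NoPermission;
        default:
            return NodeStatus::OtherError;
    }
}
```

For example, if check_device_node("/dev/mali0") returns NodeStatus::Missing, the problem is on the kernel side and not a permissions issue.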
When I run the clinfo utility, clGetDeviceInfo returns CL_OUT_OF_HOST_MEMORY for some device properties. Why can I query some GPU characteristics but not others? When running a normal application, the same error appears when I try to create an OpenCL context.
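Logs are easier to compare when an OpenCL error is printed by its symbolic name instead of the bare number. For example, CL_OUT_OF_HOST_MEMORY is -6. The sketch below hard-codes a small subset of the error values defined in the Khronos CL/cl.h header, so it compiles without OpenCL headers. cl_error_name is a hypothetical helper. Real code should include <CL/cl.h> and use its constants instead.

```cpp
#include <cassert>
#include <string>

// Subset of OpenCL 1.x error codes, with the values used in the Khronos
// CL/cl.h header, so this sketch builds without OpenCL installed.
std::string cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -33: return "CL_INVALID_DEVICE";
        case -34: return "CL_INVALID_CONTEXT";
        default:  return "unknown OpenCL error " + std::to_string(code);
    }
}
```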
I was surprised to find this topic, where yoshi seems to have OpenCL working and can run benchmarks on his Arndale Octa board. How is this possible if there is no driver available? Or am I just missing something? I hope you can help me set up a working OpenCL development environment as well.
Yes, it seems so. I tried again using the command you specified.
Building the kernel image works just fine, but errors occur when building the modules:
make -j8 modules
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
make[1]: `include/generated/mach-types.h' is up to date.
CALL scripts/checksyscalls.sh
Building modules, stage 2.
MODPOST 9 modules
ERROR: "tcp_nuke_addr" [net/ipv6/ipv6.ko] undefined!
I disabled IPv6 altogether using "make xconfig" and compiled everything again. This time the build succeeds, but on reboot I sometimes get a kernel panic:
[ 6.195000] [<c005211f>] (cpu_startup_entry) from [<20008525>] (0x20008525)
[ 6.195000] CPU2: stopping
[ 6.195000] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.14.7-0-linaro-arndale-octa #26
[ 6.195000] [<c00127a5>] (unwind_backtrace) from [<c000fba9>] (show_stack+0x11/0x14)
[ 6.195000] [<c000fba9>] (show_stack) from [<c0435655>] (dump_stack+0x61/0x6c)
[ 6.195000] [<c0435655>] (dump_stack) from [<c0011903>] (handle_IPI+0x16b/0x184)
[ 6.195000] [<c0011903>] (handle_IPI) from [<c000848f>] (gic_handle_irq+0x57/0x58)
[ 6.195000] [<c000848f>] (gic_handle_irq) from [<c001045b>] (__irq_svc+0x3b/0x5c)
[ 6.195000] Exception stack(0xe611bf50 to 0xe611bf98)
[ 6.195000] bf40: e611bf98 00000006 71431ea3 00000001
[ 6.195000] bf60: 61755fdc 00000001 e675c368 c06f670c 00000000 00000002 c06f66c0 c06f04e0
[ 6.195000] bf80: 00000000 e611bf98 29aaaaab c035e052 60000133 ffffffff
[ 6.195000] [<c001045b>] (__irq_svc) from [<c035e052>] (cpuidle_enter_state+0x3a/0xa8)
[ 6.195000] [<c035e052>] (cpuidle_enter_state) from [<c035e145>] (cpuidle_idle_call+0x85/0x13c)
[ 6.195000] [<c035e145>] (cpuidle_idle_call) from [<c000d7b1>] (arch_cpu_idle+0xd/0x2c)
[ 6.195000] [<c000d7b1>] (arch_cpu_idle) from [<c005211f>] (cpu_startup_entry+0x10f/0x148)
[ 6.195000] [<c005211f>] (cpu_startup_entry) from [<20008525>] (0x20008525)
[ 6.195000] drm_kms_helper: panic occurred, switching back to text console
And sometimes it fails on mounting the root file system:
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
[ 8.360000] usb 5-1.4: new high-speed USB device number 3 using exynos-ehci
[ 8.460000] usb 5-1.4: New USB device found, idVendor=0b95, idProduct=772a
[ 8.470000] usb 5-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 8.480000] usb 5-1.4: Product: AX88772
[ 8.485000] usb 5-1.4: Manufacturer: ASIX Elec. Corp.
[ 8.490000] usb 5-1.4: SerialNumber: 000001
[ 8.810000] asix 5-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX AX88772 USB 2.0 Ethernet, 02:0d:ce:dd:fc:24
chvt: can't open console
Gave up waiting for root device. Common problems:
- Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
chvt: can't open console
ALERT! /dev/disk/by-uuid/126fc49e-6766-4491-93c4-4580bb0fab91 does not exist. Dropping to a shell!
Couldn't get a file descriptor referring to the console
I found that the second error might be caused by CONFIG_DEVTMPFS not being set. I tried again with CONFIG_DEVTMPFS=y, but the errors persist.
Hi bramv,
We've been able to replicate the issue you've described and are working on it here. We'll get back to you as soon as we have something to share.
Thanks,
Rich
Hi bramv,
After some more investigation, we've found that you need to use even more configuration fragments:
./scripts/kconfig/merge_config.sh linaro/configs/linaro-base.conf linaro/configs/distribution.conf linaro/configs/arndale_octa.conf linaro/configs/lt-arndale_octa.conf linaro/configs/mali-arndale-octa.conf
Then if you use this build command:
make zreladdr-y=0x20008000 uImage modules dtbs -j8
it should create an arch/arm/boot/uImage file, which you can use to replace the one on the boot partition of the 14.08 Linaro binary image.
If you see build errors related to Gator, a quick fix is to disable CONFIG_GATOR in .config (make menuconfig...). I hope this helps. Please note that this kernel contains the r4p0-02rel0 kernel-side Mali driver, so only the user-side driver with exactly the same version should be used. It is not compatible with r4p1-00rel0 unless the kernel-side driver is upgraded.
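The rule here is that the kernel-side and user-side driver releases must match exactly. The sketch below checks this. It assumes release tags always have the rNpM-XXrelY shape used in this thread (r4p0-02rel0, r4p1-00rel0). MaliRelease, parse_release and releases_compatible are hypothetical names. How you obtain the two tags on your system is left open.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Fields of a Mali driver release tag such as "r4p0-02rel0" (illustrative names).
struct MaliRelease {
    int major = 0;  // N in rN
    int minor = 0;  // M in pM
    int patch = 0;  // two-digit number before "rel"
    int rel = 0;    // number after "rel"
};

// Parses a tag of the assumed form r<N>p<M>-<PP>rel<R>; nullopt if it does not match.
std::optional<MaliRelease> parse_release(const std::string& tag) {
    static const std::regex re(R"(r(\d+)p(\d+)-(\d+)rel(\d+))");
    std::smatch m;
    if (!std::regex_match(tag, m, re)) return std::nullopt;
    return MaliRelease{std::stoi(m[1].str()), std::stoi(m[2].str()),
                       std::stoi(m[3].str()), std::stoi(m[4].str())};
}

// Kernel-side and user-side drivers must be the exact same release.
// Malformed tags are treated as incompatible.
bool releases_compatible(const std::string& kernel_tag, const std::string& user_tag) {
    const auto k = parse_release(kernel_tag);
    const auto u = parse_release(user_tag);
    if (!k || !u) return false;
    return k->major == u->major && k->minor == u->minor &&
           k->patch == u->patch && k->rel == u->rel;
}
```

The tag is parsed before comparing, so a malformed or truncated tag is rejected rather than silently compared as a string.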
Best wishes,
Guillaume
I tried again starting from a fresh Linaro 14.08 binary image, building the Linaro kernel at the commit you specified with the configuration generated by the command you provided. With this kernel I do get a /dev/mali0 device. However, according to my OpenCL benchmark, performance is still very low (comparable to before).
You mention that the kernel includes the r4p1-00rel0 driver, while only the r4p0-02rel0 and r4p1-00rel0 user-space drivers are available. As you already mentioned, this isn't likely to work. Furthermore, I noticed the following:
[ 5.965000] [drm] Initialized drm 1.1.0 20060810
[ 5.970000] i2c i2c-2: attached exynos4210-hdmiddc into i2c adapter successfully
[ 5.980000] exynos-mixer 14450000.mixer: probe start
[ 5.985000] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 5.990000] [drm] No driver support for vblank timestamp query.
[ 5.995000] exynos-sysmmu 14650000.sysmmu: Enabled
[ 5.995000] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x45b40000
[ 6.100000] Console: switching to colour frame buffer device 160x64
[ 6.120000] exynos-drm exynos-drm: fb0: frame buffer device
[ 6.125000] exynos-drm exynos-drm: registered panic notifier
[ 6.130000] [drm] Initialized exynos 1.0.0 20110530 on minor 0
[ 6.135000] v4 support
[ 6.140000] mali 11800000.mali: GPU identified as 0x0620 r0p1 status 0
[ 6.145000] mali 11800000.mali: Probed as mali0
Why does the kernel identify the GPU as r0p1? Since mali_kbase is built into the kernel, I don't know of a way to find out whether it is actually r4p1 that is being used.
I tried again by downloading the latest (r4p1-00rel0) kernel driver and recompiling the kernel. Of course, I now also used the corresponding r4p1-00rel0 userspace driver. The result, however, is exactly the same: the strange kernel message still appears and OpenCL still performs poorly.
bramv wrote:
Why does the kernel identify the GPU as r0p1?
This is the GPU hardware revision, and is unrelated to the revision of the kernel or userspace driver.
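The rNpM in that probe message describes the GPU silicon. Parsing it yields the hardware revision and says nothing about whether r4p0 or r4p1 driver code is running. The sketch below extracts these fields. It assumes the message format from the log in this thread, and other driver releases may word it differently. parse_gpu_id and GpuHwId are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Product ID and hardware revision reported by the kernel driver at probe time.
struct GpuHwId {
    unsigned product_id = 0;  // e.g. 0x0620 in the log line from this thread
    int rev_major = 0;        // N in rNpM: silicon revision, not driver release
    int rev_minor = 0;        // M in rNpM
};

// Parses a line such as
//   "mali 11800000.mali: GPU identified as 0x0620 r0p1 status 0"
std::optional<GpuHwId> parse_gpu_id(const std::string& line) {
    static const std::regex re(R"(GPU identified as 0x([0-9a-fA-F]+) r(\d+)p(\d+))");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return std::nullopt;
    GpuHwId id;
    id.product_id = static_cast<unsigned>(std::stoul(m[1].str(), nullptr, 16));
    id.rev_major = std::stoi(m[2].str());
    id.rev_minor = std::stoi(m[3].str());
    return id;
}
```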
Hth,
Chris
The kernel comes with r4p0 integrated, not r4p1, as identified by the release tag in the Linaro repository. So the r4p0 userspace binaries from malideveloper are the ones to use with the kernel as it comes.
If you want to use r4p1, that is possible, but it may take some work to iron out the integration; it may not be as simple as unpacking the kernel source into the right place in the kernel tree.
As for your low OpenCL Benchmark results, which benchmark are you using?
Hope this Helps,
You are right, I accidentally mixed up the two version numbers. So I should use the unmodified kernel in combination with the r4p0 userspace binary. This binary comes in two forms, X11 and fbdev. I tried both but don't see much of a difference. Which one should I use?
I am still using the same clpeak benchmark:
Platform: ARM Platform
Device: Mali-T628
Driver version : 1.1
Compute units : 4
Clock frequency : 533 MHz
Single-precision compute (GFLOPS)
float : 1.56654
float2 : 3.92411
float4 : 3.92181
float8 : 4.84877
float16 : 4.82142
Device: Mali-T628
Driver version : 1.1
Compute units : 2
Clock frequency : 533 MHz
Single-precision compute (GFLOPS)
float : 0.747759
float2 : 1.96387
float4 : 1.98183
float8 : 2.43132
float16 : 2.41775
It is nice to see that the two clusters are now properly recognized, but the results are nowhere near the roughly 33 GFLOPS that Chris reported:
I am seeing 33.27 and 33.17 SP GFLOPS for float2 and float4 respectively, or 45% of max theoretical peak
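The following puts the gap in numbers. It compares float2 with float2 and assumes that Chris's figure was measured on the same four-core device, which the quoted post does not say. It gives the implied theoretical peak, the ratio between the two float2 results, and the measured fraction of that peak:

```latex
P_{\text{peak}} \approx \frac{33.27\ \text{GFLOPS}}{0.45} \approx 73.9\ \text{GFLOPS},
\qquad
\frac{33.27}{3.92} \approx 8.5,
\qquad
\frac{3.92}{73.9} \approx 5.3\,\%
```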
It's probably worth someone at our end testing this out as well, but as a quick sanity check can you ensure that the CPU and GPU DVFS are disabled/otherwise pinned (set CPU DVFS to performance and frequency to something high like 1.7GHz) before running the benchmark? These SoCs have a tendency to take the busfreq down with the CPUfreq when the CPU is idle, and as these GPU benchmarks tend not to stress the CPU too much, this has the effect that the bus clocks down to v/fmin and severely throttles the GPU's memory bandwidth.
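On Linux, CPU frequency scaling is normally controlled through the cpufreq sysfs interface. Each CPU's governor is the scaling_governor attribute under /sys/devices/system/cpu/cpuN/cpufreq/. The sketch below sets every CPU to a given governor. It assumes that the performance governor is built into the kernel and that the program runs as root. set_cpufreq_governor is a hypothetical helper. It does not touch the GPU or bus clocks, which are handled by the platform integration.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <regex>
#include <string>

namespace fs = std::filesystem;

// Writes `governor` (e.g. "performance") to cpuN/cpufreq/scaling_governor for
// every cpuN directory under `cpu_root`. Returns how many CPUs accepted it.
int set_cpufreq_governor(const fs::path& cpu_root, const std::string& governor) {
    assert(!governor.empty());
    static const std::regex cpu_dir(R"(cpu[0-9]+)");
    int updated = 0;
    std::error_code ec;
    for (const auto& entry : fs::directory_iterator(cpu_root, ec)) {
        const std::string name = entry.path().filename().string();
        if (!std::regex_match(name, cpu_dir)) continue;  // skip cpufreq/, cpuidle/, ...
        const fs::path gov = entry.path() / "cpufreq" / "scaling_governor";
        if (!fs::exists(gov, ec)) continue;              // offline CPU or no cpufreq
        std::ofstream out(gov);
        out << governor;
        out.flush();
        if (out) ++updated;  // stream fails if open or write was rejected
    }
    return updated;
}
```

For example, call set_cpufreq_governor("/sys/devices/system/cpu", "performance") before starting the benchmark. The sysfs root is a parameter only so that the function can be tried against a scratch directory.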
Hi Chris,
I guess I would need a utility like cpufreq to set DVFS or change the clock frequency of the SoC, but I don't know exactly how to do this. Instead, I looked in the kernel configuration, disabled DVFS altogether and enabled only the performance governor. This, however, does not seem to have any impact on performance.
I agree that it would be useful if someone at your end could run the same benchmark. You can find my code here.
The X11 and fbdev choice gives you the choice to render GLES 3d content either directly into the framebuffer (fbdev), or within the X windowing system (X11). If you are doing nothing graphics related and have no need to install X11, then I would suggest you just use the fbdev version of the driver userspace; in terms of OpenCL they will be exactly the same.
Following on from what Chris said, it could well be that DVFS is clocking down bandwidth and therefore causing the reduced performance you are seeing. I can take a look at this at my end and try and verify if this is the case.
To test at your end quickly, you should be able to disable dvfs as follows:
echo off > /sys/class/misc/mali0/device/dvfs
Hi Rich,
Good to know the difference between X11 and fbdev. I have one follow-up question on this. So far I have linked my OpenCL programs against libmali.so from the userspace binary package instead of libOpenCL.so, because linking against libOpenCL.so gives me lots of undefined reference errors, whereas with libmali.so compilation completes without any errors. I also noticed that libmali.so is the only substantial file in the package:
# ll ../fbdev/
total 21056
drwxr-xr-x 2 root root 4096 Jan 1 2000 ./
drwx------ 11 root root 4096 Jan 1 2000 ../
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libEGL.so*
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libGLESv1_CM.so*
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libGLESv2.so*
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libOpenCL.so*
-rwxr-x--- 1 16580 16580 21518354 Jul 23 14:44 libmali.so*
Is linking against libmali.so the correct way to go?
Regarding DVFS, the /device/dvfs file is not present on my system:
# tree /sys/class/misc/mali0
/sys/class/misc/mali0
├── dev
├── device -> ../../../11800000.mali
├── power
│ ├── autosuspend_delay_ms
│ ├── control
│ ├── runtime_active_time
│ ├── runtime_status
│ └── runtime_suspended_time
├── subsystem -> ../../../../class/misc
└── uevent
3 directories, 7 files
And your command results in a permission denied error, even though I am using a root shell:
# echo off > /sys/class/misc/mali0/device/dvfs
-bash: /sys/class/misc/mali0/device/dvfs: Permission denied
I am looking forward to seeing your clpeak results!
"libOpenCL.so" is the spec defined library name that OpenCL should be exposed as on the platform. In our case we implement this (and the GLES libs) as shims which pass through to the libmali.so binary, which is why that's the largest one there. For development purposes you can link against either, but obviously you wouldn't want to do this for a release, you'd want to link against libOpenCL.so for portability.
That said, it SHOULD work, so if you are having issues linking against libOpenCL.so feel free to share them here and we will take a look. I can't think of a good reason why one should work and the other fail.
All the errors are undefined reference errors at link time. The symbols should be there, so offhand I'm not sure why that's happening, but in any case linking against libmali.so is working.
In your ldd a.out output, /usr/lib/libmali.so is listed on the first line, so that's working as expected.
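Some entries, such as /usr/lib/libmali.so, are printed by ldd as a bare path without the "name => path" form. These are easy to overlook when reading the listing by eye. The hypothetical helper below checks ldd output for a given library name. It assumes the two line shapes seen in the ldd a.out listing posted in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if `soname` (e.g. "libmali.so") appears as a dependency in the
// text printed by `ldd <binary>`. Handles both line shapes from this thread:
//   "/usr/lib/libmali.so (0xb5e03000)"                    (bare path)
//   "libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (...)" (name => path)
bool ldd_lists(const std::string& ldd_output, const std::string& soname) {
    assert(!soname.empty());
    std::istringstream lines(ldd_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string first;
        if (!(fields >> first)) continue;  // blank line
        const auto slash = first.rfind('/');
        const std::string base =
            (slash == std::string::npos) ? first : first.substr(slash + 1);
        if (base == soname) return true;
    }
    return false;
}
```

The sketch does not capture the ldd output itself; popen is one way to do that. The comparison is exact, so libOpenCL.so and libOpenCL.so.1 count as different names.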
I put all files from the binary userspace driver in the /usr/lib/ directory and compiled using just "g++ program.cpp -lOpenCL". This gives the same result as compiling with "g++ program.cpp /usr/lib/libOpenCL.so". The output is too long to show here.
Compiling with "g++ clpeak-arndale-octa.cpp /usr/lib/libmali.so" works just fine. But looking further into the resulting binary, I noticed the following:
# ldd a.out
/usr/lib/libmali.so (0xb5e03000)
libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb5d4a000)
libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb5cde000)
libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb5cbd000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb5bd6000)
/lib/ld-linux-armhf.so.3 (0xb6f6d000)
librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0xb5bc8000)
libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb5bac000)
libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb5ba1000)
The resulting binary does not contain any reference to an OpenCL or Mali library. Does this mean that the program runs on the CPU instead of the GPU? But then, why does it report the two Mali-T628 devices? I am confused.
Whoops, I missed that line. In that case I can conclude the following:
I am running a kernel containing the proper Mali kernel driver, my program is linked against the corresponding userspace binary, and the T628-MP6 is correctly recognized.
Is it indeed DVFS that is preventing the benchmark from achieving the expected performance level? Why is /sys/class/misc/mali0/device/dvfs missing?
Hi Bramv,
Taken shamelessly from an answer by peterharris in another thread:
"The DVFS code for the GPU is not directly managed by our drivers - it is part of the platform integration provided in the BSP from Insignal. This style of integration occurs because the DVFS analogue parts which control F and V for the power domains are not part of the ARM IP. This question is probably best asked to Samsung or Insignal, as they maintain the BSP for that platform."
It is possible to disable features such as DVFS by recompiling the Linux kernel and the Mali kernel module with the correct configuration. This reduced performance (normally) happens because DVFS ties the GPU frequencies to the workload of the CPU. While your intensive GPU test runs, the CPU is left idle, so DVFS drops the CPU core speed and, unfortunately, also the GPU frequency.
One way to stop this from happening is to run some CPU-intensive code while the GPU code is running, so that DVFS does not drop the frequencies.
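The sketch below shows that workaround and assumes C++11 threads. It keeps a chosen number of CPU threads spinning while the GPU work runs. CpuBusyGuard and run_with_cpu_load are hypothetical names. Whether this actually keeps the GPU and bus clocks up depends on how the BSP couples them. The workaround also costs power and CPU time.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// RAII helper: keeps `n_threads` CPU threads busy for its lifetime so that a
// governor driven by CPU load does not lower the clocks during GPU work.
class CpuBusyGuard {
public:
    explicit CpuBusyGuard(unsigned n_threads) {
        assert(n_threads > 0);
        try {
            for (unsigned i = 0; i < n_threads; ++i) {
                workers_.emplace_back([this] {
                    while (!stop_.load(std::memory_order_relaxed)) {
                        spins_.fetch_add(1, std::memory_order_relaxed);
                    }
                });
            }
        } catch (...) {
            stop_workers();  // join threads already started before rethrowing
            throw;
        }
    }
    ~CpuBusyGuard() { stop_workers(); }
    CpuBusyGuard(const CpuBusyGuard&) = delete;
    CpuBusyGuard& operator=(const CpuBusyGuard&) = delete;

    // Busy-loop iterations completed so far (diagnostics only).
    std::uint64_t spins() const { return spins_.load(std::memory_order_relaxed); }

private:
    void stop_workers() {
        stop_.store(true, std::memory_order_relaxed);
        for (auto& t : workers_) {
            if (t.joinable()) t.join();
        }
    }
    std::atomic<bool> stop_{false};
    std::atomic<std::uint64_t> spins_{0};
    std::vector<std::thread> workers_;
};

// Runs `gpu_work` (e.g. enqueue kernels, then wait for them) while the CPU is kept busy.
template <typename F>
decltype(auto) run_with_cpu_load(unsigned n_threads, F&& gpu_work) {
    CpuBusyGuard guard(n_threads);
    return std::forward<F>(gpu_work)();
}
```

Setting the CPU governor to performance avoids burning a core in a spin loop. This guard is only a fallback for when the governor cannot be changed.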