Summary
Is OpenCL support for the Mali-T628 (for example as found in the Exynos 5420 SoC on the Arndale Octa board) available? If so, how do I set it up?
More details
According to the vendor, OpenCL should be supported, but the Arndale Octa Wiki does not state how this can be achieved.
I am using the latest Linaro developer build and installed Mali drivers that contain OpenCL libraries for Mali T604. According to this guide, the driver actually contains references to the Mali T628. So I tried to create the udev rule as specified, which is supposed to solve a permission problem with /dev/mali0, but I found that there is no /dev/mali0 on my installation at all. So my conclusion is that the driver indeed does not support T628.
When I run the clinfo utility, clGetDeviceInfo returns CL_OUT_OF_HOST_MEMORY for some device properties. Why can I query some characteristics of the GPU while the query fails for others? When running a normal application, the same error appears when trying to create an OpenCL context.
I was surprised to find this topic, where yoshi seems to have OpenCL working and can run benchmarks on his Arndale Octa board. How is this possible if there is no driver available? Or am I just missing something? I hope that you can help me to also establish a working OpenCL development environment.
Hi bramv,
The Mali-T628 GPU supports OpenCL, absolutely, and we support this in the drivers for that GPU which we ship to our silicon customers when they licence the GPU design. The userspace drivers that you have downloaded from malideveloper.arm.com require a kernel driver integration in order to work.
If /dev/mali0 does not exist on your platform then the mali kernel driver that exposes this device has not been integrated with the kernel you are using. I believe Linaro kernels do not currently support Mali so this is expected to not be available on those kernels. You either need to integrate the kernel module with the Linaro kernel, lobby Linaro to support Mali in their kernel (would be a long term solution, don't know why they don't do this currently), or use the Insignal kernel which should support it. Yoshi is almost certainly not using a Linaro kernel, and is using one probably from Insignal which has the mali kernel module integrated.
Hope this helps,
Chris
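A quick way to check the situation Chris describes is to look at the device node before touching any userspace libraries. The following is only a minimal sketch and not part of the Mali packages: the function name `mali_node_status` is made up for illustration, and the default path `/dev/mali0` is simply the one mentioned in this thread.

```shell
# Report whether a Mali device node exists and is accessible.
# The helper name is hypothetical; the default path comes from this thread.
mali_node_status() {
    node="${1:-/dev/mali0}"
    if [ ! -e "$node" ]; then
        echo "missing"          # kernel driver not integrated or not loaded
        return 1
    fi
    if [ -r "$node" ] && [ -w "$node" ]; then
        echo "ok"               # node exists and this user can open it
        return 0
    fi
    echo "no-permission"        # node exists; a udev rule may be needed
    return 2
}
```

Running `mali_node_status` with no argument checks `/dev/mali0`. A result of `missing` points at the kernel side, while `no-permission` points at the udev rule discussed above.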
There is indeed no /dev/mali0 on my platform. I tried compiling and installing this kernel driver to fix it, but this hasn't been successful so far. The same holds for the Insignal kernel: I didn't manage to replace the Linaro kernel with it. The main problem seems to be that I cannot get U-Boot to actually boot the freshly compiled kernel.
Hi Veeranna,
Is this API/driver overhead?
Probably, yes. When I try this I'm seeing the same sort of additional time using gettimeofday vs the profiling values output. With the Sobel sample I looked at the difference in this additional time when I increased the size of the image being processed. It appeared to stay roughly the same, which would back up the theory that this is due to additional overhead in the API.
Hope that helps,
Tim
Thank you for the information.
I cannot stress enough that if you are interested in real world performance, you should move away from these synthetic benchmarks and look at actual data for proper use cases. There are some real-world oriented benchmarks out there already, which already have Mali powered devices in their results.
I totally agree that synthetic benchmarks are not a proper measure of real-world performance, but these benchmarks do help to determine whether the environment is set up properly, which is clearly not (yet) the case.
I have spoken to someone from the driver team. If you're only seeing one device then it must be an old driver; it's worth asking Insignal what their roadmap is for providing updates. On current drivers you will see 2 devices, one with 4 cores and one with 2.
By now the r4p0-02rel0 drivers are available. I tried running the benchmarks using them instead of the r3p0-02rel0 drivers that I used before; however, the runtime fails to create an OpenCL context.
Someone opened a topic on the Insignal forum about the latest drivers. It seems that I have to compile my own kernel using the latest kernel driver to get it working.
Hi Bram,
Yes, that's correct: the old kernel driver will not work with the new userspace driver, so the kernel needs recompiling with the new kernel driver.
Hth,
Hi Bramv,
Are you able to use all 6 cores of the T628? If you have succeeded, can you give the complete steps to build the required binaries?
Thanks,
Veeranna
No, unfortunately not. I tried to run this image to get it up and running but the board wouldn't boot.
I am now running a Linaro image using kernel 3.15.0-1-linaro-arndale-octa. But so far I haven't been able to build a Mali kernel driver module myself either (update: see end of post):
root@arndale-octa:~/TX011-SW-99002-r4p1-00rel0/driver/product/kernel/drivers/gpu/arm/midgard# make -C /lib/modules/`uname -r`/build M=$PWD modules
make: Entering directory `/usr/src/linux-headers-3.15.0-1-linaro-arndale-octa'
Building modules, stage 2.
MODPOST 0 modules
make: Leaving directory `/usr/src/linux-headers-3.15.0-1-linaro-arndale-octa'
No .ko file is being generated.
The driver includes some scons files, but they do not work for me. First of all, a Sconstruct file is not included. When adding one myself and running scons, the environment is not recognized:
root@arndale-octa:~/TX011-SW-99002-r4p1-00rel0/driver/product/kernel/drivers/gpu/arm# scons
scons: Reading SConscript files ...
scons: *** Import of non-existent variable ''env''
File "/root/TX011-SW-99002-r4p1-00rel0/driver/product/kernel/drivers/gpu/arm/midgard/sconscript", line 20, in <module>
So I tried to fix this by replacing Import('env') with env = Environment(ENV = os.environ), but the dictionary class that scons uses does not handle non-existent keys the way the sconscript expects:
KeyError: 'v':
File "/root/TX011-SW-99002-r4p1-00rel0/driver/product/kernel/drivers/gpu/arm/Sconstruct", line 1:
SConscript('midgard/sconscript')
File "/usr/lib/scons/SCons/Script/SConscript.py", line 609:
return method(*args, **kw)
File "/usr/lib/scons/SCons/Script/SConscript.py", line 546:
return _SConscript(self.fs, *files, **subst_kw)
File "/usr/lib/scons/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
File "/root/TX011-SW-99002-r4p1-00rel0/driver/product/kernel/drivers/gpu/arm/midgard/sconscript", line 28:
if env['v'] != '1':
File "/usr/lib/scons/SCons/Environment.py", line 412:
return self._dict[key]
Any help on compiling this kernel module is highly appreciated!
Update:
I managed to compile the kernel module. It turns out that you don't need the scons files at all and can just use the included Kbuild and Makefile. Compiling mali_kbase.ko is a matter of running:
CONFIG_MALI_MIDGARD=m make
But inserting this module into the kernel does not provide a /dev/mali device, nor does lshw detect a GPU. This means that we still cannot run OpenCL on the T628. How can we make the /dev/mali device show up?
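The earlier "MODPOST 0 modules" output means the build ran but produced nothing, which is easy to miss. As a small sketch (the helper name `list_built_modules` is hypothetical), the build directory can be checked for `.ko` files after each attempt:

```shell
# Print every kernel module (*.ko) found under a build directory and
# fail if there is none, which is what "MODPOST 0 modules" amounts to.
# The helper name is hypothetical.
list_built_modules() {
    dir="${1:-.}"
    found=$(find "$dir" -type f -name '*.ko')
    if [ -z "$found" ]; then
        echo "no modules built under $dir" >&2
        return 1
    fi
    printf '%s\n' "$found"
}
```

For example, `CONFIG_MALI_MIDGARD=m make && list_built_modules .` run from the midgard directory should list mali_kbase.ko if the build described above worked. It says nothing about whether the module is configured correctly for the platform.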
/dev/mali should show as soon as you insmod a correctly built Mali kernel module. If /dev/mali is not appearing then the kernel module is not correctly built and configured for the platform. Even if you successfully build and insert a correct kernel module for the platform, userspace functionality such as OpenCL will not be available without Mali userspace binaries matching the version of the Mali kernel module.
Can you confirm the kernel you are trying to compile with/against and the versions of the Mali kernel and userspace drivers you have available to you? ARM has currently only released the r4p0 userspace binaries, though r4p1 should be available through the malideveloper site in the not too distant future.
Could you also confirm the steps you have taken to configure the Mali kernel source for use with the Arndale Octa board, especially as I can see you are trying to build it out of tree? You need to configure the kernel device tree and add platform specific configuration to the kernel to set up interrupts and memory addresses etc. The easiest way to integrate the latest version of the kernel would be to download the Linaro kernel for Arndale Octa and extract the r4p1 kernel source INTO the kernel source, overwriting the included r4p0 kernel source as this will keep the platform integration files intact. Looking at the kernel source you mentioned in your earlier post "II-arndale-octa" I can see that the platform integration files are present, for example in /drivers/gpu/arm/midgard/platform/5420. You could try extracting the r4p1 kernel source into this kernel tree and building that way.
Rich
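The "extract INTO the kernel source" step amounts to an overlay copy: newer files replace older ones, and files that exist only in the destination (such as the platform integration directory) are left alone. A rough sketch, assuming both trees are already unpacked; the helper name `overlay_driver_source` is hypothetical and the caller supplies both paths:

```shell
# Copy a newer driver source tree over an existing one without deleting
# files that exist only in the destination (for example platform glue).
# Both paths are supplied by the caller; the helper name is hypothetical.
overlay_driver_source() {
    src="$1"
    dst="$2"
    [ -d "$src" ] && [ -d "$dst" ] || {
        echo "usage: overlay_driver_source SRC_DIR DST_DIR" >&2
        return 1
    }
    # "src/." copies the contents of src, including dotfiles, into dst.
    cp -R "$src/." "$dst/"
}
```

Files that the newer release dropped would stay behind in the destination, so this does not replace checking that the merged tree still builds.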
Hi Rich,
In fact, I didn't configure anything at all but just ran:
And ended up with this module for the r4p1 kernel driver:
filename: /lib/modules/3.15.0-1-linaro-arndale-octa/kernel/drivers/gpu/arm/midgard/mali_kbase.ko
version: r4p1-00rel0
license: GPL
srcversion: 29BCA495EB0E26992A9C01E
alias: of:N*T*Carm,mali-midgard*
alias: of:N*T*Carm,malit6xx*
depends:
vermagic: 3.15.0-1-linaro-arndale-octa SMP mod_unload ARMv7 p2v8
And this one for r4p0:
Next I downloaded the following kernel: linux-linaro-3.15-2014.06, because it seems to be the closest match to the running kernel: 3.15.0-1-linaro-arndale-octa. I enabled the Mali options in the kernel configuration:
Enable Mali GPU support in Gator
-Mali-400MP or Mali-450MP
+Mali-T604 or Mali-T658
Path to Mali driver: drivers/gpu/arm/midgard
To do so I had to enable some timers and performance events options as well. The path points to the r4p0 driver that I copied into the kernel tree. Why does the configuration option only mention T604 and T658, and not T628?
After running "make modules" I once again had a mali_kbase.ko, but the result is the same as for the out-of-tree build: no /dev/mali.
The next step was to build and flash the full kernel. Apparently the kernel I used is not fully compatible, since the -arndale-octa postfix is missing. As a result I had to modify some filenames to get the new kernel installed. Unfortunately this didn't work: the board does not boot any more. I have been looking for the proper Linaro arndale-octa kernel source so that I can try again, but haven't found it yet, since it appears that only the binary hwpack and rootfs are available. To be continued.
There is a newer Linaro kernel and full Ubuntu binary images with Mali r4p0-02rel0 driver already integrated for the Arndale board. If you want to run some applications using the Mali-T604 GPU and you just need a working Arndale system, you can download a full Ubuntu binary image from the 14.08 Linaro Releases.
The r4p1 user-side binary drivers will soon be released on our public download page mentioned previously, but in the meantime the latest version you can use is r4p0. If you're interested in rebuilding the Linux kernel for other reasons than upgrading the Mali driver, you can get the Linaro source code from Linaro Git Hosting - gwg/linaro-lsk.git/shortlog. The commit used in the binary release is 14c58eb6 and you'll need to generate the kernel configuration file using a script and the following fragments (see Linaro documentation for more details):
linaro/configs/linaro-base.conf
linaro/configs/distribution.conf
linaro/configs/arndale_octa.conf
linaro/configs/lt-arndale_octa.conf
linaro/configs/mali-arndale-octa.conf
Hope this helps!
Best wishes,
Guillaume
Sorry, I mentioned the Arndale board in my previous comment but you're using an Arndale Octa which has a Mali-T628 GPU. You'll need to use a different version of the binary Linaro Releases, which also contain a different user-side Mali driver binary. The kernel source and configuration I mentioned are correct for Arndale Octa.
Hello Guillaume,
These are very useful replies indeed!
I downloaded the linaro-lsk kernel, checked out the lsk-v3.14-lt-mali-r4p0-beta2 branch and configured it using the linaro-base.conf, distribution.conf and arndale_octa.conf fragments that I got from here, using the scripts/kconfig/merge_config.sh script.
I noticed that there are no references to gator or mali_kbase in either modules.builtin or modules.order. Is that correct?
While booting the board, it hangs on initializing DRM:
[ 5.935000] console [ttySAC3] enabled
[ 5.935000] [drm] Initialized drm 1.1.0 20060810
[ 5.940000] i2c i2c-2: attached exynos4210-hdmiddc into i2c adapter successfully
[ 5.950000] exynos-mixer 14450000.mixer: probe start
[ 5.955000] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 5.960000] [drm] No driver support for vblank timestamp query.
[ 5.965000] [drm:exynos_drm_connector_get_modes] *ERROR* Panel operation get_edid failed -19
This is without connecting any display. With display, the error is different:
[ 5.940000] console [ttySAC3] enabled
[ 5.945000] [drm] Initialized drm 1.1.0 20060810
[ 5.950000] i2c i2c-2: attached exynos4210-hdmiddc into i2c adapter successfully
[ 5.955000] exynos-mixer 14450000.mixer: probe start
[ 5.965000] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 5.970000] [drm] No driver support for vblank timestamp query.
I have seen that before while trying this kernel.
After retrying a few times the kernel did finally boot, but there is no sign of mali_kbase. While running the new kernel, I configured mali_kbase to be compiled as a module and loaded it. It does not seem to work:
[ 424.745000] mali 11800000.mali: Platform data not specified
[ 424.750000] mali: probe of 11800000.mali failed with error -2
So I set MALI_EXPERT=y and CONFIG_MALI_PLATFORM_THIRDPARTY_NAME=arndale_octa, but this gives a warning during compilation:
WARNING: "samsung_cpu_id" [/root/linaro-lsk/drivers/gpu/arm/midgard/mali_kbase.ko] undefined!
And the module fails to load:
[ 2174.100000] mali_kbase: Unknown symbol samsung_cpu_id (err 0)
I have been looking into the source and build files of this module but cannot identify the issue. What can I do to properly compile this module?
The Mali kernel-side driver doesn't need to be built as a loadable module, and I believe the Linaro default configuration has it built-in. To build as a module, some symbols specific to the Exynos architecture need to be exported. If you don't need to change the Mali driver, then it's easier to leave it built-in. This should resolve your last error "Unknown symbol samsung_cpu_id". As long as you see the kernel messages from the driver and /dev/mali* is present then it means it's initialised.
Then the driver needs some platform data, which in this case is part of the driver source code itself. The "Platform data not specified" error should now be fixed with the "CONFIG_MALI_PLATFORM_THIRDPARTY_NAME=arndale_octa" option, which includes the "drivers/gpu/arm/midgard/platform/arndale_octa" directory. So I believe it should just work if you make the driver built-in again. I'm not sure why the platform name wasn't already in the default configuration; I'll try to investigate this.
Also, the kernel is reading some information from the monitor to set the appropriate video mode. Unless you set a hard-coded mode or there's a reliable fall-back mode, the DRM driver needs the monitor to be present.
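The "Unknown symbol" error indicates that the module references a symbol the kernel does not export to loadable modules. One way to check this before loading is to look the symbol up in the `Module.symvers` file that a kernel build generates. This is a sketch under the assumption that the file is in the kernel build directory and has the symbol name in its second field; the helper name `symbol_exported` is hypothetical.

```shell
# Succeed if SYMBOL is listed in a Module.symvers file, i.e. the kernel
# build exports it for use by loadable modules. The symbol name is the
# second whitespace-separated field on each line.
# The helper name is hypothetical.
symbol_exported() {
    symbol="$1"
    symvers="${2:-Module.symvers}"
    awk -v sym="$symbol" '$2 == sym { found = 1 } END { exit !found }' "$symvers"
}
```

For instance, `symbol_exported samsung_cpu_id /root/linaro-lsk/Module.symvers || echo "not exported"` would confirm the situation described above, in which case building the driver into the kernel avoids the problem.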
Hi Guillaume,
I also assumed that the driver would be built in by using the default configuration provided by the fragments. However, there is just no /dev/mali present and the kernel messages do not contain anything about Mali. I also noted that neither modules.order nor modules.builtin contains mali_kbase.ko. That is why I proceeded to build the driver as a module instead. I just retried compiling it built-in and made sure that
CONFIG_MALI_MIDGARD=y
CONFIG_MALI_PLATFORM_THIRDPARTY=y
CONFIG_MALI_PLATFORM_THIRDPARTY_NAME="arndale_octa"
are all set. But still, the driver doesn't seem to be compiled into the kernel.
Could you by any chance send me your .config file so that I can see if some other configuration options are missing?
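Checking the generated `.config` by hand is error-prone, so a small script can confirm the three options listed above. This is a sketch only: the helper name `check_mali_config` is hypothetical, and a passing check does not prove the driver is actually built in, since other options it depends on may still be missing.

```shell
# Print each required option that is not set to the expected value in a
# kernel .config file; print nothing and succeed if all are present.
# The option list is copied from this thread; the helper name is hypothetical.
check_mali_config() {
    config="${1:-.config}"
    missing=0
    for want in \
        'CONFIG_MALI_MIDGARD=y' \
        'CONFIG_MALI_PLATFORM_THIRDPARTY=y' \
        'CONFIG_MALI_PLATFORM_THIRDPARTY_NAME="arndale_octa"'
    do
        # -x matches whole lines, -F treats the pattern as a fixed string.
        if ! grep -qxF -- "$want" "$config"; then
            echo "not set: $want"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it from the top of the kernel tree as `check_mali_config .config` after generating the configuration.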
It seems like you've missed the mali-arndale-octa.conf configuration fragment. This is the command I run to generate the config file:
./scripts/kconfig/merge_config.sh linaro/configs/linaro-base.conf linaro/configs/arndale_octa.conf linaro/configs/mali-arndale-octa.conf
Could you please try again with this?
Hi bramv
We've been able to replicate the issue you've described and are working on it here. We'll get back to you as soon as we have something to share.
After some more investigation, we've found that you need to use even more configuration fragments:
./scripts/kconfig/merge_config.sh linaro/configs/linaro-base.conf linaro/configs/distribution.conf linaro/configs/arndale_octa.conf linaro/configs/lt-arndale_octa.conf linaro/configs/mali-arndale-octa.conf
Then if you use this build command:
make zreladdr-y=0x20008000 uImage modules dtbs -j8
it should create an arch/arm/boot/uImage file which you can replace on the 14.08 Linaro binary image, on the boot partition.
If you see build errors related to Gator, then a quick fix is to disable CONFIG_GATOR in .config (make menuconfig...). I hope this helps. Please note that this kernel contains the r4p0-02rel0 kernel-side Mali driver, so only the user-side driver with the exact same version should be used. This is not compatible with r4p1-00rel0 unless the kernel-side driver is upgraded.
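Since a missing fragment was the cause of the earlier failure, it may help to verify that all five fragments exist in the checked-out tree before running merge_config.sh. A minimal sketch, with a hypothetical helper name (`missing_fragments`) and the fragment list taken from the command above:

```shell
# Print any of the five configuration fragments named in this thread that
# are absent from a kernel source tree; succeed only if all are present.
# The helper name is hypothetical.
missing_fragments() {
    tree="${1:-.}"
    status=0
    for frag in linaro-base distribution arndale_octa lt-arndale_octa mali-arndale-octa
    do
        path="linaro/configs/$frag.conf"
        if [ ! -f "$tree/$path" ]; then
            echo "$path"
            status=1
        fi
    done
    return "$status"
}
```

If `missing_fragments .` prints anything, the checked-out branch or commit probably differs from the one used for the binary release.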
I tried again from a fresh Linaro 14.08 binary image, building the Linaro kernel at the commit that you specified with the configuration generated by the command you provided. Using this kernel, I indeed get a /dev/mali0 device. According to my OpenCL benchmark, however, performance is still very low (comparable to before).
You mention that the kernel includes the r4p1-00rel0 driver, while only the r4p0-02rel0 and r4p1-00rel0 user space drivers are available. As you already mentioned, this isn't likely to work. Furthermore, I noticed the following:
[ 5.965000] [drm] Initialized drm 1.1.0 20060810
[ 5.970000] i2c i2c-2: attached exynos4210-hdmiddc into i2c adapter successfully
[ 5.980000] exynos-mixer 14450000.mixer: probe start
[ 5.985000] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 5.990000] [drm] No driver support for vblank timestamp query.
[ 5.995000] exynos-sysmmu 14650000.sysmmu: Enabled
[ 5.995000] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x45b40000
[ 6.100000] Console: switching to colour frame buffer device 160x64
[ 6.120000] exynos-drm exynos-drm: fb0: frame buffer device
[ 6.125000] exynos-drm exynos-drm: registered panic notifier
[ 6.130000] [drm] Initialized exynos 1.0.0 20110530 on minor 0
[ 6.135000] v4 support
[ 6.140000] mali 11800000.mali: GPU identified as 0x0620 r0p1 status 0
[ 6.145000] mali 11800000.mali: Probed as mali0
Why does the kernel identify the GPU as r0p1? Since mali_kbase is built into the kernel I don't know of a way to find out if it is actually r4p1 that is being used.
I tried again by downloading the latest (r4p1-00rel0) kernel driver and recompiling the kernel. Of course, I now also used the corresponding r4p1-00rel0 userspace driver. The result, however, is exactly the same: this strange kernel message still appears and OpenCL still doesn't perform well.
bramv wrote:
Why does the kernel identify the GPU as r0p1?
This is the GPU hardware revision, and is unrelated to the revision of the kernel or userspace driver.
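To keep the hardware revision apart from the driver release, the two fields can be pulled out of the kernel log. This sketch assumes the message wording quoted in this thread ("GPU identified as 0x0620 r0p1 status 0"); other driver versions may print it differently, and the helper name `mali_hw_revision` is hypothetical.

```shell
# Read kernel log text on stdin and print "<gpu_id> <hw_revision>" for the
# first "GPU identified as" line. The message format is the one quoted in
# this thread; other driver versions may word it differently.
# The helper name is hypothetical.
mali_hw_revision() {
    sed -n 's/.*GPU identified as \(0x[0-9a-fA-F]*\) \(r[0-9]*p[0-9]*\).*/\1 \2/p' | head -n 1
}
```

With the log line above, `dmesg | mali_hw_revision` prints `0x0620 r0p1`, which describes the silicon and not the r4p0 or r4p1 driver release.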
The kernel comes with r4p0 integrated, not r4p1, as identified in the tag of the release in the Linaro repository, so the r4p0 userspace binaries from malideveloper are the ones to use with the kernel as it comes.
If you want to use r4p1, that is possible, but it may take some work to iron out the integration; it may not be as simple as unpacking the kernel source into the right place in the kernel tree.
As for your low OpenCL Benchmark results, which benchmark are you using?
Hope this Helps,
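Because the kernel-side and user-side releases must be identical, a trivial comparison can guard against mixing them up. This is only an illustrative sketch: the helper name `mali_versions_match` is hypothetical, and how the user-side version string is obtained is left to the caller.

```shell
# Compare the kernel-side and user-side Mali driver release strings and
# fail unless they are identical (for example r4p0-02rel0 on both sides).
# How the user-side version is obtained is left to the caller; the helper
# name is hypothetical.
mali_versions_match() {
    kernel_ver="$1"
    user_ver="$2"
    if [ -n "$kernel_ver" ] && [ "$kernel_ver" = "$user_ver" ]; then
        echo "match: $kernel_ver"
        return 0
    fi
    echo "mismatch: kernel=$kernel_ver userspace=$user_ver"
    return 1
}
```

When the driver is built as a module, `modinfo -F version mali_kbase` prints the kernel-side string (the `version:` field in the modinfo output earlier in the thread), so `mali_versions_match "$(modinfo -F version mali_kbase)" r4p0-02rel0` would perform the check. For a built-in driver the kernel-side version has to come from the source tree or the release tag.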
You are right, I accidentally mixed up the two version numbers. So I should use the unmodified kernel in combination with the r4p0 userspace binary. This binary comes in two forms, X11 and fbdev. I tried both but don't see much of a difference. Which one should I use?
I am using the same clpeak benchmark that I used before:
Platform: ARM Platform
Device: Mali-T628
Driver version : 1.1
Compute units : 4
Clock frequency : 533 MHz
Single-precision compute (GFLOPS)
float : 1.56654
float2 : 3.92411
float4 : 3.92181
float8 : 4.84877
float16 : 4.82142
Device: Mali-T628
Driver version : 1.1
Compute units : 2
Clock frequency : 533 MHz
Single-precision compute (GFLOPS)
float : 0.747759
float2 : 1.96387
float4 : 1.98183
float8 : 2.43132
float16 : 2.41775
It is nice to see that the two clusters are now properly recognized, but the results are nowhere near the +/- 33 Gflops that Chris reported:
I am seeing 33.27 and 33.17 SP GFLOPS for float2 and float4 respectively, or 45% of max theoretical peak
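To put the gap in numbers, the figures quoted in this thread can be compared directly. The sketch below rests on two assumptions that the thread does not confirm: that Chris's "45% of max theoretical peak" refers to the same GPU configuration, and that the results of the two devices can simply be added. The helper name `percent_of` is hypothetical.

```shell
# Print A as a percentage of B with one decimal place.
# The helper name is hypothetical.
percent_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# Values quoted in this thread:
chris_float2=33.27      # GFLOPS reported by Chris for float2 (45% of peak)
# Peak implied by that statement (assumption: same GPU configuration).
peak=$(awk -v g="$chris_float2" 'BEGIN { printf "%.2f\n", g / 0.45 }')
# float2 results of both devices added together (assumption: additive).
total=$(awk 'BEGIN { printf "%.5f\n", 3.92411 + 1.96387 }')

echo "implied peak: $peak GFLOPS"
echo "measured:     $total GFLOPS ($(percent_of "$total" "$peak")% of implied peak)"
```

Under those assumptions the measured float2 throughput is roughly 8% of the implied peak, against the 45% Chris reported, which suggests a configuration problem more than a benchmark artefact.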
It's probably worth someone at our end testing this out as well, but as a quick sanity check can you ensure that the CPU and GPU DVFS are disabled/otherwise pinned (set CPU DVFS to performance and frequency to something high like 1.7GHz) before running the benchmark? These SoCs have a tendency to take the busfreq down with the CPUfreq when the CPU is idle, and as these GPU benchmarks tend not to stress the CPU too much, this has the effect that the bus clocks down to v/fmin and severely throttles the GPU's memory bandwidth.
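On kernels that expose the standard Linux cpufreq sysfs interface, pinning the CPU governor can be done without extra tools. The following is a sketch, not a tested recipe for the Arndale Octa: whether these files exist depends on the kernel configuration, it must run as root on real hardware, and the helper name `set_cpu_governor` is hypothetical.

```shell
# Write a cpufreq governor name to every CPU's scaling_governor file.
# The sysfs layout is the standard Linux cpufreq one; whether the
# Arndale Octa kernel exposes it depends on its configuration.
# Must be run as root on real hardware; the helper name is hypothetical.
set_cpu_governor() {
    governor="${1:-performance}"
    root="${2:-/sys/devices/system/cpu}"
    count=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue     # glob did not match anything
        echo "$governor" > "$f" || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]              # fail if no cpufreq files were found
}
```

Calling `set_cpu_governor performance` before the benchmark and reading the files back afterwards shows whether the setting stuck. A non-zero exit status means no cpufreq files were found, in which case frequency scaling is either disabled or handled elsewhere in the BSP.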
Hi Chris,
I guess I would need a utility like cpufreq to set DVFS or change the clock frequency of the SoC, but don't know exactly how to do this. Instead I looked in the kernel configuration and disabled DVFS altogether and only enabled the performance governor. This however does not seem to have any impact on performance.
I agree that it would be useful if someone at your end could run the same benchmark. You can find my code here.
The X11 and fbdev variants let you render GLES 3D content either directly into the framebuffer (fbdev) or within the X windowing system (X11). If you are doing nothing graphics related and have no need to install X11, then I would suggest you just use the fbdev version of the userspace driver; in terms of OpenCL they will be exactly the same.
Following on from what Chris said, it could well be that DVFS is clocking down bandwidth and therefore causing the reduced performance you are seeing. I can take a look at this at my end and try and verify if this is the case.
To test at your end quickly, you should be able to disable dvfs as follows:
echo off > /sys/class/misc/mali0/device/dvfs
Good to know the difference between X11 and fbdev. I have one follow-up question on this. So far I have linked my OpenCL programs against libmali.so from the userspace binary package instead of libOpenCL.so, since linking against libOpenCL.so gives me lots of undefined reference errors, whilst with libmali.so compilation proceeds without any errors. I also noticed that libmali.so is the only substantial file in the package:
# ll ../fbdev/
total 21056
drwxr-xr-x 2 root root 4096 Jan 1 2000 ./
drwx------ 11 root root 4096 Jan 1 2000 ../
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libEGL.so*
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libGLESv1_CM.so*
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libGLESv2.so*
-rwxr-x--- 1 16580 16580 4806 Jul 23 14:44 libOpenCL.so*
-rwxr-x--- 1 16580 16580 21518354 Jul 23 14:44 libmali.so*
Is linking against libmali.so the correct way to go?
Regarding DVFS, the /device/dvfs file is not present on my system:
# tree /sys/class/misc/mali0
/sys/class/misc/mali0
├── dev
├── device -> ../../../11800000.mali
├── power
│ ├── autosuspend_delay_ms
│ ├── control
│ ├── runtime_active_time
│ ├── runtime_status
│ └── runtime_suspended_time
├── subsystem -> ../../../../class/misc
└── uevent
3 directories, 7 files
And your command results in a permission denied error, even though I am using a root shell:
# echo off > /sys/class/misc/mali0/device/dvfs
-bash: /sys/class/misc/mali0/device/dvfs: Permission denied
I am looking forward to seeing your clpeak results!
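The "Permission denied" message above is most likely a side effect of the file not existing: the shell tries to create it, and new files cannot be created in sysfs even by root. A small sketch that tells the two cases apart before writing; the helper name `disable_mali_dvfs` is hypothetical, and the path and the `off` value are just the ones quoted in this thread.

```shell
# Try to turn off Mali DVFS through the sysfs file suggested in this
# thread, but only if that file exists and is writable.
# The helper name is hypothetical; the path and the "off" value are the
# ones quoted above and are specific to the platform integration.
disable_mali_dvfs() {
    f="${1:-/sys/class/misc/mali0/device/dvfs}"
    if [ ! -e "$f" ]; then
        echo "no DVFS control at $f (not provided by this kernel)" >&2
        return 1
    fi
    if [ ! -w "$f" ]; then
        echo "$f exists but is not writable" >&2
        return 2
    fi
    echo off > "$f"
}
```

On the system shown above this would report that the control file is absent, which points at the platform integration and not at permissions.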
"libOpenCL.so" is the spec defined library name that OpenCL should be exposed as on the platform. In our case we implement this (and the GLES libs) as shims which pass through to the libmali.so binary, which is why that's the largest one there. For development purposes you can link against either, but obviously you wouldn't want to do this for a release, you'd want to link against libOpenCL.so for portability.
That said, it SHOULD work, so if you are having issues linking against libOpenCL.so feel free to share them here and we will take a look. I can't think of a good reason why one should work and the other fail.
All the errors are undefined reference errors at link time. The symbols should be there, so offhand I'm not sure why that's happening, but in any case linking against libmali.so is working.
In your above output, ldd a.out is reporting /usr/lib/libmali.so on the first line, so that's working as expected.
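Since the relevant line is easy to overlook in ldd output, it can be filtered out explicitly. A minimal sketch with a hypothetical helper name (`cl_libs_linked`):

```shell
# Read `ldd` output on stdin and print the lines that mention the Mali or
# OpenCL libraries; fail if neither appears.
# The helper name is hypothetical.
cl_libs_linked() {
    grep -E 'lib(mali|OpenCL)\.so'
}
```

Used as `ldd a.out | cl_libs_linked`, it prints the `/usr/lib/libmali.so` line for the binary discussed here and returns a non-zero status for a binary that links against neither library.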
I put all files from the binary userspace driver in the /usr/lib/ directory and compiled just using: "g++ program.cpp -lOpenCL". This gives the same result as compiling like this: "g++ program.cpp /usr/lib/libOpenCL.so". The output is too long to show here.
Compiling using "g++ clpeak-arndale-octa.cpp /usr/lib/libmali.so" works just fine. But now that I am looking further into the compilation, I noticed the following:
# ldd a.out
/usr/lib/libmali.so (0xb5e03000)
libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb5d4a000)
libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb5cde000)
libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb5cbd000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb5bd6000)
/lib/ld-linux-armhf.so.3 (0xb6f6d000)
librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0xb5bc8000)
libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb5bac000)
libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb5ba1000)
The resulting binary does not contain any reference to an OpenCL or Mali library. Does this mean that the program runs on the CPU instead of the GPU? But then, why does it report the two Mali-T628 devices? I am confused.
Whoops, I missed that line. In that case I can conclude the following:
I am running a kernel containing the proper Mali kernel driver, my program is linked to the corresponding userspace binary and T628-MP6 is correctly recognized.
Is it indeed DVFS that is preventing the benchmark from achieving the expected performance level? Why is /sys/class/misc/mali0/device/dvfs missing?
Taken shamelessly from an answer by peterharris in another thread:
"The DVFS code for the GPU is not directly managed by our drivers - it is part of the platform integration provided in the BSP from Insignal. This style of integration occurs because the DVFS analogue parts which control F and V for the power domains are not part of the ARM IP. This question is probably best asked to Samsung or Insignal, as they maintain the BSP for that platform."
It is possible to disable features such as DVFS by recompiling the Linux kernel and Mali kernel module with the correct configuration. The reduced performance (normally) happens because DVFS ties the GPU frequencies to the workload of the CPU. While you are running your intensive GPU test, the CPU is left to idle, so DVFS drops the CPU core speed and, unfortunately, the GPU frequency along with it.
A means of stopping this happening would be to add some CPU intensive code to run whilst the GPU code is running to stop DVFS dropping the frequencies.
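The workaround just described can be tried from the shell without changing the benchmark itself. The sketch below keeps one core busy while a command runs; whether a single busy loop is enough to hold the frequencies up on this platform is an assumption, and the helper name `with_cpu_load` is hypothetical.

```shell
# Run a command while a busy loop keeps one CPU core occupied, then stop
# the loop and return the command's exit status.
# The helper name is hypothetical; one busy loop may not be enough on
# every platform.
with_cpu_load() {
    while :; do :; done &           # background busy loop
    load_pid=$!
    status=0
    "$@" || status=$?               # run the real workload
    kill "$load_pid" 2>/dev/null
    wait "$load_pid" 2>/dev/null || true
    return "$status"
}
```

For example, `with_cpu_load ./a.out` runs the benchmark binary with the extra load in place. Comparing the result with a plain run gives a quick indication of whether DVFS is the limiting factor.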
With regards to the Linker errors, you should be able to fix the issue by linking against both mali and OpenCL. OpenCL will provide the runtime linker target (even in the absence of mali) whilst mali will provide the symbols at compile time, stopping the errors in your shared paste.