OpenCL support for Mali-T628 MP6 on Arndale Octa?

Summary

Is OpenCL support for the Mali-T628 (for example, as found in the Exynos 5420 SoC on the Arndale Octa board) available? If so, how do I set it up?

More details

According to the vendor, OpenCL should be supported, but the Arndale Octa Wiki does not state how this can be achieved.

I am using the latest Linaro developer build and installed Mali drivers that contain OpenCL libraries for Mali T604. According to this guide, the driver actually contains references to the Mali T628. So I tried to create the udev rule as specified, which is supposed to solve a permission problem with /dev/mali0, but I found that there is no /dev/mali0 on my installation at all. So my conclusion is that the driver indeed does not support T628.
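Before drawing conclusions from a missing device node, it can help to separate "the node does not exist" from "the node exists but cannot be opened". The sketch below does that; the function name `check_mali_node` is hypothetical, and the default path is simply the `/dev/mali0` name used in this thread.

```shell
# Report whether a Mali device node exists and whether the current user
# can open it for reading and writing. The default path is an assumption
# based on the /dev/mali0 name used in this thread.
check_mali_node() {
    node="${1:-/dev/mali0}"
    if [ ! -e "$node" ]; then
        echo "missing: $node (kernel driver not built, not loaded, or not probed)"
        return 1
    fi
    if [ -r "$node" ] && [ -w "$node" ]; then
        echo "ok: $node is readable and writable"
        return 0
    fi
    echo "no access: $node exists but permissions are too strict"
    return 2
}
```

A "missing" result points at the kernel side (no driver, or a failed probe), whereas "no access" is the case a udev rule is meant to solve.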

When I run the clinfo utility, clGetDeviceInfo returns CL_OUT_OF_HOST_MEMORY for some device properties. Why can I query some GPU characteristics while others fail? When running a normal application, the same error appears when trying to create an OpenCL context.

I was surprised to find this topic, where yoshi seems to have OpenCL working and can run benchmarks on his Arndale Octa board. How is this possible if there is no driver available? Or am I just missing something? I hope that you can help me to also establish a working OpenCL development environment.

  • Sorry, I mentioned the Arndale board in my previous comment but you're using an Arndale Octa which has a Mali-T628 GPU.  You'll need to use a different version of the binary Linaro Releases, which also contain a different user-side Mali driver binary.  The kernel source and configuration I mentioned are correct for Arndale Octa.


    Guillaume

  • Hello Guillaume,

    These are very useful replies indeed!

    I downloaded the linaro-lsk kernel, checked out the lsk-v3.14-lt-mali-r4p0-beta2 branch and configured it using the linaro-base.conf, distribution.conf and arndale_octa.conf fragments that I got from here, using the scripts/kconfig/merge_config.sh script.

    I noticed that there are no references to gator or mali_kbase in either modules.builtin or modules.order. Is that correct?

    While booting the board, it hangs on initializing DRM:

    [    5.935000] console [ttySAC3] enabled

    [    5.935000] [drm] Initialized drm 1.1.0 20060810

    [    5.940000] i2c i2c-2: attached exynos4210-hdmiddc into i2c adapter successfully

    [    5.950000] exynos-mixer 14450000.mixer: probe start

    [    5.955000] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).

    [    5.960000] [drm] No driver support for vblank timestamp query.

    [    5.965000] [drm:exynos_drm_connector_get_modes] *ERROR* Panel operation get_edid failed -19

    This is without connecting any display. With display, the error is different:

    [    5.940000] console [ttySAC3] enabled

    [    5.945000] [drm] Initialized drm 1.1.0 20060810

    [    5.950000] i2c i2c-2: attached exynos4210-hdmiddc into i2c adapter successfully

    [    5.955000] exynos-mixer 14450000.mixer: probe start

    [    5.965000] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).

    [    5.970000] [drm] No driver support for vblank timestamp query.

    I have seen that before while trying this kernel.

    After retrying a few times, the kernel did finally boot, but there is no sign of mali_kbase. While running the new kernel, I configured mali_kbase to be compiled as a module and loaded it. It does not seem to work:

    [  424.745000] mali 11800000.mali: Platform data not specified

    [  424.750000] mali: probe of 11800000.mali failed with error -2

    So I set MALI_EXPERT=y and CONFIG_MALI_PLATFORM_THIRDPARTY_NAME=arndale_octa, but this gives a warning during compilation:

    WARNING: "samsung_cpu_id" [/root/linaro-lsk/drivers/gpu/arm/midgard/mali_kbase.ko] undefined!

    And the module fails to load:

    [ 2174.100000] mali_kbase: Unknown symbol samsung_cpu_id (err 0)

    I have been looking into the source and build files of this module but cannot identify the issue. What can I do to properly compile this module?

  • Hi bramv,

    The Mali kernel-side driver doesn't need to be built as a loadable module, and I believe the Linaro default configuration has it built-in.  To build as a module, some symbols specific to the Exynos architecture need to be exported.  If you don't need to change the Mali driver, then it's easier to leave it built-in.  This should resolve your last error "Unknown symbol samsung_cpu_id".  As long as you see the kernel messages from the driver and /dev/mali* is present then it means it's initialised.

    Then the driver needs some platform data, which in this case is part of the driver source code itself.  The "Platform data not specified" error should now be fixed with the "CONFIG_MALI_PLATFORM_THIRDPARTY_NAME=arndale_octa" option, which includes the "drivers/gpu/arm/midgard/platform/arndale_octa" directory.  So I believe it should just work if you make the driver built-in again.  I'm not sure why the platform name wasn't already in the default configuration; I'll try to investigate this.

    Also, the kernel is reading some information from the monitor to set the appropriate video mode.  Unless you set a hard-coded mode or there's a reliable fall-back mode, the DRM driver needs the monitor to be present.

    Guillaume

  • Hi Guillaume,


    I also assumed that the driver would be built-in when using the default configuration provided by the fragments. However, there is no /dev/mali* device present and the kernel messages do not contain anything about Mali. I also noted that neither modules.order nor modules.builtin contains mali_kbase.ko. That is why I proceeded to build the driver as a module instead. I just retried compiling it built-in and made sure that

    CONFIG_MALI_MIDGARD=y

    CONFIG_MALI_PLATFORM_THIRDPARTY=y

    CONFIG_MALI_PLATFORM_THIRDPARTY_NAME="arndale_octa"

    are all set. But still, the driver doesn't seem to be compiled into the kernel.
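One way to confirm that the generated configuration really contains these three options (rather than a `=m` or "is not set" variant) is a small check like the following. This is a sketch; the function name `check_mali_config` is hypothetical, and the option list is copied from the lines above.

```shell
# Check that a kernel .config enables the three Mali options listed above
# as built-in. Prints one line per option and returns non-zero if any
# option is absent or has a different value.
check_mali_config() {
    config="${1:-.config}"
    missing=0
    for opt in \
        'CONFIG_MALI_MIDGARD=y' \
        'CONFIG_MALI_PLATFORM_THIRDPARTY=y' \
        'CONFIG_MALI_PLATFORM_THIRDPARTY_NAME="arndale_octa"'
    do
        # -x matches the whole line, -F treats the option as a fixed string.
        if grep -qxF -- "$opt" "$config"; then
            echo "set:     $opt"
        else
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_mali_config .config` from the top of the kernel tree after `merge_config.sh` shows whether a later fragment or a dependency silently dropped one of the options.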


    Could you by any chance send me your .config file so that I can see if some other configuration options are missing?

  • Hi bramv,

    It seems like you've missed the mali-arndale-octa.conf configuration fragment.  This is the command I run to generate the config file:

    ./scripts/kconfig/merge_config.sh linaro/configs/linaro-base.conf linaro/configs/arndale_octa.conf linaro/configs/mali-arndale-octa.conf

    Could you please try again with this?

    Best wishes,

    Guillaume

  • Hi bramv

    We've been able to replicate the issue you've described and are working on it here.  We'll get back to you as soon as we have something to share.

    Thanks,

    Rich

  • Yes, it seems so. I tried again using the command that you specified.

    Building the kernel image works just fine, but errors occur when building the modules:

    make -j8 modules

      CHK     include/config/kernel.release

      CHK     include/generated/uapi/linux/version.h

      CHK     include/generated/utsrelease.h

    make[1]: `include/generated/mach-types.h' is up to date.

      CALL    scripts/checksyscalls.sh

      Building modules, stage 2.

      MODPOST 9 modules

    ERROR: "tcp_nuke_addr" [net/ipv6/ipv6.ko] undefined!

    I disabled ipv6 altogether using "make xconfig" and compiled everything again. This time all is well, until I reboot. Sometimes it gives me a kernel panic:

    [    6.195000] [<c005211f>] (cpu_startup_entry) from [<20008525>] (0x20008525)

    [    6.195000] CPU2: stopping

    [    6.195000] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.14.7-0-linaro-arndale-octa #26

    [    6.195000] [<c00127a5>] (unwind_backtrace) from [<c000fba9>] (show_stack+0x11/0x14)

    [    6.195000] [<c000fba9>] (show_stack) from [<c0435655>] (dump_stack+0x61/0x6c)

    [    6.195000] [<c0435655>] (dump_stack) from [<c0011903>] (handle_IPI+0x16b/0x184)

    [    6.195000] [<c0011903>] (handle_IPI) from [<c000848f>] (gic_handle_irq+0x57/0x58)

    [    6.195000] [<c000848f>] (gic_handle_irq) from [<c001045b>] (__irq_svc+0x3b/0x5c)

    [    6.195000] Exception stack(0xe611bf50 to 0xe611bf98)

    [    6.195000] bf40:                                     e611bf98 00000006 71431ea3 00000001

    [    6.195000] bf60: 61755fdc 00000001 e675c368 c06f670c 00000000 00000002 c06f66c0 c06f04e0

    [    6.195000] bf80: 00000000 e611bf98 29aaaaab c035e052 60000133 ffffffff

    [    6.195000] [<c001045b>] (__irq_svc) from [<c035e052>] (cpuidle_enter_state+0x3a/0xa8)

    [    6.195000] [<c035e052>] (cpuidle_enter_state) from [<c035e145>] (cpuidle_idle_call+0x85/0x13c)

    [    6.195000] [<c035e145>] (cpuidle_idle_call) from [<c000d7b1>] (arch_cpu_idle+0xd/0x2c)

    [    6.195000] [<c000d7b1>] (arch_cpu_idle) from [<c005211f>] (cpu_startup_entry+0x10f/0x148)

    [    6.195000] [<c005211f>] (cpu_startup_entry) from [<20008525>] (0x20008525)

    [    6.195000] drm_kms_helper: panic occurred, switching back to text console

    And sometimes it fails on mounting the root file system:

    Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.

    [    8.360000] usb 5-1.4: new high-speed USB device number 3 using exynos-ehci

    [    8.460000] usb 5-1.4: New USB device found, idVendor=0b95, idProduct=772a

    [    8.470000] usb 5-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3

    [    8.480000] usb 5-1.4: Product: AX88772

    [    8.485000] usb 5-1.4: Manufacturer: ASIX Elec. Corp.

    [    8.490000] usb 5-1.4: SerialNumber: 000001

    [    8.810000] asix 5-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX AX88772 USB 2.0 Ethernet, 02:0d:ce:dd:fc:24

    chvt: can't open console

    Gave up waiting for root device.  Common problems:

    - Boot args (cat /proc/cmdline)

       - Check rootdelay= (did the system wait long enough?)

       - Check root= (did the system wait for the right device?)

    - Missing modules (cat /proc/modules; ls /dev)

    chvt: can't open console

    ALERT!  /dev/disk/by-uuid/126fc49e-6766-4491-93c4-4580bb0fab91 does not exist.  Dropping to a shell!

    Couldn't get a file descriptor referring to the console

    I found that the second error might be caused by the CONFIG_DEVTMPFS option not being set. I tried again using CONFIG_DEVTMPFS=y but the errors persist.
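The initramfs message above suggests checking `root=` and `rootdelay=` on the kernel command line. A minimal sketch for reading a single parameter is shown below; the function name `cmdline_param` is hypothetical, and it does not handle quoted values that contain spaces.

```shell
# Print the value of a name=value parameter from a kernel command line.
# Reads /proc/cmdline by default; returns 1 if the parameter is absent.
cmdline_param() {
    name="$1"
    file="${2:-/proc/cmdline}"
    for word in $(cat "$file"); do
        case "$word" in
            "$name"=*)
                # Strip everything up to and including the first '='.
                echo "${word#*=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

From the initramfs shell, `cmdline_param root` shows which device the kernel was told to wait for, and `cmdline_param rootdelay` shows whether any delay was requested at all.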

  • Hi bramv,

    After some more investigation, we've found that you need to use even more configuration fragments:

    ./scripts/kconfig/merge_config.sh linaro/configs/linaro-base.conf linaro/configs/distribution.conf linaro/configs/arndale_octa.conf linaro/configs/lt-arndale_octa.conf linaro/configs/mali-arndale-octa.conf

    Then if you use this build command:

    make zreladdr-y=0x20008000 uImage modules dtbs -j8

    it should create an arch/arm/boot/uImage file, which you can use to replace the one on the boot partition of the 14.08 Linaro binary image.
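Since a single forgotten fragment was enough to leave the Mali driver out earlier in this thread, it may be worth checking that every fragment exists before merging. The helper below is a sketch with a hypothetical name, `missing_fragments`; the fragment paths in the commented example are the ones from the command above.

```shell
# Print every configuration fragment that does not exist as a regular file.
# Returns 0 only when all fragments are present, so it can guard the
# merge_config.sh call.
missing_fragments() {
    missing=0
    for frag in "$@"; do
        if [ ! -f "$frag" ]; then
            echo "missing fragment: $frag"
            missing=1
        fi
    done
    return "$missing"
}

# Example guard, run from the top of the kernel tree (left commented out
# so that sourcing this file has no side effects):
# FRAGS="linaro/configs/linaro-base.conf linaro/configs/distribution.conf \
#        linaro/configs/arndale_octa.conf linaro/configs/lt-arndale_octa.conf \
#        linaro/configs/mali-arndale-octa.conf"
# missing_fragments $FRAGS && ./scripts/kconfig/merge_config.sh $FRAGS
```

The guard only runs the merge when all five files are found, which catches a misspelled or absent fragment before a long build.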

    If you see build errors related to Gator, then a quick fix is to disable CONFIG_GATOR in .config (make menuconfig...).  I hope this helps.  Please note that this kernel contains the r4p0-02rel0 kernel-side Mali driver, so only the user-side driver with the exact same version should be used.  This is not compatible with r4p1-00rel0 unless the kernel-side driver is upgraded.

    Best wishes,

    Guillaume

  • I tried again from a fresh Linaro 14.08 binary image, building the Linaro kernel at the commit that you specified with the configuration generated by the command you provided. Using this kernel, I do get a /dev/mali0 device. According to my OpenCL benchmark, however, performance is still very low (comparable to before).

    You mention that the kernel includes the r4p1-00rel0 driver, while only the r4p0-02rel0 and r4p1-00rel0 userspace drivers are available. As you already mentioned, this isn't likely to work. Furthermore, I noticed the following:

    [    5.965000] [drm] Initialized drm 1.1.0 20060810

    [    5.970000] i2c i2c-2: attached exynos4210-hdmiddc into i2c adapter successfully

    [    5.980000] exynos-mixer 14450000.mixer: probe start

    [    5.985000] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).

    [    5.990000] [drm] No driver support for vblank timestamp query.

    [    5.995000] exynos-sysmmu 14650000.sysmmu: Enabled

    [    5.995000] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x45b40000

    [    6.100000] Console: switching to colour frame buffer device 160x64

    [    6.120000] exynos-drm exynos-drm: fb0:  frame buffer device

    [    6.125000] exynos-drm exynos-drm: registered panic notifier

    [    6.130000] [drm] Initialized exynos 1.0.0 20110530 on minor 0

    [    6.135000] v4 support

    [    6.140000] mali 11800000.mali: GPU identified as 0x0620 r0p1 status 0

    [    6.145000] mali 11800000.mali: Probed as mali0

    Why does the kernel identify the GPU as r0p1? Since mali_kbase is built into the kernel I don't know of a way to find out if it is actually r4p1 that is being used.


    I tried again by downloading the latest (r4p1-00rel0) kernel driver and recompiling the kernel. Of course, I now also used the corresponding r4p1-00rel0 userspace driver. The result, however, is exactly the same: the same strange kernel message appears and OpenCL doesn't perform well.

  • Hi bramv,

    bramv wrote:

    Why does the kernel identify the GPU as r0p1?

    This is the GPU hardware revision, and is unrelated to the revision of the kernel or userspace driver.
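To keep the two kinds of "rXpY" strings apart, the hardware revision can be pulled out of the probe message on its own. The following is a sketch; `mali_hw_revision` is a hypothetical name, and the pattern is based only on the single log line quoted in this thread, so other driver versions may word the message differently.

```shell
# Extract the product ID and hardware revision from a mali_kbase probe
# message such as:
#   mali 11800000.mali: GPU identified as 0x0620 r0p1 status 0
# Reads log text on stdin and prints "<product-id> <hw-revision>".
mali_hw_revision() {
    sed -n 's/.*GPU identified as \(0x[0-9a-fA-F]*\) \(r[0-9]*p[0-9]*\).*/\1 \2/p'
}
```

For example, `dmesg | mali_hw_revision` on the log above would print `0x0620 r0p1`, which describes the silicon, not the r4p0 or r4p1 driver release.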

    Hth,

    Chris

  • Hi bramv,

    The kernel comes with r4p0 integrated, not r4p1, as identified in the tag of the release in the Linaro repository. The r4p0 userspace binaries from malideveloper are therefore the ones to use with the kernel as it comes.

    If you want to use r4p1, that is possible, but it may take some work to iron out the integration; it may not be as simple as unpacking the kernel source into the right place in the kernel tree.

    As for your low OpenCL Benchmark results, which benchmark are you using?

    Hope this Helps,

    Rich

  • You are right, I accidentally mixed up the two version numbers. So I should use the unmodified kernel in combination with the r4p0 userspace binary. This binary comes in two forms, X11 and fbdev. I tried both but don't see much of a difference. Which one should I use?

    I am using the same clpeak benchmark that I used before:

    Platform: ARM Platform

    Device: Mali-T628

    Driver version  : 1.1

    Compute units   : 4

    Clock frequency : 533 MHz

    Single-precision compute (GFLOPS)

    float   : 1.56654

    float2  : 3.92411

    float4  : 3.92181

    float8  : 4.84877

    float16 : 4.82142

    Device: Mali-T628

    Driver version  : 1.1

    Compute units   : 2

    Clock frequency : 533 MHz

    Single-precision compute (GFLOPS)

    float   : 0.747759

    float2  : 1.96387

    float4  : 1.98183

    float8  : 2.43132

    float16 : 2.41775

    It is nice to see that the two clusters are now properly recognized, but the results are nowhere near the roughly 33 GFLOPS that Chris reported:

    I am seeing 33.27 and 33.17 SP GFLOPS for float2 and float4 respectively, or 45% of max theoretical peak
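
To put a number on the gap, the measured and reported figures can be compared directly. The helper below is a small sketch (the name `percent_of` is hypothetical); it uses only the float2 values quoted in this thread.

```shell
# Print measured/reference as a percentage with one decimal place.
# Exits non-zero if the reference value is zero.
percent_of() {
    awk -v m="$1" -v r="$2" 'BEGIN {
        if (r == 0) exit 1
        printf "%.1f\n", 100 * m / r
    }'
}
```

`percent_of 3.92411 33.27` prints `11.8`, so the four-core cluster reaches only about 12% of the float2 throughput Chris reported.
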
  • Hi bramv,

    It's probably worth someone at our end testing this out as well, but as a quick sanity check can you ensure that the CPU and GPU DVFS are disabled/otherwise pinned (set CPU DVFS to performance and frequency to something high like 1.7GHz) before running the benchmark? These SoCs have a tendency to take the busfreq down with the CPUfreq when the CPU is idle, and as these GPU benchmarks tend not to stress the CPU too much, this has the effect that the bus clocks down to v/fmin and severely throttles the GPU's memory bandwidth.
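On kernels that expose the standard Linux cpufreq sysfs interface, the governor can be switched without any extra utility. The sketch below assumes that interface is present (it is absent when cpufreq is disabled in the kernel configuration); the function name `set_cpu_governor` is hypothetical, and the sysfs root is a parameter only so that the function can be tried against a scratch directory.

```shell
# Set the cpufreq governor of every CPU to the given value (default:
# performance). On a real system this needs root privileges.
set_cpu_governor() {
    governor="${1:-performance}"
    cpu_root="${2:-/sys/devices/system/cpu}"
    found=0
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded glob when no CPU has a cpufreq directory.
        [ -e "$f" ] || continue
        echo "$governor" > "$f" || return 1
        found=1
    done
    # Fail if no CPU exposes a cpufreq governor at all.
    [ "$found" -eq 1 ]
}
```

Running `set_cpu_governor performance` as root before the benchmark, and reading the `scaling_governor` files back afterwards, confirms whether the CPU side is actually pinned.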

    Hth,

    Chris

  • Hi Chris,

    I guess I would need a utility like cpufreq to set DVFS or change the clock frequency of the SoC, but don't know exactly how to do this. Instead I looked in the kernel configuration and disabled DVFS altogether and only enabled the performance governor. This however does not seem to have any impact on performance.


    I agree that it would be useful if someone at your end could run the same benchmark. You can find my code here.

  • Hi bramv,

    The X11 and fbdev choice lets you render GLES 3D content either directly into the framebuffer (fbdev) or within the X windowing system (X11).  If you are doing nothing graphics related and have no need to install X11, then I would suggest you just use the fbdev version of the userspace driver; in terms of OpenCL they will be exactly the same.

    Following on from what Chris said, it could well be that DVFS is clocking down bandwidth and therefore causing the reduced performance you are seeing.  I can take a look at this at my end and try and verify if this is the case.

    To test at your end quickly, you should be able to disable dvfs as follows:

    echo off > /sys/class/misc/mali0/device/dvfs
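A slightly more defensive version of the same command is sketched below. The function name `mali_dvfs_off` is hypothetical, and the control file path is the one given above; it may be missing or named differently on other driver builds, which is why the function checks before writing.

```shell
# Write "off" to the Mali DVFS control file and report a clear error if
# the file is missing or not writable (for example when not run as root).
mali_dvfs_off() {
    ctl="${1:-/sys/class/misc/mali0/device/dvfs}"
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (missing file or insufficient permissions)" >&2
        return 1
    fi
    echo off > "$ctl"
}
```

If the function reports that the file cannot be written, the driver in use either lacks this control or the command needs to be run as root.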

    Hope this Helps,

    Rich