
Chromebook OpenCL Development issues

Hi all,

I have a Samsung Chromebook (model number XE303C12) and I am trying to get some OpenCL code running.

I have followed the "Graphics and Compute Development on Samsung Chromebook" guide (including disabling CONFIG_SECURITY_CHROMIUMOS) and have been able to boot Linux on the Chromebook. My issue is that I see no network at all, either via the Apple USB dongle or the wireless. The devices don't even seem to exist as far as Linux is concerned. Trying to load the USB network driver via modprobe gives an "Operation not permitted" error on usbnet.ko.

Has anyone gotten networking to work?  If so how? 

If anyone is willing to share a working SD card image, that would be fantastic. To be honest, I got sick of building the Linux kernel sometime in the 90s.

Thanks.

--Mike

P.S.  I don't see OpenCL headers.  Can I just copy them from a working system?

  • Hi Mike,

    I'm not sure of the answer for this one, but if you mark it as a "question" rather than a "discussion" it will probably be a bit more visible (unanswered questions stay on the front page of the ARM Mali Graphics place).

    Cheers,
    Pete

  • > P.S.  I don't see OpenCL headers.  Can I just copy them from a working system?


    The Mali OpenCL SDK contains them, or you can grab them from Khronos.org (get the 1.1 headers). Have you tried building in support for the Apple USB dongle?


    Thanks,

    Chris

  • Hi Mike,

    Wifi and Ethernet USB adapters work without problems on the Chromebook. Can you please post the last few lines of 'dmesg' after running insmod/modprobe? I suspect the kernel modules you attempted to load weren't compatible with the kernel you're running (e.g., they came from a different build).

    If you installed the kernel modules as described in the guide, the kernel will load the right modules automatically when you plug in the device.

    Disabling SECURITY_CHROMIUMOS is sufficient to get it working, but you can also use the "lsm.module_locking=0" kernel parameter to allow loading modules whilst retaining the other security features of the Chromebook kernel.

    Thanks,

    Tu
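
One way to confirm that the parameter actually reached the running kernel is to look for it on the kernel command line. The snippet below is a minimal sketch: the helper name `has_kernel_param` and the sample command line (including its root device) are made up for illustration, and only `lsm.module_locking=0` comes from this thread.

```python
def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` appears as its own token on the command line."""
    return f"{name}={value}" in cmdline.split()


# Hypothetical command line, for illustration only (not from a real Chromebook).
sample = "console=tty1 root=/dev/mmcblk1p3 rootwait rw lsm.module_locking=0"
print(has_kernel_param(sample, "lsm.module_locking", "0"))
```

On the device itself, the live command line can be read from /proc/cmdline and passed to the same helper.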

  • Tu,

    When I plug in the Apple USB dongle, dmesg has the following error:

    "Chromium OS LSM: init_module old-api-denied module=<unknown> ..."

    Using "modprobe -v asix" (that is the right driver, correct?) I get the following output on the terminal:

    insmod /lib/modules/3.8.11/kernel/drivers/net/usb/usbnet.ko

    WARNING: Error inserting usbnet (/lib/modules/3.8.11/kernel/drivers/net/usb/usbnet.ko): Operation not permitted

    FATAL: Error inserting asix (/lib/modules/3.8.11/kernel/drivers/net/usb/asix.ko): Operation not permitted

    dmesg has a similar "init_module old-api-denied" error.

    Looking through dmesg it looks like none of the modules are actually loading.

    Regarding "SECURITY_CHROMIUMOS": the guide is unclear about how to disable this. I assumed that commenting out "CONFIG_SECURITY_CHROMIUMOS=y" in chromeos/config/base.config would do it. As for building the Apple drivers, I ran the "make ARCH=arm CROSS_COMPILE=<toolchain prefix> xconfig" step and they seemed to be enabled by default, so I just saved the config and moved on.

    How do I set the kernel parameter?

    I'm building this on Ubuntu 11 64 bit in a VM if that makes any difference.

    Thank you for your assistance.

  • Looks like adding lsm.module_locking=0 to boot_params made the USB networking work. Still no WLAN, but I'm OK with that. Thanks for the help.

  • mikewinter wrote:

    I assumed that commenting out "CONFIG_SECURITY_CHROMIUMOS=y" in chromeos/config/base.config would do this.

    Can you check if CONFIG_SECURITY_CHROMIUMOS is present in the .config in the base kernel directory? This is the config file actually used when building the kernel.

  • Hi Mike,

    Modifying chromeos/config/base.config wouldn't have an effect on the current kernel config, as it is just a template. You'll need to modify the .config file or, preferably, use menuconfig/xconfig.


    Wifi not working in Ubuntu 12.04 is a known issue. The version of udev in the Ubuntu repo is outdated and doesn't recognise the Wifi card (the kernel does). To fix this you can use a more modern version of Ubuntu or a different distro.


    Thanks,

    Tu
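
To check what the build will actually use, the option can be looked up in the .config text directly. This is a minimal sketch, assuming the usual .config conventions (`CONFIG_X=y` or `=m` when set, `# CONFIG_X is not set` when explicitly disabled). The helper name `config_state` and the sample contents are illustrative, not taken from the guide.

```python
def config_state(config_text: str, option: str) -> str:
    """Classify a kernel option as 'enabled', 'module', 'disabled' or 'absent'."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return "disabled"
        if line.startswith(option + "="):
            value = line.split("=", 1)[1]
            # Non-boolean options (strings, numbers) are returned as-is.
            return {"y": "enabled", "m": "module", "n": "disabled"}.get(value, value)
    return "absent"


# Illustrative .config fragment (assumed contents, not from a real build).
sample_config = """\
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
# CONFIG_SECURITY_CHROMIUMOS is not set
"""

# A hand-commented line like the template edit described earlier in the thread.
template_edit = "#CONFIG_SECURITY_CHROMIUMOS=y\n"

print(config_state(sample_config, "CONFIG_SECURITY_CHROMIUMOS"))
print(config_state(template_edit, "CONFIG_SECURITY_CHROMIUMOS"))
```

A result of 'absent' only means there is no explicit line for the option. The Kconfig default may still apply when the configuration is regenerated, which is one reason menuconfig/xconfig is the safer route.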

  • Tu,

    I found the setting in .config and disabled it.  I haven't tried booting without the lsm.module_locking=0 kernel parameter, so I'm not sure if it helps or not.

    I have gotten my code running on the Chromebook at this point. I'm curious about performance.

    Running the code on both my iMac (quad i7, GeForce 680MX) and the Chromebook, there are some interesting discrepancies in performance. On paper the 680MX should be about 30 times as fast.

    My code does some image processing on multiband images (in this case 50-band images). The three benchmarks are for the mean 50-band image pixel, the image covariance matrix, and something called the RX anomaly statistic. The RX calculates statistics locally to produce an anomaly measurement: it calculates the mean and the covariance in a 10x10 window. The covariance has to be inverted and then multiplied against a pixel of interest to produce a target score. Each of these is done on the GPU in a single kernel. For comparison I run the exact same algorithms on the CPUs.
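
    For readers unfamiliar with it, the RX statistic is conventionally written as a Mahalanobis distance. The post gives no formulas, so the notation below (x, \mu, \Sigma, N) and the 1/(N-1) normalisation are assumptions. With a 10x10 window, N = 100 and each x_i is a 50-band pixel vector.

    ```latex
    \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
    \Sigma = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{\mathsf T}, \qquad
    \mathrm{RX}(x) = (x - \mu)^{\mathsf T}\,\Sigma^{-1}\,(x - \mu)
    ```

    Stacking the centred pixels as the rows of a matrix X_c gives \Sigma = X_c^{\mathsf T} X_c / (N-1), which is the sense in which the covariance step is essentially a matrix multiply.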

    For speed's sake, I'm running on a small image.

    iMac GeForce 680MX  2234 GFLOPS  100+ Watts
    Quad i7 3.4 GHz
    ======================================
              GPU (GeForce 680MX)   CPU
    mean      0.0042                0.0111
    cov       0.0423                0.0218
    RX        0.3392                0.3410

    Chromebook Exynos  68 GFLOPS  1/32 the speed
    =====================================
              GPU (Mali 604)        CPU
    mean      0.0373                0.3911
    cov       0.9044                0.2464
    RX        20.59                 20.0725

    Obviously the i7 GeForce is going to cream the Chromebook. What is interesting here is the relative slowness of the covariance calculation. The Mali is about 3 times slower than the ARM CPU at calculating this, which surprises me, since it is essentially a matrix multiply. For performance, do I need to use the OpenCL vector load and math operations? (I notice your vectorized Sobel is 10 times the speed of the unvectorized one, and your sgemm uses vectors.) I have not done this in the past since it makes no difference for the NVIDIA GPUs I've been using, and to be honest it is a pain.

    I'm also having the same stability problems as everyone else related to the MMC. I get constant crashes. I've been working off an NFS disk to avoid writing to the SD card. Is there any other option?

    Thank you for your assistance. I'm really interested in low-power computing and I am excited about your products.

    --Mike
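
The GPU-to-CPU ratios behind these observations can be worked out directly from the posted figures. The sketch below assumes the figures are elapsed times (lower is better), since the post does not state units; the ratios are unit-free either way. The dictionary layout and the helper name `gpu_over_cpu` are illustrative.

```python
# Figures copied from the benchmark post above; units are not stated there.
timings = {
    "iMac": {
        "gpu": {"mean": 0.0042, "cov": 0.0423, "RX": 0.3392},
        "cpu": {"mean": 0.0111, "cov": 0.0218, "RX": 0.3410},
    },
    "Chromebook": {
        "gpu": {"mean": 0.0373, "cov": 0.9044, "RX": 20.59},
        "cpu": {"mean": 0.3911, "cov": 0.2464, "RX": 20.0725},
    },
}


def gpu_over_cpu(machine: str, bench: str) -> float:
    """Ratio above 1 means the GPU figure is larger (slower, if these are times)."""
    return timings[machine]["gpu"][bench] / timings[machine]["cpu"][bench]


for machine in timings:
    for bench in ("mean", "cov", "RX"):
        print(f"{machine:10s} {bench:4s} GPU/CPU = {gpu_over_cpu(machine, bench):.2f}")

# Peak-throughput ratio quoted in the post headers (2234 vs 68 GFLOPS).
peak_ratio = 2234 / 68
print(f"peak GFLOPS ratio = {peak_ratio:.1f}")
```

On these numbers the Mali covariance ratio comes out at roughly 3.7, and the GeForce is itself about 1.9 times slower than its CPU on the same benchmark, so the covariance kernel lags the CPU on both machines.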

  • Reading the development guide, it looks like the Mali doesn't have separate high-speed local memory.  Everything is just stored in global memory, so this may explain things.  The COV and RX functions use a ton of local memory.  I'm copying chunks of the images into local memory for performance reasons.  Since I'm copying portions of the image multiple times into local memory (necessary on a machine with 16k of local memory) this is probably killing performance.  I'll take a look at my code some more.

    --Mike

  • Hi Mike,


    > Everything is just stored in global memory, so this may explain things


    GPUs are designed to be latency tolerant, so where things are stored is a little less critical than on CPUs. Lower-latency memory will always help, but it isn't usually necessary.


    > For performance, do I need to use the OpenCL vector load and math operations


    Ideally yes - the Mali-T600 is a vector architecture with SIMD maths units, which is different to many other GPU architectures.


    Where Mali excels is that our SIMD units are very flexible and very wide - if you only need int8 or int16 data for your kernel we can process 16 or 8 elements per SIMD unit per clock cycle (i.e. we have a 128-bit data path and you can carve that up into 8, 16, or 32-bit lanes). If you need floating point I believe the current drivers we have available off the website are only exposing fp32, but we are adding the half-float extension support in our next driver release.


    While the compiler can auto-vectorize (and it is getting better at doing so, so we hope to improve here) there needs to be enough work in a work-item to fill the SIMD lanes, and auto-vectorization is relatively fiddly in any compiler, so it is always more reliable if you use the built-in functions.


    You may want to try downloading the ARM DS-5 Community Edition. It includes the Streamline profiling tool, which can capture and display the GPU hardware performance counters. This should help indicate where your GPU cycles are being spent, including some measure of the efficiency of the GPU's interaction with main memory. We have an optimization guide which includes some hints on which counters to look at for different types of problem, but if you have any questions please shout:


    Mali GPU Application Optimization Guide v3.0 « Mali Developer Center


    Kind regards,
    Pete
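
The lane counts quoted above follow from dividing the 128-bit data path by the element width. A minimal sketch of that arithmetic is below; the helper name `lanes` is illustrative, and pairing each count with an OpenCL C vector type of the same width (for example char16, short8, float4) is an assumption about how one would fill the lanes, not a statement from the driver documentation.

```python
DATAPATH_BITS = 128  # SIMD data path width quoted in the reply above


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one 128-bit operation."""
    if element_bits <= 0 or DATAPATH_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the data path")
    return DATAPATH_BITS // element_bits


# OpenCL C scalar type names and their widths in bits.
for type_name, bits in [("char", 8), ("short", 16), ("int", 32), ("float", 32)]:
    print(f"{type_name:5s} {bits:2d}-bit -> {lanes(bits):2d} lanes ({type_name}{lanes(bits)})")
```

For fp32 data this suggests that a scalar kernel uses one lane in four unless the compiler manages to auto-vectorize it, which is consistent with the advice to use the vector types and built-ins explicitly.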

  • > I've been working off of a NFS disk to avoid writing to the SD card.  Is there any other option to this?

    This is the method I use, although USB stick is also viable.

    > Since I'm copying portions of the image multiple times into local memory (necessary on a machine with 16k of local memory) this is probably killing performance

    This is unnecessary on our architecture, and everything can be done from global memory. The guide Pete links to is focused on graphics, I believe, but it will give some good insight into performance considerations on our architecture. There is also an OpenCL-specific guide, the Mali-T600 Series GPU OpenCL Developer Guide « Mali Developer Center, which is worth a read!

    Thanks,

    Chris

  • Unmarking this as a question, as I think there's no one correct answer at this point. Please continue the discussion here if you're finding it useful, though.