
Chromebook OpenCL Development issues

Hi all,

I have a Samsung Chromebook (model XE303C12) and I am trying to get some OpenCL code running.

I have followed the "Graphics and Compute Development on Samsung Chromebook" guide (including disabling CONFIG_SECURITY_CHROMIUMOS) and have been able to boot Linux on the Chromebook. My issue is that I have no network at all, neither through the Apple USB dongle nor over wireless. As far as Linux is concerned, the devices don't even seem to exist. Trying to load the USB driver with modprobe gives an "Operation not permitted" error on usbnet.ko.

Has anyone gotten networking to work? If so, how?

If anyone is willing to share a working SD card image, that would be fantastic. To be honest, I got sick of building the Linux kernel sometime in the '90s.

Thanks.

--Mike

P.S.  I don't see OpenCL headers.  Can I just copy them from a working system?

  • Reading the development guide, it looks like the Mali doesn't have separate high-speed local memory; everything is just stored in global memory, so this may explain things. The COV and RX functions use a ton of local memory, because I'm copying chunks of the images into local memory for performance reasons. Since I'm copying portions of the image into local memory multiple times (necessary on a machine with 16 KB of local memory), this is probably killing performance. I'll take a look at my code some more.

    --Mike

  • Hi Mike,


    > Everything is just stored in global memory, so this may explain things


    GPUs are designed to be latency tolerant, so where data is stored is a little less critical than it is on CPUs. Lower-latency memory will always help, but it isn't usually necessary.


    > For performance, do I need to use the OpenCL vector load and math operations


    Ideally, yes. The Mali-T600 is a vector architecture with SIMD maths units, which is different to many other GPU architectures.


    Where Mali excels is that our SIMD units are very flexible and very wide: if you only need int8 or int16 data for your kernel, we can process 16 or 8 elements per SIMD unit per clock cycle (i.e. we have a 128-bit data path and you can carve that up into 8-, 16-, or 32-bit lanes). If you need floating point, I believe the drivers currently available from the website only expose fp32, but we are adding half-float extension support in our next driver release.
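
    The lane counts above follow directly from dividing the 128-bit data path by the element width. The short C sketch below just spells out that arithmetic; the helper name lanes_per_simd is hypothetical, and it says nothing about fp16 throughput, which the post does not state.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Width of one Mali-T600 SIMD data path, as described in the post. */
    #define SIMD_PATH_BITS 128

    /* Hypothetical helper: number of lanes of a given element width in one path. */
    int lanes_per_simd(int element_bits)
    {
        return SIMD_PATH_BITS / element_bits;
    }

    int main(void)
    {
        const char *names[] = { "int8", "int16", "int32/fp32" };
        const int bits[] = { 8, 16, 32 };

        for (int i = 0; i < 3; i++) {
            int lanes = lanes_per_simd(bits[i]);
            /* The lanes exactly fill the 128-bit path. */
            assert(lanes * bits[i] == SIMD_PATH_BITS);
            printf("%-10s %2d lanes per SIMD unit per cycle\n", names[i], lanes);
        }
        return 0;
    }
    ```

    On this reading, char16, short8 and int4/float4 are the OpenCL vector types that map one-to-one onto a single 128-bit operation.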


    While the compiler can auto-vectorize (it is getting better at doing so, and we hope to improve it further), there needs to be enough work in each work-item to fill the SIMD lanes. Auto-vectorization is also fiddly in any compiler, so explicitly using the built-in vector types and functions is always more reliable.
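
    As a rough illustration of giving each work-item enough work to fill the lanes, here is a plain-C emulation in which each simulated work-item handles four contiguous floats, the equivalent of one OpenCL float4 load, multiply-add and store. The float4_t type, load4/store4 helpers and the saxpy example are hypothetical stand-ins for the OpenCL built-ins (vload4/vstore4), not Mali-specific API; on the device this would be an OpenCL C kernel.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical stand-in for OpenCL C's built-in float4 vector type. */
    typedef struct { float s[4]; } float4_t;

    /* Emulates vload4(i, p): returns p[4*i] .. p[4*i + 3]. */
    float4_t load4(size_t i, const float *p)
    {
        float4_t v;
        for (int k = 0; k < 4; k++)
            v.s[k] = p[4 * i + k];
        return v;
    }

    /* Emulates vstore4(v, i, p): writes p[4*i] .. p[4*i + 3]. */
    void store4(float4_t v, size_t i, float *p)
    {
        for (int k = 0; k < 4; k++)
            p[4 * i + k] = v.s[k];
    }

    /* Body of one work-item: y = a*x + y on four elements at once,
       i.e. one 128-bit fp32 operation's worth of work. */
    void saxpy4_work_item(size_t gid, float a, const float *x, float *y)
    {
        float4_t vx = load4(gid, x);
        float4_t vy = load4(gid, y);
        for (int k = 0; k < 4; k++)
            vy.s[k] = a * vx.s[k] + vy.s[k];
        store4(vy, gid, y);
    }

    int main(void)
    {
        enum { N = 16 };                      /* a multiple of 4 in this sketch */
        float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }

        /* Global work size is N / 4: each work-item covers one float4. */
        for (size_t gid = 0; gid < (size_t)N / 4; gid++)
            saxpy4_work_item(gid, 2.0f, x, y);

        for (int i = 0; i < N; i++)
            assert(y[i] == 2.0f * (float)i + 1.0f);
        printf("y[5] = %.1f\n", y[5]);        /* prints y[5] = 11.0 */
        return 0;
    }
    ```

    In OpenCL C the work-item body collapses to a single line, vstore4(a * vload4(gid, x) + vload4(gid, y), gid, y), launched with a global size of N / 4.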


    You may want to try downloading the ARM DS-5 Community Edition, which supports the Streamline profiling tool. Streamline can capture and display the GPU hardware performance counters, which should help you identify where your GPU cycles are being spent, including some measure of how efficiently the GPU interacts with main memory. We have an optimization guide that includes some hints on which counters to look at for different types of problem, but if you have any questions please shout:


    Mali GPU Application Optimization Guide v3.0 (Mali Developer Center)


    Kind regards,
    Pete

  • > I've been working off of a NFS disk to avoid writing to the SD card.  Is there any other option to this?

    This is the method I use, although a USB stick is also viable.

    > Since I'm copying portions of the image multiple times into local memory (necessary on a machine with 16k of local memory) this is probably killing performance

    This is unnecessary on our architecture; everything can be done directly from global memory. I believe the guide Pete links to is focused on graphics, but it will give you good insight into performance considerations on our architecture. There is also an OpenCL-specific guide, the Mali-T600 Series GPU OpenCL Developer Guide (Mali Developer Center), which is worth a read!
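
    To illustrate working straight from global memory, here is a plain-C emulation of a per-pixel work-item that reads its 3x3 neighbourhood directly from the input buffer, with no copy into a local-memory tile and no barrier. The 3x3 box filter, clamp-to-edge border handling and function names are hypothetical examples chosen for brevity, not Mike's COV/RX code.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Clamp an index into [0, n-1] (clamp-to-edge border handling). */
    int clampi(int v, int n)
    {
        return v < 0 ? 0 : (v >= n ? n - 1 : v);
    }

    /* One work-item at (x, y): mean of the 3x3 neighbourhood, read directly
       from the "global" input buffer with no local-memory staging. */
    float box3x3_work_item(const float *img, int w, int h, int x, int y)
    {
        float sum = 0.0f;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
                sum += img[clampi(y + dy, h) * w + clampi(x + dx, w)];
        return sum / 9.0f;
    }

    int main(void)
    {
        enum { W = 4, H = 3 };
        float in[W * H], out[W * H];
        for (int i = 0; i < W * H; i++)
            in[i] = 9.0f;                      /* constant test image */

        /* 2-D NDRange of W x H work-items, emulated with two loops. */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                out[y * W + x] = box3x3_work_item(in, W, H, x, y);

        for (int i = 0; i < W * H; i++)
            assert(out[i] == 9.0f);            /* a constant image is unchanged */
        printf("out[0] = %.1f\n", out[0]);     /* prints out[0] = 9.0 */
        return 0;
    }
    ```

    Each work-item simply re-reads the neighbours it needs, relying on the ordinary memory path rather than an explicit staging copy, which is the approach suggested above for an architecture without separate high-speed local memory.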

    Thanks,

    Chris