
Chromebook OpenCL Development Issues

Hi all,

I have gotten a Samsung Chromebook (model number XE303C12) and I am trying to get some OpenCL code running.

I have followed the "Graphics and Compute Development on Samsung Chromebook" guide (including disabling CONFIG_SECURITY_CHROMIUMOS) and have been able to boot Linux on the Chromebook.  My issue is that I see no network at all, either via the Apple USB dongle or the wireless.  The devices don't even seem to exist as far as Linux is concerned.  Trying to load the USB network driver via modprobe gives an "Operation not permitted" error on usbnet.ko.

Has anyone gotten networking to work?  If so, how?

If anyone is willing to share a working SD card image, that would be fantastic.  To be honest, I got sick of building the Linux kernel sometime in the '90s.

Thanks.

--Mike

P.S.  I don't see OpenCL headers.  Can I just copy them from a working system?

  • Tu,

    I found the setting in .config and disabled it.  I haven't tried booting without the lsm.module_locking=0 kernel parameter, so I'm not sure if it helps or not.

    I have gotten my code running on the Chromebook at this point.  I'm curious about its performance.

    Running the code on both my iMac (quad-core i7, GeForce 680MX) and the Chromebook shows some interesting discrepancies in performance.  On paper, the 680MX should be about 30 times as fast.

    My code does some image processing on multiband images (in this case, 50-band images).  The three benchmarks are the mean 50-band image pixel, the image covariance matrix, and something called the RX anomaly statistic.  RX calculates statistics locally to produce an anomaly measurement: it computes the mean and covariance in a 10x10 window, then the covariance is inverted and multiplied against the pixel of interest to produce a target score.  Each of these is done on the GPU in a single kernel.  For comparison, I run the exact same algorithms on the CPUs.
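
    For anyone unfamiliar with it, the per-pixel score is essentially the standard Reed-Xiaoli statistic, using the mean and covariance estimated from the local 10x10 window:

        \mathrm{RX}(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \, \Sigma^{-1} \, (\mathbf{x} - \boldsymbol{\mu})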

    For speed's sake, I'm running on a small image.

    iMac: quad-core i7 @ 3.4 GHz, GeForce 680MX (2234 GFLOPS, 100+ Watts)
    ======================================
                         mean     cov      RX
    GPU (GeForce 680MX)  0.0042   0.0423   0.3392
    CPU (quad i7)        0.0111   0.0218   0.3410

    Chromebook: Exynos with Mali-T604 (68 GFLOPS, roughly 1/32 the speed)
    =====================================
                         mean     cov      RX
    GPU (Mali-T604)      0.0373   0.9044   20.59
    CPU (ARM)            0.3911   0.2464   20.0725

    Obviously the i7 and the GeForce are going to cream the Chromebook.  What is interesting here is the relative slowness of the covariance calculation.  The Mali is about 3 times slower than the ARM CPU at calculating this, which surprises me, since it is essentially a matrix multiply.  For performance, do I need to use the OpenCL vector load and math operations?  (I notice your vectorized Sobel is 10 times the speed of the unvectorized one, and your SGEMM uses vectors.)  I have not done this in the past, since it makes no difference for the NVIDIA GPUs I've been using, and to be honest it is a pain.

    I'm also having the same stability problems as everyone else related to the MMC: I get constant crashes.  I've been working off an NFS disk to avoid writing to the SD card.  Is there any alternative to this?

    Thank you for your assistance. I'm really interested in low-power computing and I am excited about your products.

    --Mike

  • From the development guide, it looks like the Mali doesn't have separate high-speed local memory.  Everything is just stored in global memory, so this may explain things.  The COV and RX functions use a ton of local memory: I'm copying chunks of the images into local memory for performance reasons.  Since I'm copying portions of the image multiple times into local memory (necessary on a machine with 16 KB of local memory), this is probably killing performance.  I'll take a look at my code some more.
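
    Roughly the pattern I'm using, boiled down to one band and non-overlapping 10x10 tiles (a simplified sketch; the names, layout, and lack of border handling are just for illustration):

    #define WIN 10   /* assumes a 10x10 work-group */

    __kernel void window_mean_tiled(__global const float *band,   /* width x height, row-major, one band */
                                    __global float *out,
                                    const int width,
                                    __local float *tile)          /* WIN*WIN floats, sized via clSetKernelArg */
    {
        const int lx = get_local_id(0);
        const int ly = get_local_id(1);
        const int gx = get_global_id(0);
        const int gy = get_global_id(1);

        /* cooperative copy: each work-item stages one element of the window */
        tile[ly * WIN + lx] = band[gy * width + gx];
        barrier(CLK_LOCAL_MEM_FENCE);

        /* every work-item then reads the whole window back from local memory */
        float acc = 0.0f;
        for (int i = 0; i < WIN * WIN; ++i)
            acc += tile[i];
        out[gy * width + gx] = acc / (float)(WIN * WIN);
    }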

    --Mike

  • Hi Mike,


    > Everything is just stored in global memory, so this may explain things


    GPUs are designed to be latency tolerant, so where things are stored is a little less critical than it is on a CPU; lower-latency memory will always help, but it isn't usually necessary.


    > For performance, do I need to use the OpenCL vector load and math operations


    Ideally, yes - the Mali-T600 series is a vector architecture with SIMD maths units, which is different to many other GPU architectures.


    Where Mali excels is that our SIMD units are very flexible and very wide - if you only need int8 or int16 data for your kernel, we can process 16 or 8 elements per SIMD unit per clock cycle (i.e. we have a 128-bit data path and you can carve it up into 8-, 16-, or 32-bit lanes). If you need floating point, I believe the current drivers available on the website only expose fp32, but we are adding half-float extension support in our next driver release.
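
    As a toy illustration (my own example here, not from one of our shipped kernels), each of the additions below operates across the full 128-bit data path, just with a different lane width:

    __kernel void lane_widths(__global const char16 *a8,  __global const char16 *b8,  __global char16 *sum8,
                              __global const short8 *a16, __global const short8 *b16, __global short8 *sum16,
                              __global const float4 *a32, __global const float4 *b32, __global float4 *sum32)
    {
        const int i = get_global_id(0);
        sum8[i]  = a8[i]  + b8[i];    /* 16 x 8-bit integer lanes  */
        sum16[i] = a16[i] + b16[i];   /*  8 x 16-bit integer lanes */
        sum32[i] = a32[i] + b32[i];   /*  4 x fp32 lanes           */
    }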


    While the compiler can auto-vectorize (and it is getting better at this, so we hope to improve here), there needs to be enough work in a work-item to fill the SIMD lanes, and auto-vectorization is relatively fiddly in any compiler, so it is always more reliable to use the built-in vector types and functions explicitly.
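
    For example (a rough sketch of the kind of change I mean - the kernel names and the assumption that the bands of each pixel are stored contiguously are hypothetical, and 48 bands keeps it a multiple of 4), the vector version simply swaps the scalar accumulation for 128-bit float4 loads and adds:

    #define NBANDS 48   /* a real 50-band image would need a scalar tail */

    __kernel void band_mean_scalar(__global const float *image, __global float *mean_out)
    {
        const int pix = get_global_id(0);
        const __global float *p = image + pix * NBANDS;

        float acc = 0.0f;
        for (int b = 0; b < NBANDS; ++b)
            acc += p[b];
        mean_out[pix] = acc / (float)NBANDS;
    }

    __kernel void band_mean_vec4(__global const float *image, __global float *mean_out)
    {
        const int pix = get_global_id(0);
        const __global float *p = image + pix * NBANDS;

        float4 acc = (float4)(0.0f);
        for (int b = 0; b < NBANDS; b += 4)
            acc += vload4(0, p + b);              /* one 128-bit load and add per iteration */
        mean_out[pix] = (acc.x + acc.y + acc.z + acc.w) / (float)NBANDS;
    }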


    You may want to try downloading the ARM DS-5 Community Edition - this supports the Streamline profiling tool, which includes support for capturing and displaying the GPU hardware performance counters. This should help you identify where your GPU cycles are being spent, including giving some measure of the efficiency of the GPU's interaction with main memory. We have an optimization guide which includes some hints on which counters to look at for different types of problem, but if you have any questions please shout:


    Mali GPU Application Optimization Guide v3.0 « Mali Developer Center


    Kind regards,
    Pete

  • > I've been working off an NFS disk to avoid writing to the SD card. Is there any alternative to this?

    This is the method I use, although a USB stick is also viable.

  • > Since I'm copying portions of the image multiple times into local memory (necessary on a machine with 16 KB of local memory), this is probably killing performance

    This is unnecessary on our architecture; everything can be done from global memory. The guide Pete links to is focused on graphics, I believe, but it will still give you some good insight into performance considerations on our architecture. There is also an OpenCL-specific guide at Mali-T600 Series GPU OpenCL Developer Guide « Mali Developer Center which is worth a read!
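
    As a rough sketch (my own example - the names, single-band layout, and missing bounds handling are just for illustration), the windowed mean can read straight from __global memory and let the GPU's latency tolerance and caches do the rest:

    #define WIN 10

    __kernel void window_mean_global(__global const float *band,  /* width x height, row-major, one band */
                                     __global float *out,
                                     const int width)
    {
        const int gx = get_global_id(0);   /* window origin x */
        const int gy = get_global_id(1);   /* window origin y */

        float acc = 0.0f;
        for (int y = 0; y < WIN; ++y)
            for (int x = 0; x < WIN; ++x)
                acc += band[(gy + y) * width + (gx + x)];

        out[gy * width + gx] = acc / (float)(WIN * WIN);
    }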

    Thanks,

    Chris