I will train a TensorFlow or Caffe CNN model on an NVIDIA CUDA GPU and would like to deploy it to an embedded system with an ARM Mali-G71 or G72 GPU to run inference. Is this possible without major code modification? It seems the Mali GPU supports only OpenCL. Are there any solutions? Thanks!
Of course you can run CNNs on Mali!
Caffe has a stable OpenCL branch to which we have recently contributed support for Android. You can see some public benchmarking results on the ARM Mali-T860 GPU here: https://github.com/dividiti/ck-caffe-firefly-rk3399. This is enabled by our CK-Caffe framework: https://github.com/dividiti/ck-caffe.
OpenCL/SYCL support for TensorFlow is tracked here: https://github.com/tensorflow/tensorflow/issues/22 but we haven't been able to test it.
The ARM Compute Library should also become useful at some point: github.com/.../computelibrary
CNN on Mali is a joke (in my experience with the Mali-T628).
1) There's an old TensorFlow branch (v0.11) that uses coriander to translate CUDA to OpenCL. It works for some things, but it's way slower than upstream CPU TensorFlow compiled with some NEON compiler flags.
2) I tried Theano with the GPU array backend (OpenCL): way (>100 times) slower than on the CPU.
3) There's a CaffeOnACL (ARM Compute Library) branch done by ARM, which supposedly uses NEON, the GPU, etc. Another sad joke. The same example (classifying an image with a pre-trained model) was 2 times slower with CaffeOnACL than with the Caffe mainline branch on the CPU.
4) Caffe supports OpenCL. I tried that too. Caffe detects the GPU and all, but when I try to run something with the GPU enabled, it fails with:
F0923 22:04:38.238814 10416 syncedmem.cpp:256] Check failed: mapped_ptr == cpu_ptr_ (0 vs. 0x7d433000) Device claims it support zero copy but failed to create correct user ptr buffer
So yeah... don't bother...
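For anyone who wants to reproduce this kind of CPU-vs-GPU comparison, here is a minimal timing sketch in Python. It is only an illustration: `run_cpu` and `run_gpu` are hypothetical placeholders for whatever call does one forward pass on each backend, and nothing here comes from Caffe, TensorFlow or Theano. The warm-up runs are there because OpenCL kernels are typically compiled on first use, which would otherwise inflate the first measurement.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10):
    """Return the median wall-clock time (seconds) of calling fn()."""
    # Warm-up calls are not timed (first-run compilation, cache effects).
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(candidate_seconds, baseline_seconds):
    """How many times slower the candidate is than the baseline."""
    return candidate_seconds / baseline_seconds


# Hypothetical stand-ins: replace each body with one forward pass
# of the same model on the CPU and on the GPU backend.
def run_cpu():
    sum(i * i for i in range(10_000))


def run_gpu():
    sum(i * i for i in range(50_000))


if __name__ == "__main__":
    cpu_t = benchmark(run_cpu)
    gpu_t = benchmark(run_gpu)
    print(f"CPU median: {cpu_t * 1e3:.2f} ms")
    print(f"GPU median: {gpu_t * 1e3:.2f} ms")
    print(f"GPU/CPU ratio: {slowdown(gpu_t, cpu_t):.1f}x")
```

Using the median rather than the mean keeps a single slow run (for example, a thermal throttle on an embedded board) from dominating the result.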
Hi Marianmi,
Thanks for sharing that info. Do you know the approximate dimensions of the CNN model you were using?
I just checked the Mali specs, and there is really minimal compute power. It may simply have been overwhelmed.
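To put a rough number on "dimensions", one option is to count parameters and multiply-accumulate operations (MACs) per convolution layer. The sketch below assumes plain 2D convolutions on square inputs, with no groups or dilation; the `ConvLayer` class and the example layer shape are made up for illustration and are not taken from the model discussed above.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    in_channels: int
    out_channels: int
    kernel: int       # square kernel, kernel x kernel
    stride: int = 1
    padding: int = 0


def output_size(layer, in_size):
    """Spatial output size for a square input of side in_size."""
    return (in_size + 2 * layer.padding - layer.kernel) // layer.stride + 1


def param_count(layer):
    """Weights plus one bias per output channel."""
    weights = layer.out_channels * layer.in_channels * layer.kernel ** 2
    return weights + layer.out_channels


def mac_count(layer, in_size):
    """Multiply-accumulates for one forward pass over one image."""
    out = output_size(layer, in_size)
    per_output = layer.in_channels * layer.kernel ** 2
    return out * out * layer.out_channels * per_output


if __name__ == "__main__":
    # Hypothetical first layer: RGB input, 64 filters of 3x3, 224x224 image.
    layer = ConvLayer(in_channels=3, out_channels=64, kernel=3, padding=1)
    print("params:", param_count(layer))
    print("MACs:  ", mac_count(layer, 224))
```

Summing `mac_count` over all layers and dividing by a device's sustained MACs per second gives a very optimistic lower bound on inference time, which is usually enough to tell whether a model is in the right ballpark for a given GPU.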