I will train a TensorFlow or Caffe CNN model on an NVIDIA CUDA GPU, and would like to deploy it to an embedded system with an ARM Mali-G71 or G72 GPU to run inference. Is this possible without major code modification? It seems the Mali GPU only supports OpenCL? Any solutions? Thanks!
CNN inference on Mali is a joke (my experience with a Mali-T628):
1) There's a TensorFlow branch (v0.11, old) that uses Coriander to translate CUDA to OpenCL. It works for some things, but is way slower than upstream CPU TensorFlow compiled with NEON compiler flags.
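For reference, the NEON speedup mentioned above comes purely from compiler options at build time. A minimal sketch of a from-source CPU build with those flags; the exact flag spellings depend on your GCC version, target (ARMv7 vs ARMv8), and TensorFlow release, so treat this as illustrative:

```shell
# Hypothetical build invocation for an ARMv7 board.
# -mfpu=neon enables the NEON SIMD unit; -mfloat-abi=hard uses hardware
# floating point. On ARMv8/AArch64, NEON is on by default and these two
# flags are not needed.
./configure
bazel build --copt="-mfpu=neon" \
            --copt="-mfloat-abi=hard" \
            --copt="-O3" \
            //tensorflow/tools/pip_package:build_pip_package
```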
2) I tried Theano with the GpuArray backend (OpenCL)... WAY (>100 times) slower than on the CPU.
3) There's a CaffeOnACL branch (built on the ARM Compute Library), done by ARM, which supposedly uses NEON, the GPU, etc. Another sad joke: the same example (classifying an image with a pre-trained model) ran 2x slower with CaffeOnACL than with mainline Caffe on the CPU.
4) Caffe has an OpenCL branch. I tried that too; Caffe detects the GPU and all, but when trying to run something with the GPU enabled it dies with:
F0923 22:04:38.238814 10416 syncedmem.cpp:256] Check failed: mapped_ptr == cpu_ptr_ (0 vs. 0x7d433000) Device claims it support zero copy but failed to create correct user ptr buffer
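That failure suggests the driver advertises zero-copy (host unified memory) but can't actually back a user pointer with a buffer. Before enabling the OpenCL path, it can be worth checking what the device reports for that capability (`CL_DEVICE_HOST_UNIFIED_MEMORY`). A quick diagnostic sketch using the standard `clinfo` tool; the exact output label varies between clinfo versions and drivers:

```shell
# Dump device properties and look for the zero-copy capability flag.
# On Mali drivers this line often reads "Host unified memory: Yes" even
# when, as above, zero-copy allocation fails in practice.
clinfo | grep -i "unified memory"
```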
So yeah... don't bother...
Hi Marianmi,
Thanks for sharing that info. Do you know the approximate dimensions of the CNN model you were using?
I just checked the Mali specs, and there really is minimal compute power there. It may simply have been overwhelmed.