I will train a TensorFlow or Caffe CNN model on an NVIDIA CUDA GPU, and I would like to deploy it to an embedded system with an Arm Mali-G71 or G72 GPU to run inference. Is this possible without major code modification? It seems the Mali GPU supports only OpenCL? Any solutions? Thanks!
I realise this question was asked some time ago and you may have moved on, but have you had a look at these blogs from three months ago?
https://community.arm.com/processors/b/blog/posts/running-alexnet-on-raspberry-pi-with-compute-library
and
https://community.arm.com/tools/b/blog/posts/profiling-alexnet-on-raspberry-pi-and-hikey-960-with-the-compute-library
I followed the first one to install and run the AlexNet CNN on the Odroid XU4 with the Arm Compute Library, and the second one helped me get Streamline Community Edition working from a remote PC so that I could monitor the GPU activity. May be worth a look!