I am going to train a TensorFlow or Caffe CNN model on an Nvidia CUDA GPU, and would like to deploy it to an embedded system with an ARM Mali-G71 or Mali-G72 GPU to run inference. Is this possible without major code modification? It seems the Mali GPU supports only OpenCL. Are there any solutions? Thanks!
Well, I had a look on the web and can't find anything to back up what I said below, so it seems my memory failed me. I believe you'd need an Nvidia Tegra K1 or better if you really want to run on ARM and use CUDA.
As I recall, CUDA can be used with GPUs other than Nvidia's via OpenCL. I haven't done anything like that myself, but it is worth a bit of Googling on using TensorFlow with a non-Nvidia GPU; it probably isn't too bad.