I plan to train a TensorFlow or Caffe CNN model on an NVIDIA CUDA GPU and would like to deploy it to an embedded system with an Arm Mali-G71 or Mali-G72 GPU to run inference. Is this possible without major code modification? It seems the Mali GPU supports only OpenCL. Are there any solutions? Thanks!