Can the Neon SIMD unit be used to optimize AI algorithms? Does Arm provide an open-source implementation?
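Assuming that by AI algorithms you mean CNNs (convolutional neural networks), the basic building block is matrix multiplication: convolution layers are usually lowered to a matrix multiply (GEMM), for example via im2col.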
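To illustrate why convolution reduces to matrix multiplication, here is a minimal C sketch of the im2col approach. It assumes stride 1, no padding, and CHW layout, and the function names (`im2col`, `gemm_naive`, `conv2d_via_gemm`) are hypothetical. The naive `gemm_naive` loop is the piece you would replace with an optimized, Neon-accelerated BLAS `sgemm`.

```c
#include <assert.h>
#include <stddef.h>

/* Naive GEMM: C[M x N] = A[M x K] * B[K x N], all row-major.
   This is the hot loop a BLAS library would accelerate. */
static void gemm_naive(size_t M, size_t N, size_t K,
                       const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < K; ++p)
                acc += A[i * K + p] * B[p * N + j];
            C[i * N + j] = acc;
        }
}

/* im2col for stride 1, no padding (assumed for simplicity).
   in:  c_in x h x w (CHW layout)
   col: (c_in*ks*ks) x (out_h*out_w), row-major.
   Each column of col holds one ks x ks patch across all input channels. */
static void im2col(const float *in, size_t c_in, size_t h, size_t w,
                   size_t ks, float *col)
{
    assert(ks >= 1 && ks <= h && ks <= w);
    size_t out_h = h - ks + 1, out_w = w - ks + 1;
    size_t n = out_h * out_w;
    for (size_t c = 0; c < c_in; ++c)
        for (size_t ky = 0; ky < ks; ++ky)
            for (size_t kx = 0; kx < ks; ++kx) {
                size_t row = (c * ks + ky) * ks + kx;
                for (size_t oy = 0; oy < out_h; ++oy)
                    for (size_t ox = 0; ox < out_w; ++ox)
                        col[row * n + oy * out_w + ox] =
                            in[(c * h + oy + ky) * w + ox + kx];
            }
}

/* Convolution (cross-correlation, as in most CNN frameworks) as one GEMM:
   weights: c_out x (c_in*ks*ks), flattened in the same (c, ky, kx) order as col rows.
   out:     c_out x (out_h*out_w).  col is caller-provided scratch space. */
void conv2d_via_gemm(const float *in, size_t c_in, size_t h, size_t w,
                     const float *weights, size_t c_out, size_t ks,
                     float *col, float *out)
{
    im2col(in, c_in, h, w, ks, col);
    size_t n = (h - ks + 1) * (w - ks + 1);
    gemm_naive(c_out, n, c_in * ks * ks, weights, col, out);
}
```

For a 3x3 input holding 1..9 and a 2x2 all-ones kernel, the single output channel is the four patch sums 12, 16, 24, 28. The trade-off is extra memory for the `col` buffer in exchange for turning the whole layer into one large, well-optimized matrix multiply.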
Neon-optimized matrix multiply implementations are available in the Arm Performance Libraries (free to download, though not open source), OpenBLAS, and BLIS, to name just a few.
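To give a feel for what a Neon matrix multiply kernel looks like, here is a minimal sketch using Neon intrinsics from `arm_neon.h`, with a scalar fallback so it also builds on non-Arm targets. The function name `sgemm_neon` is hypothetical, and this is not how OpenBLAS or BLIS implement their kernels: production libraries add cache blocking, operand packing, and register tiling on top of this basic idea.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* C[M x N] = A[M x K] * B[K x N], all row-major (hypothetical helper).
   Neon path: for each row i and each block of 4 output columns,
   broadcast A[i][p] and multiply-accumulate it with B[p][j..j+3]. */
void sgemm_neon(size_t M, size_t N, size_t K,
                const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; ++i) {
        size_t j = 0;
#if defined(__ARM_NEON)
        for (; j + 4 <= N; j += 4) {
            float32x4_t acc = vdupq_n_f32(0.0f);          /* 4 partial sums */
            for (size_t p = 0; p < K; ++p) {
                float32x4_t b = vld1q_f32(&B[p * N + j]); /* B[p][j..j+3] */
                acc = vmlaq_n_f32(acc, b, A[i * K + p]);  /* acc += b * A[i][p] */
            }
            vst1q_f32(&C[i * N + j], acc);                /* C[i][j..j+3] */
        }
#endif
        /* Scalar tail for leftover columns (the whole row on non-Neon targets). */
        for (; j < N; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < K; ++p)
                acc += A[i * K + p] * B[p * N + j];
            C[i * N + j] = acc;
        }
    }
}
```

Each Neon iteration computes four output elements at once; on AArch64, `vfmaq_n_f32` could be used instead of `vmlaq_n_f32` to get a fused multiply-add.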
You can then build open-source software such as OpenCV and link it against your preferred BLAS implementation.
Suggested Google search terms: OpenCV, BLAS
Also, if you are looking for AI more generally and not just vision, TensorFlow has Neon kernels as well.