Hi,
I do not know whether using an NPU would be worthwhile for what I would like to do.
So let me explain my need.
1) I want to compare a 64*64 matrix using a mask of data for the comparison.
2) Until now I have used the CPU and SIMD to do this, as a simple double loop over an array [64][64][number of forms to compare = 64].
3) Using a GPU I could do the same, but in that case I would need to add a flag for each form, because the GPU processes the work in an unpredictable order. So I do not think it would be faster. I tried it and it is not, but I may be wrong in the way I implemented the kernel and organized the data.
4) If I were using an NPU, would I get better performance?
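For reference, the CPU baseline in points 1) and 2) could look roughly like the sketch below. It is only an illustration under assumptions the post does not state: 8-bit elements, one mask byte per element and per form, and a result that counts mismatching cells per form. The names `count_mismatches`, `N` and `FORMS` are hypothetical. The form index is innermost, matching the [64][64][64] layout, so the inner loop runs over contiguous bytes and a compiler can vectorize it.

```c
#include <stdint.h>
#include <string.h>

#define N 64      /* matrix is N x N (from the post) */
#define FORMS 64  /* number of forms compared at once (from the post) */

/* Assumed flat layout: forms[(r * N + c) * FORMS + f], same for masks.
   mismatches[f] receives the number of cells where the input differs
   from form f in at least one bit selected by the mask. */
void count_mismatches(const uint8_t *input,  /* N * N elements */
                      const uint8_t *forms,  /* N * N * FORMS elements */
                      const uint8_t *masks,  /* N * N * FORMS elements */
                      uint32_t *mismatches)  /* FORMS counters */
{
    memset(mismatches, 0, FORMS * sizeof *mismatches);
    for (size_t r = 0; r < N; r++) {
        for (size_t c = 0; c < N; c++) {
            const size_t cell = r * N + c;
            const uint8_t v = input[cell];
            const uint8_t *form_row = forms + cell * FORMS;
            const uint8_t *mask_row = masks + cell * FORMS;
            /* Contiguous inner loop over forms: the SIMD-friendly part. */
            for (size_t f = 0; f < FORMS; f++) {
                mismatches[f] += ((v ^ form_row[f]) & mask_row[f]) != 0;
            }
        }
    }
}
```

With these assumptions one call touches 64 * 64 * 64 = 262,144 form bytes plus the same number of mask bytes.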
I have read many things about NPUs and I understand that they can be faster and use less energy. But they are used for CNN models and, as I understood it, some calculations are faster because NPUs integrate dedicated ALUs. But I do not need all that stuff; I do it in a different way from a CNN.
So in my case, would an NPU be useful?
Thanks in advance.
Offloading to any accelerator (GPU, NPU, etc) has an overhead, so it only tends to be beneficial for large workloads where the cost of offload is recovered by faster performance of the batch processing. A 64x64 matrix is quite small, so I would be surprised if it would benefit from offloading because the setup cost will dominate the performance.
Arm SME on the CPU might be a good fit for this size of small matrix, as it would be faster than NEON/SVE without a high setup cost. There is a blog on it here: developer.arm.com/.../arm-scalable-matrix-extension-introduction
HTH,
Pete