Hi,
I do not know whether using an NPU would be worthwhile for what I would like to do.
So let me explain my need.
1) I want to compare a 64*64 matrix with data masks used for the comparison.
2) Until now I have used the CPU with SIMD to do this, as a simple double loop over an array [64][64][number of forms to compare = 64].
3) Using a GPU I could do the same, but in that case I would need to add a flag for each form, because the GPU processes things in a random order. So I do not think it will be faster. I tried, and it is not, but I may be wrong in the way I implemented the kernel and organized the data.
4) If I were using an NPU, would I get better performance?
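To make points 1 and 2 concrete, here is a minimal sketch of such a loop in C. It is not the actual code from this post. It assumes 8-bit cell values, a flat array laid out as [y][x][form], a mask with the same layout where nonzero means "this cell matters", and a per-form mismatch counter as the result. The names `count_mismatches` and `idx` are hypothetical. The innermost loop over forms touches contiguous memory, which is the part a SIMD implementation would cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N = 64, FORMS = 64 };  /* sizes taken from the question */

/* Flat index into a [N][N][FORMS] array (assumed layout). */
static inline size_t idx(size_t y, size_t x, size_t f)
{
    return (y * N + x) * FORMS + f;
}

/* For each form f, count the masked cells where the form differs from
 * the input. input has N*N entries; forms and mask have N*N*FORMS. */
void count_mismatches(const uint8_t *input,
                      const uint8_t *forms,
                      const uint8_t *mask,
                      uint32_t *mismatches)
{
    assert(input && forms && mask && mismatches);

    for (size_t f = 0; f < FORMS; ++f)
        mismatches[f] = 0;

    /* The "double loop" over the 64*64 cells. */
    for (size_t y = 0; y < N; ++y) {
        for (size_t x = 0; x < N; ++x) {
            const uint8_t v = input[y * N + x];
            /* Contiguous over forms: a compiler or hand-written SIMD
             * can process many forms per instruction here. */
            for (size_t f = 0; f < FORMS; ++f) {
                const size_t i = idx(y, x, f);
                mismatches[f] += (uint32_t)((mask[i] != 0) & (forms[i] != v));
            }
        }
    }
}
```

A form matches the input when its counter ends at zero. Because each form has its own counter and the loop order is fixed, no extra per-form flag or synchronization is needed on the CPU.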
I have read many things about NPUs, and I understand that they can be faster and use less energy. But they are used for CNN models and, as I understood it, some calculations are faster because they have integrated ALU calculation units. But I do not need all that stuff; I do it in a different way than a CNN.
So in my case, would an NPU be useful?
Thanks in advance.
It's hard to give a precise answer - different accelerators will have different offload overheads and different performance for the workloads submitted to them (either because of accelerator hardware differences or workload differences), so both sides of the "cost vs benefit" balance are platform-dependent variables.
For GPGPU work my gut feel answer would be something like "it's worth considering if it's at least a millisecond of work" - for a high-end Mali that's 12M shader core cycles, each core doing 128 fp32 FMAs per clock, so "quite large".
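To put that threshold next to the workload in the question, here is a rough back-of-the-envelope sketch in C. It reads the figures above as 12M core cycles at 128 FMAs per cycle, and it assumes (purely for illustration) that one masked compare costs about as much as one FMA slot, ignoring memory traffic and offload overhead. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FMAs available in the time budget: cycles * FMAs issued per cycle. */
uint64_t fma_budget(uint64_t core_cycles, uint64_t fmas_per_cycle)
{
    return core_cycles * fmas_per_cycle;
}

/* Element comparisons in a rows * cols matrix checked against `forms` forms. */
uint64_t comparison_count(uint64_t rows, uint64_t cols, uint64_t forms)
{
    return rows * cols * forms;
}

/* How many times the workload fits into the budget. */
double budget_ratio(uint64_t budget, uint64_t work)
{
    assert(work != 0);
    return (double)budget / (double)work;
}
```

With the numbers from this thread, `fma_budget(12000000, 128)` gives 1,536,000,000 FMAs, and `comparison_count(64, 64, 64)` gives 262,144 comparisons, so `budget_ratio` is roughly 5859. Under the assumption above, a single 64*64*64 comparison pass is thousands of times smaller than the "at least a millisecond" rule of thumb, so it would likely need to be batched heavily before offload pays off.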