I am new to ARM64 assembly and intrinsics. I have a small routine that uses SSE4.1 x86_64 intrinsics for a vector dot product, and I am trying to replace them with ARM64 intrinsics as closely as possible. I believe that on ARM64 I will be using single precision rather than double precision, so the results will differ slightly, but I want to get as close as possible. I have access to ARM NEON, and either intrinsics or asm would work. I am currently stuck. Thanks.
x86_64:
__m128d a, b;
result = _mm_dp_pd(a, b, mask);

ARM:
float32x4_t a, b;
????
Related link: https://github.com/OpenVec/OpenVec
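For what it is worth, here is a minimal sketch of how the missing ARM step might look. It assumes that `mask` selects every lane product and that only the scalar sum is needed (for `_mm_dp_pd` that would be something like `0x31`); a mask that drops products or broadcasts the sum to several lanes would need extra handling. The helper names `dot2_f64` and `dot4_f32` are made up for illustration. AArch64 NEON also provides a double-precision vector type (`float64x2_t`), so staying in double precision may be possible. The scalar fallback is only there so the sketch builds on non-ARM machines.

```c
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two 2-element double vectors.
   Assumes the x86 mask keeps both products and a scalar result is enough. */
static double dot2_f64(const double a[2], const double b[2])
{
#if defined(__aarch64__)
    float64x2_t va   = vld1q_f64(a);      /* load two doubles             */
    float64x2_t vb   = vld1q_f64(b);
    float64x2_t prod = vmulq_f64(va, vb); /* lane-wise multiply           */
    return vaddvq_f64(prod);              /* horizontal add across lanes  */
#else
    return a[0] * b[0] + a[1] * b[1];     /* portable scalar fallback     */
#endif
}

/* Hypothetical helper: single-precision variant for float32x4_t data. */
static float dot4_f32(const float a[4], const float b[4])
{
#if defined(__aarch64__)
    float32x4_t va   = vld1q_f32(a);      /* load four floats             */
    float32x4_t vb   = vld1q_f32(b);
    float32x4_t prod = vmulq_f32(va, vb); /* lane-wise multiply           */
    return vaddvq_f32(prod);              /* horizontal add across lanes  */
#else
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

The order in which the lanes are summed can differ from what the x86 instruction does, so the last bits of the result may not match exactly even at the same precision.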