I am new to ARM64 assembly and intrinsics. I have a small routine that uses SSE4.1 x86_64 intrinsics for a vector dot product, and I am trying to replace them as closely as possible with ARM64 equivalents. I believe that on ARM64 I will have to use single precision rather than double precision, so the results will differ slightly; however, I want to get as close as possible. I do have access to ARM NEON, and either intrinsics or inline assembly would work. I am currently stuck. Thanks.
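AArch64 NEON, unlike 32-bit ARMv7 NEON, does have a double-precision vector type, float64x2_t, which holds two doubles. Single precision may therefore not be necessary at all. A minimal sketch of a double-precision dot product over two arrays, assuming an AArch64 target with <arm_neon.h>; the function name dot_f64 is made up for illustration, and a scalar fallback is included so the sketch also compiles on other targets:

```c
#include <stddef.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two double arrays of length n.
   On AArch64 it processes two doubles per float64x2_t register;
   elsewhere only the scalar loop below runs. */
double dot_f64(const double *x, const double *y, size_t n)
{
    size_t i = 0;
    double sum = 0.0;
#if defined(__aarch64__)
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2) {
        float64x2_t vx = vld1q_f64(x + i);
        float64x2_t vy = vld1q_f64(y + i);
        /* separate multiply and add, like SSE mulpd + addpd */
        acc = vaddq_f64(acc, vmulq_f64(vx, vy));
    }
    sum = vaddvq_f64(acc); /* horizontal add of the two lanes */
#endif
    for (; i < n; ++i)     /* scalar tail (the whole loop off AArch64) */
        sum += x[i] * y[i];
    return sum;
}
```

GCC may still fuse the multiply and add into a single FMA instruction unless built with -ffp-contract=off; an FMA rounds once instead of twice, so results could then differ in the last bit from SSE code that uses separate multiplies and adds.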
ARM64 (NEON):

    float32x4_t a, b;
    ????

x86_64 (SSE4.1):

    __m128d a, b;
    result = _mm_dp_pd(a, b, mask);
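On AArch64 the counterpart of __m128d is float64x2_t. NEON has no double-precision dot-product instruction, but _mm_dp_pd can be emulated with a lane-wise multiply, masking, and a horizontal add (vaddvq_f64). A minimal sketch, assuming an AArch64 compiler that provides <arm_neon.h>; the helper names dp_pd_neon and dp_pd are hypothetical, and unlike the real intrinsic the mask here is an ordinary runtime int rather than a compile-time constant. Following Intel's description of DPPD, mask bits 4 and 5 select which products are summed, and bits 0 and 1 select which output lanes receive the sum (the others become 0.0):

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>

/* Hypothetical emulation of _mm_dp_pd(a, b, mask) with NEON. */
static inline float64x2_t dp_pd_neon(float64x2_t a, float64x2_t b, int mask)
{
    const uint64_t in_bits[2]  = { (mask & 0x10) ? UINT64_MAX : 0,
                                   (mask & 0x20) ? UINT64_MAX : 0 };
    const uint64_t out_bits[2] = { (mask & 0x01) ? UINT64_MAX : 0,
                                   (mask & 0x02) ? UINT64_MAX : 0 };
    /* multiply lane-wise, then zero the products the mask excludes */
    float64x2_t prod = vmulq_f64(a, b);
    prod = vreinterpretq_f64_u64(
        vandq_u64(vreinterpretq_u64_f64(prod), vld1q_u64(in_bits)));
    /* add the two lanes, broadcast the sum, zero unselected output lanes */
    float64x2_t sum = vdupq_n_f64(vaddvq_f64(prod));
    return vreinterpretq_f64_u64(
        vandq_u64(vreinterpretq_u64_f64(sum), vld1q_u64(out_bits)));
}
#endif

/* Hypothetical array-based wrapper: NEON on AArch64, scalar reference elsewhere. */
void dp_pd(const double a[2], const double b[2], int mask, double out[2])
{
#if defined(__aarch64__)
    vst1q_f64(out, dp_pd_neon(vld1q_f64(a), vld1q_f64(b), mask));
#else
    double p0 = (mask & 0x10) ? a[0] * b[0] : 0.0;
    double p1 = (mask & 0x20) ? a[1] * b[1] : 0.0;
    double s = p0 + p1;
    out[0] = (mask & 0x01) ? s : 0.0;
    out[1] = (mask & 0x02) ? s : 0.0;
#endif
}
```

For example, with a = {1.0, 2.0}, b = {3.0, 4.0} and mask 0x31, both products are summed (3 + 8) and only lane 0 receives the result, giving {11.0, 0.0}. Because the products and the sum are each rounded separately, as in DPPD, this should match _mm_dp_pd for ordinary inputs; edge cases such as NaN payloads or flush-to-zero settings may still differ.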
If you are porting floating-point intrinsics, you may wish to look at the OpenVec suite (https://github.com/OpenVec/OpenVec), which supports both x86_64 and ARM NEON.