Hello,
in SVE, I can use `svcmpeq_u64` to compare two svuint64_t vectors. The result is a `svbool_t` predicate. Now, I can use `svlastb_` functions to extract the element itself. However, as I am implementing a linear search using SVE, I am not interested in the element itself (I know what I am searching for), but the _index_ of the first true match in the svbool_t returned by the comparison function.
I was not able to find any function that gives me the index of the first true element in a svbool_t. In x86 SSE, for example, I can create a movemask and count trailing zeros to index my comparison results. In NEON, I can do the same by using an emulated movemask function, but I don't know how to do it natively in NEON. How would I do this on SVE?
Thank you so much for your help.
Best,
Maxbit
Dear George,
Thank you so much again for your help and please excuse my late answer. I am currently implementing several variants that you suggested in order to benchmark them. I have one question on the UMINV approach, maybe I am overseeing something.
The comparison result vector will be a vector of either all 1-bit or all 0-bit entries. If I element-wise AND this vector with our {1,2,3,..} index vector, the smallest non-zero number that remains is the index we are looking for. However, the AND will turn indices where there was no match into 0, and as 0 will be smaller than any of our other indices, UMINV will always return 0. Or am I getting something wrong here? We would need an instruction for the smallest non-zero value, wouldn't we?
Thank you for clearing that up.
Hi Maxbit,
You are correct, the UMINV approach doesn't work as I imagined. You could try instead ANDing it with the bitwise negation of the index (which is a large unsigned number) and use UMAXV instead. Something like:
#include <arm_neon.h> #include <stdio.h> int main() { uint32x4_t x = {0, 0, ~0U, ~0U}; uint32x4_t y = {~0U, ~1U, ~2U, ~3U}; uint32_t idx = ~vmaxvq_u32(vandq_u32(x, y)); printf("idx = %u\n", idx); }
Thanks,George