We are running a survey to help us improve the experience for all of our members. If you see the survey appear, please take the time to tell us about your experience if you can.
Hello,
in SVE, I can use `svcmpeq_u64` to compare two svuint64_t vectors. The result is a `svbool_t` predicate. Now, I can use `svlastb_` functions to extract the element itself. However, as I am implementing a linear search using SVE, I am not interested in the element itself (I know what I am searching for), but the _index_ of the first true match in the svbool_t returned by the comparison function.
I was not able to find any function that gives me the index of the first true element in a svbool_t. In x86 SSE, for example, I can create a movemask and count trailing zeros to index my comparison results. In NEON, I can do the same by using an emulated movemask function, but I don't know how to do it natively in NEON. How would I do this on SVE?
Thank you so much for your help.
Best,
Maxbit
Dear George,
Thank you so much again for your help and please excuse my late answer. I am currently implementing several variants that you suggested in order to benchmark them. I have one question on the UMINV approach, maybe I am overseeing something.
The comparison result vector will be a vector of either all 1-bit or all 0-bit entries. If I element-wise AND this vector with our {1,2,3,..} index vector, the smallest non-zero number that remains is the index we are looking for. However, the AND will turn indices where there was no match into 0, and as 0 will be smaller than any of our other indices, UMINV will always return 0. Or am I getting something wrong here? We would need an instruction for the smallest non-zero value, wouldn't we?
Thank you for clearing that up.
Hi Maxbit,
You are correct, the UMINV approach doesn't work as I imagined. You could try instead ANDing it with the bitwise negation of the index (which is a large unsigned number) and use UMAXV instead. Something like:
#include <arm_neon.h> #include <stdio.h> int main() { uint32x4_t x = {0, 0, ~0U, ~0U}; uint32x4_t y = {~0U, ~1U, ~2U, ~3U}; uint32_t idx = ~vmaxvq_u32(vandq_u32(x, y)); printf("idx = %u\n", idx); }
Thanks,George