Hi,
First of all, please confirm that this is indeed a GPU task and not a SIMD one. I believe it is a GPU task:
I have a vector of 32 elements, each 16 bits wide.
I need to compare each of those elements to EACH of the 32 elements of every one of 200,000 (200k) other vectors of the same shape.
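To make the problem concrete, here is a minimal scalar sketch of what I mean. It assumes that "compare" means an equality test and that the result I want per vector is the number of matching pairs; both are assumptions made for illustration, and the names count_equal_pairs and compare_all are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_LEN 32 /* elements per vector, 16 bits each */

/* Count the pairs (i, j) with a[i] == b[j], i.e. every element of a
   is checked against every element of b (32 x 32 = 1024 tests). */
static unsigned count_equal_pairs(const uint16_t a[VEC_LEN],
                                  const uint16_t b[VEC_LEN])
{
    unsigned matches = 0;
    for (size_t j = 0; j < VEC_LEN; ++j) {
        for (size_t i = 0; i < VEC_LEN; ++i) {
            matches += (a[i] == b[j]);
        }
    }
    return matches;
}

/* Compare one query vector against n candidate vectors stored back to
   back in memory. Every iteration of this loop is independent of the
   others, which is the part I would like to run in parallel. */
static void compare_all(const uint16_t query[VEC_LEN],
                        const uint16_t *candidates, size_t n,
                        unsigned *out)
{
    for (size_t v = 0; v < n; ++v) {
        out[v] = count_equal_pairs(query, candidates + v * VEC_LEN);
    }
}
```

In my case n would be 200,000, so the outer loop in compare_all is the one I hope to spread across the GPU, while the inner 32 x 32 test is the part SIMD already handles well.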
I understand that SIMD handles a single such comparison perfectly well, and running it 200k times for each query might not even be that slow.
But it still seems wasteful to execute 200k comparisons serially with SIMD, so I want to run them in parallel on the GPU.
Q1: Can each GPU core/entity perform this kind of comparison the way SIMD does (comparing all of the elements of vector A to a single element of vector B in parallel)?
Q2: Which library (one of the Neon ones, I guess) suits me best? I found the Compute Library, but I cannot work out what to use from it (I see https://arm-software.github.io/ComputeLibrary/latest/index.xhtml but I have not found a more straightforward tutorial than that one). How would I implement such a comparison? SIMD has vceq_s32, for example, which is straightforward. But how do I work from that link, assuming this is the relevant library for me, of course?
Many thanks,
vitali.pom