Hi HPC Community,
We have recently used the Vector API to implement bit packing and unpacking of boolean values.
For benchmarking, we used JMH with JDK 24 to measure the following operations:
VectorMask.fromArray(…).toLong(…)
VectorMask.fromLong(…).intoArray(…)
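For readers less familiar with these calls, here is a minimal scalar sketch of what packing and unpacking compute. It is not the benchmark from this post and not the code HotSpot generates. The class name `MaskPackingSketch` and the helpers `pack` and `unpack` are hypothetical; the only assumption carried over from the Vector API is that lane i maps to bit i of the long.

```java
import java.util.Arrays;

public class MaskPackingSketch {

    // Pack `lanes` booleans (at most 64) starting at `offset` into a long.
    // Lane i is stored in bit i, so the first lane is the least significant bit.
    static long pack(boolean[] bits, int offset, int lanes) {
        long packed = 0L;
        for (int i = 0; i < lanes; i++) {
            if (bits[offset + i]) {
                packed |= 1L << i;
            }
        }
        return packed;
    }

    // Unpack the low `lanes` bits of `packed` into `dst`, starting at `offset`.
    static void unpack(long packed, boolean[] dst, int offset, int lanes) {
        for (int i = 0; i < lanes; i++) {
            dst[offset + i] = ((packed >>> i) & 1L) != 0;
        }
    }

    public static void main(String[] args) {
        boolean[] in = {true, false, true, true, false, false, false, true};
        long packed = pack(in, 0, in.length);

        boolean[] out = new boolean[in.length];
        unpack(packed, out, 0, out.length);

        System.out.println(Long.toBinaryString(packed)); // prints 10001101
        System.out.println(Arrays.equals(in, out));      // prints true
    }
}
```

A scalar loop like this can also serve as a correctness baseline when comparing against the vectorized path.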
On inspecting the assembly with the HotSpot disassembler, we noticed that SVE instructions such as STR (predicate): Store predicate register and LDR (predicate): Load predicate register, which match well with this use case, are not being generated. Instead, the current implementation relies on shifts, rotations, and bitwise operations.
With this post, we’d like to explore opportunities for improving the performance of VectorMask operations on Arm by leveraging direct predicate instructions (STR/LDR) rather than bitwise operations.
We have gone through a prior post on the Vector API (Exploring SIMD and Java Vector API Performance), and we look forward to insights and possible collaboration opportunities to enhance Arm performance.
Regards, Chiranmoy
Hi Chiranmoy,
Thanks for the report. We're planning to reproduce this on the JDK mainline and we'll share findings as we learn more.
If you could share the benchmark you've used or another minimal reproducer test case, that would help.