Exploring SIMD and Java Vector API Performance: Benchmarking Across Multiple Machines

Hi HPC Community,

I’ve recently built an early-stage automated testbed for SIMD-optimized algorithms in Java. The codebase uses the Java Vector API to explore the performance gains and challenges of SIMD operations. My goal is to investigate how Java’s evolving vectorization support can be leveraged for high-performance computing across personal, server-grade, and commercial machines.

What I've Built:

The testbed is designed to run on OpenJDK 17 or later (with preview features enabled; the Vector API itself is an incubator module, added with --add-modules jdk.incubator.vector). It uses Maven for dependency management and builds, and Make for task orchestration, including running the benchmarks.

I’m using JMH (Java Microbenchmark Harness) as the primary framework to build, run, and analyze benchmarks at nano, micro, milli, and macro scale. JMH also supports other JVM languages, which I may target in the future, and its flexibility lets me evaluate SIMD operations in the Java Vector API from several angles.

In the GitHub repository, you will find the source code of the testbed and the early algorithms I’ve used for the benchmarks, allowing you to replicate the tests or build upon them. I’ve also included some early results from running the benchmarks on my personal machines (more to be added), alongside a very brief analysis of these findings.
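
For anyone new to the API, here is a minimal sketch of the kind of scalar-versus-vector comparison a testbed like this measures. It is illustrative only and not taken from the repository: the class name DotProductSketch and both method names are hypothetical, and it assumes the JVM is started with --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not part of the repository.
final class DotProductSketch {

    // Widest vector shape the current CPU supports (e.g. 128-bit NEON, 256-bit AVX2).
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Plain scalar baseline. Assumes a and b have the same length.
    public static float scalarDot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Vector API version. Assumes a and b have the same length.
    public static float vectorDot(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        // Largest multiple of the lane count that fits in the array.
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = acc.add(va.mul(vb)); // lane-wise multiply, then accumulate
        }
        // Collapse the per-lane partial sums into one scalar.
        float sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the elements that did not fill a whole vector.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = new float[1003];
        float[] b = new float[1003];
        for (int i = 0; i < a.length; i++) {
            // Small integer values keep every float operation exact,
            // so both versions can be compared directly.
            a[i] = i % 7;
            b[i] = i % 5;
        }
        System.out.println("lanes  = " + SPECIES.length());
        System.out.println("scalar = " + scalarDot(a, b));
        System.out.println("vector = " + vectorDot(a, b));
    }
}
```

In a JMH setup, each method would typically be wrapped in its own @Benchmark method so the harness handles warm-up and measurement; the main method here only checks that the two versions agree. With arbitrary floating-point data the two results can differ slightly, because the vector version adds the products in a different order.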

Key Areas of Interest:

  • SIMD Performance on ARM architectures: I’m particularly interested in how the API interacts with ARM processors in an HPC environment. I would love to hear from anyone who has experimented with this or has insights into optimizations for ARM chips.
  • Java in HPC: While Java isn’t always the first choice for HPC applications, I believe the Java Vector API could change that narrative, especially for high-concurrency, data-parallel tasks.
  • Cross-platform Performance: Currently, I’m running tests across a variety of hardware, but I’d like to expand this further, especially on ARM-based hardware.
  • Optimization of Cloud Computing Resource Utilization: I also want to explore how the Java Vector API can help optimize cloud resource usage. As cloud computing becomes more important in HPC, I’m interested in how SIMD and vectorization can make computational workloads in cloud environments more efficient, reducing costs and improving performance. Insights from anyone who has used SIMD to make better use of cloud resources or to manage infrastructure costs would be invaluable.

Call for Feedback and Collaboration:

I’m eager to hear your feedback on the performance gains and limitations you've encountered with SIMD and the Java Vector API. Suggestions for additional algorithms, code optimization, or any other improvements are more than welcome.

If anyone in the community has access to more specialized hardware (ARM-based or otherwise) and would be willing to help run additional benchmarks, I’d love to collaborate. Running the testbed on a broader range of machines would offer deeper insight into how this approach could scale for real-world HPC applications.

You can find the code repository here: GitHub - java-vector-api-playground.

Feel free to dive in, run your own tests, and share your thoughts. If you'd like to get in touch directly, my email is contact@....

Looking forward to your thoughts, advice, and collaboration opportunities!

Best regards,

IP