One of the great things about working at Arm is meeting developers with real problems and improving their lives. When you have a tool that transforms the daily report to the boss from “still fixing that bug” to “developing new code” - you’ve just made…
I/O contention is a frustrating problem to solve. An application run may be taking longer than expected, but how do you know if it’s due to I/O contention?
Arm Forge Pro includes I/O metrics for Lustre, but not for GPFS. Fortunately, Forge Pro also includes…
The memory debugger in Arm DDT assists in fixing a number of common memory usage errors with C, C++ and Fortran codes on Linux. The mode extends massively beyond what can be observed…
Arm Forge is a tool suite for developing, debugging and optimizing CUDA and OpenACC codes, from GeForce to Tesla and the Kepler K80. Forge includes the parallel and multi-process CUDA debugger, Arm…
Arm MAP isn't just a lightweight profiler to help you optimize your code. It also lets you add your own metrics with just a couple of lines of code. To show how this works, I'm going to add PAPI's instructions-per-cycle metric to MAP.
The…
With the continued emergence of innovative, infrastructure-ready Arm-based server platforms, Arm is announcing availability of the Arm Allinea Studio. By providing access to Arm-specific compilers and libraries alongside market-leading debug and optimization…
It’s with great excitement that we’re announcing that Allinea is now a part of Arm.
For over 10 years at Allinea we’ve been on an incredible journey to be your cross-platform tools provider for high performance computing (HPC).
It’s…
In the previous post we parallelized Andrej Karpathy's policy gradient code to see whether a very simple implementation coupled with supercomputer speeds could learn to play Atari Pong faster than the state-of-the-art (DeepMind's A3C at time of…
I’ve always enjoyed playing games, but the buzz from writing programs that play games has repeatedly claimed months of my conscious thought at a time. I’m not sure that writing programs that write programs that play games is the perfect solution, but…
In episode one we optimized Torch A3C performance on the new Intel Xeon Phi (Knights Landing) CPU. Arm MAP and Performance Reports identified bottlenecks in our framework, and fixing them sped up model training by 7x.
To get further gains we found areas of the…
In February, a new paper from Google's DeepMind team appeared on arXiv. This one was interesting – they showed dramatically improved performance and training time of their Atari-playing Deep Q-Learning network. The training speedup was so great that…
For Fortran and F90 code, as in every language, debugging is inevitable. This post offers debugging tips for Fortran and F90 developers, showing why and how to use a debugger on some typical bugs.
The F90 and Fortran write…
This year we're proud to be sponsoring the Student Cluster Competition at SC15. One of the key codes the teams will have to optimize for their systems is the classic Linpack benchmark. I decided to have a go on one of our test systems to see what the students…
We're going to see what Arm MAP 5.0 can do by profiling three versions of a simple PI calculator program, with some I/O added for fun:
Python is pretty commonplace in scientific computing these days. It is easy to code and powerful, but numerical computation is not one of Python's strengths. Its interpreter simply can’t apply the advanced optimizations to your loops and floating point…
Dangling pointers are pointers to memory that has been freed but which have not themselves been set to null (or 0x0). This allows a particularly tricky class of bug to arise, because it is often possible for subsequent code to keep on using…
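As a minimal illustration (our own sketch, not code from the original post), the C fragment below shows the defensive idiom the definition implies: clear a pointer immediately after freeing it, so later code can test for null instead of silently reading freed memory. The names `FREE_AND_NULL` and `read_value` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper macro: free the block and clear the pointer variable
 * in one step, so that particular variable cannot be left dangling.
 * Note: the argument is evaluated twice, so pass a plain variable. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)

/* Hypothetical reader: returns the stored value, or -1 if the pointer has
 * already been released. The null check only helps because FREE_AND_NULL
 * cleared the pointer; a dangling pointer would pass the check and read
 * freed memory. */
static int read_value(const int *p)
{
    return p != NULL ? *p : -1;
}
```

Because `free(NULL)` is a no-op, applying `FREE_AND_NULL` twice to the same variable is harmless. The idiom only protects the variable that is cleared, though: any other copies of the same address still dangle, which is why this class of bug is hard to catch by inspection alone.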
Recently we've been running bowtie2 on a 16 CPU server with 32 GB RAM. I've tried using the “-p” flag to use more cores but it doesn't seem to make a lot of difference after 8 or so.…
OpenFOAM, developed by ESI-OpenCFD, is one of the most popular tools for developing CFD (Computational Fluid Dynamics) applications, along with ANSYS Fluent and CD-adapco STAR-CCM+.
Most modules of OpenFOAM are heavily optimized and offer little room for…
Arm DDT and Arm MAP are excellent tools for finding program flaws and performance issues – they are also very helpful for studying codes and coding techniques. In this article I present a handful of optimization techniques and use Arm MAP to illustrate…
In recent months, the HPC market has been waiting to see how Arm will drive innovation for High Performance Computing. Arm and its partners have been working hard to enable a greater variety of competitive hardware solutions, providing the innovation…
If you’ve ever been in Denver, you might have noticed how incredibly different one’s perspective can be, depending on which direction one looks. Looking eastward, you might assume the Denver…