  • Arm Allinea Studio 19.2: building on Libraries and Arm Compiler for Linux performance

    Patrick Wohlschlegel

    Arm Allinea Studio 19.2 is now available. This new major release includes valuable updates to the Arm Performance Libraries (Arm PL) and the Arm Compiler for Linux. This new version includes our first attempt at the Arm Opt Report and the introduction…

    • 11 months ago
    • High Performance Computing
    • HPC blog
  • Arm Forge 19.1: Introducing "Forge Ultimate" edition and region profiling capabilities

    Patrick Wohlschlegel

    Arm Forge 19.1 is now available. This new major version includes the launch of a new Arm Forge Ultimate edition and the introduction of "region profiling", leveraging LLNL's work on Caliper.

    Introduction of Arm Forge Ultimate

    By popular…

    • 11 months ago
    • High Performance Computing
    • HPC blog
  • Detecting Memory Leaks

    Mark O'Connor

    Memory leaks are a killer of long-running applications - memory usage keeps growing until finally the memory supply is exhausted and it's "game over". If you’re lucky the system recognizes your application is at fault and terminates it. If you’re unlucky…

    • over 7 years ago
    • High Performance Computing
    • HPC blog
  • Debugging CUDA Dynamic Parallelism

    David Lecomber

    Today, using one of the early examples from the CUDA toolkit, I’m going to introduce a neat feature of CUDA 5 and CUDA 5.5 - dynamic parallelism - and how to use Arm DDT to debug it.

    What is CUDA?

    CUDA brings highly parallel computing into the graphics…

    • over 6 years ago
    • High Performance Computing
    • HPC blog
  • Optimizing Discovar - Part 2: Running in the cloud on Amazon EC2

    Mark O'Connor

    The Story So Far

    In Part 1 I ran Discovar, a life sciences genome assembly code, on one of our internal systems and optimized it to run the benchmark code 7% faster. Of course, physical hardware often performs very differently to cloud-hosted machines…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Tool Up with Arm DDT!

    Mark O'Connor

    We humans can survive in almost every environment on our planet and are beginning to step off it. We command fire hotter than the core of a star and freeze atoms at temperatures cooler than the depths of interstellar space. Not bad for squishy sacks of…

    • over 8 years ago
    • High Performance Computing
    • HPC blog
  • Who broke my sparklines?

    David Lecomber

    It's January 2012 and I'm sitting on a cross-Atlantic flight. Sweat is beading on my brow and it's nothing to do with the cabin temperature. I am not a happy bunny. I'm a very unhappy bunny and somebody is going to pay.

    On this fateful…

    • over 8 years ago
    • High Performance Computing
    • HPC blog
  • The Instant Fix

    David Lecomber

    One of the great things about working at Arm is meeting developers with real problems and improving their lives. When you have a tool that transforms the daily report to the boss from “still fixing that bug” to “developing new code” - you’ve just made…

    • over 8 years ago
    • High Performance Computing
    • HPC blog
  • Detecting I/O contention in HPC code using Arm Forge Pro GPFS metrics

    Chris January

    I/O contention is a frustrating problem to solve. An application run may be taking longer than expected, but how do you know if it’s due to I/O contention?

    Arm Forge Pro includes I/O metrics for Lustre, but not GPFS. Fortunately Forge Pro also includes…

    • over 2 years ago
    • High Performance Computing
    • HPC blog
  • Advanced Memory Debugger and Memory Leak Detection for C++, C and F90 Applications

    Mark O'Connor

    Advanced Memory Debugger and Memory Leak tool for Linux C++, C and F90

    The memory debugger in Arm DDT assists in fixing a number of common memory usage errors with C, C++ and Fortran codes on Linux. The mode extends massively beyond what can be observed…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • CUDA Debugger and Profiler - Advanced Debugging and Performance Optimization Tools for CUDA and OpenACC

    Mark O'Connor

    Debugging and Optimizing CUDA and OpenACC

    Arm Forge is a development tool suite for developing, debugging and optimizing CUDA and OpenACC codes - from GeForce to Tesla and the Kepler K80. Forge includes the parallel and multi-process CUDA debugger, Arm…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Writing a MAP Custom Metric: PAPI IPC

    Mark O'Connor

    New metric

    Arm MAP isn't just a lightweight profiler to help you optimize your code. It also lets you add your own metrics with just a couple of lines of code. To show how this works, I'm going to add PAPI's instructions-per-cycle metric to MAP.

    The…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Arm announces its most comprehensive tool suite for the HPC ecosystem

    David Lecomber

    With the continued emergence of innovative, infrastructure-ready Arm-based server platforms, Arm is announcing availability of the Arm Allinea Studio. By providing access to Arm-specific compilers and libraries alongside market-leading debug and optimization…

    • over 2 years ago
    • High Performance Computing
    • HPC blog
  • Arm acquires Allinea: The exciting road ahead

    David Lecomber

    It’s with great excitement that we’re announcing that Allinea is now a part of Arm.

    For over 10 years at Allinea we’ve been on an incredible journey to be your cross-platform tools provider for high performance computing (HPC).

    It’s…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 4: Supercomputer vs Pong II

    Mark O'Connor

    In the previous post we parallelized Andrej Karpathy's policy gradient code to see whether a very simple implementation coupled with supercomputer speeds could learn to play Atari Pong faster than the state-of-the-art (DeepMind's A3C at time of…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Boosting OpenFOAM behavior with Arm Performance Reports

    Florent Lebeau

    OpenFOAM, developed by ESI-OpenCFD, is one of the most popular tools for developing CFD (Computational Fluid Dynamics) applications, along with ANSYS Fluent or CD-Adapco Star-CCM+.

    Most modules of OpenFOAM are heavily optimized and offer little room for…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Tuning bowtie2 for better performance

    Mark O'Connor

    Faster sequence alignment with Arm Performance Reports

    Recently we've been running bowtie2 on a 16 CPU server with 32 GB RAM. I've tried using the “-p” flag to use more cores but it doesn't seem to make a lot of difference after 8 or so.…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 3: Supercomputer vs Pong

    Mark O'Connor

    I’ve always enjoyed playing games, but the buzz from writing programs that play games has repeatedly claimed months of my conscious thought at a time. I’m not sure that writing programs that write programs that play games is the perfect solution, but…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Profiling OpenMP with Arm MAP 5.0

    Mark O'Connor

    A whirlwind tour of Arm MAP's new OpenMP profiling capabilities

    We're going to see what Arm MAP 5.0 can do by profiling three versions of a simple PI calculator program with some added I/O for good fun:

    • A serial version
    • An OpenMP version
    • A mixed…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • How to debug and profile those mixed Python and Fortran codes

    David Lecomber

    Python is pretty commonplace in scientific computing these days. It is easy to code and powerful - but numerical computation is not a strength that Python has. Its interpreter simply can’t apply the advanced optimizations to your loops and floating point…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Profiling and Tuning Linpack: A Step-by-Step Guide

    Mark O'Connor

    This year we're proud to be sponsoring the Student Cluster Competition at SC15. One of the key codes teams will have to optimize for their systems is the classic Linpack benchmark. I decided to have a go on one of our test systems to see what the students…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Fixing Dangling Pointers

    Mark O'Connor

    What are dangling pointers?

    Dangling pointers are pointers whose memory has been freed but which have not been set to null (or 0x0). This allows a particularly tricky class of bug to arise, because it is often possible for subsequent code to keep on using…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Tips for Debugging Fortran

    David Lecomber

    For Fortran and F90, debugging is - as in all languages - inevitable. We look at debugging tips for Fortran and F90 developers to show why and how to use a debugger for some typical bugs.

    Do it the right way, not the write way

    The F90 and Fortran write…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 2: Scaling TensorFlow over multiple EC2 GPU nodes

    Mark O'Connor

    In episode one we optimized Torch A3C performance on the new Intel Xeon Phi (Knights Landing) CPU. Arm MAP and Performance Reports identified bottlenecks in our framework and sped up model training by 7x.

    To get further gains we found areas of the…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 1: Optimizing DeepMind's A3C on Torch

    Mark O'Connor

    In February, a new paper from Google's DeepMind team appeared on arXiv. This one was interesting – they showed dramatically improved performance and training time of their Atari-playing Deep Q-Learning network. The training speedup was so great that…

    • over 4 years ago
    • High Performance Computing
    • HPC blog