• Optimizing Discovar - Part 2: Running in the cloud on Amazon EC2

    Mark O'Connor

    The Story So Far

    In Part 1 I ran Discovar, a life sciences genome assembly code, on one of our internal systems and optimized it to run the benchmark code 7% faster. Of course, physical hardware often performs very differently to cloud-hosted machines…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Detecting I/O contention in HPC code using Arm Forge Pro GPFS metrics

    Chris January

    I/O contention is a frustrating problem to solve. An application run may be taking longer than expected, but how do you know if it’s due to I/O contention?

    Arm Forge Pro includes I/O metrics for Lustre, but not GPFS. Fortunately Forge Pro also includes…

    • over 2 years ago
    • High Performance Computing
    • HPC blog
  • Advanced Memory Debugger and Memory Leak Detection for C++, C and F90 Applications

    Mark O'Connor

    Advanced Memory Debugger and Memory Leak tool for Linux C++, C and F90

    The memory debugger in Arm DDT assists in fixing a number of common memory usage errors with C, C++ and Fortran codes on Linux. The mode extends massively beyond what can be observed…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • CUDA Debugger and Profiler - Advanced Debugging and Performance Optimization Tools for CUDA and OpenACC

    Mark O'Connor

    Debugging and Optimizing CUDA and OpenACC

    Arm Forge is a development tool suite for developing, debugging and optimizing CUDA and OpenACC codes - from GeForce to Tesla and the Kepler K80. Forge includes the parallel and multi-process CUDA debugger, Arm…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Writing a MAP Custom Metric: PAPI IPC

    Mark O'Connor

    New metric

    Arm MAP isn't just a lightweight profiler to help you optimize your code. It also lets you add your own metrics with just a couple of lines of code. To show how this works, I'm going to add PAPI's instructions-per-cycle metric to MAP.

    The…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Arm acquires Allinea: The exciting road ahead

    David Lecomber

    It’s with great excitement that we’re announcing that Allinea is now a part of Arm.

    For over 10 years at Allinea we’ve been on an incredible journey to be your cross-platform tools provider for high performance computing (HPC).

    It’s…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 4: Supercomputer vs Pong II

    Mark O'Connor

    In the previous post we parallelized Andrej Karpathy's policy gradient code to see whether a very simple implementation coupled with supercomputer speeds could learn to play Atari Pong faster than the state-of-the-art (DeepMind's A3C at time of…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 3: Supercomputer vs Pong

    Mark O'Connor

    I’ve always enjoyed playing games, but the buzz from writing programs that play games has repeatedly claimed months of my conscious thought at a time. I’m not sure that writing programs that write programs that play games is the perfect solution, but…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 2: Scaling TensorFlow over multiple EC2 GPU nodes

    Mark O'Connor

    In episode one we optimized Torch A3C performance on the new Intel Xeon Phi (Knights Landing) CPU. Arm MAP and Performance Reports helped us identify bottlenecks in our framework and speed up model training by 7x.

    To get further gains we found areas of the…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 1: Optimizing DeepMind's A3C on Torch

    Mark O'Connor

    In February, a new paper from Google's DeepMind team appeared on arXiv. This one was interesting – they showed dramatically improved performance and training time of their Atari-playing Deep Q-Learning network. The training speedup was so great that…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Profiling and Tuning Linpack: A Step-by-Step Guide

    Mark O'Connor

    This year we're proud to be sponsoring the Student Cluster Competition at SC15. One of the key codes teams will have to optimize for their systems is the classic Linpack benchmark. I decided to have a go on one of our test systems to see what the students…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Profiling OpenMP with Arm MAP 5.0

    Mark O'Connor

    A whirlwind tour of Arm MAP's new OpenMP profiling capabilities

    We're going to see what Arm MAP 5.0 can do by profiling three versions of a simple PI calculator program with some added I/O for good fun:

    • A serial version
    • An OpenMP version
    • A mixed…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Four simple tips for optimizing your code

    Beau Paisley

    Arm DDT and Arm MAP are excellent tools for finding program flaws and performance issues – they are also very helpful for studying codes and coding techniques. In this article I present a handful of optimization techniques and use Arm MAP to illustrate…

    • over 5 years ago
    • High Performance Computing
    • HPC blog