• CUDA Debugger and Profiler - Advanced Debugging and Performance Optimization Tools for CUDA and OpenACC

    Mark O'Connor

    Debugging and Optimizing CUDA and OpenACC

    Arm Forge is a tool suite for developing, debugging and optimizing CUDA and OpenACC codes, from GeForce to Tesla and the Kepler K80. Forge includes the parallel and multi-process CUDA debugger, Arm…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Arm announces its most comprehensive tool suite for the HPC ecosystem

    David Lecomber

    With the continued emergence of innovative, infrastructure-ready Arm-based server platforms, Arm is announcing availability of the Arm Allinea Studio. By providing access to Arm-specific compilers and libraries alongside market-leading debug and optimization…

    • over 2 years ago
    • High Performance Computing
    • HPC blog
  • Arm acquires Allinea: The exciting road ahead

    David Lecomber

    It’s with great excitement that we’re announcing that Allinea is now a part of Arm.

    For over 10 years at Allinea we’ve been on an incredible journey to be your cross-platform tools provider for high performance computing (HPC).

    It’s…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 4: Supercomputer vs Pong II

    Mark O'Connor

    In the previous post we parallelized Andrej Karpathy's policy gradient code to see whether a very simple implementation coupled with supercomputer speeds could learn to play Atari Pong faster than the state-of-the-art (DeepMind's A3C at time of…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 2: Scaling TensorFlow over multiple EC2 GPU nodes

    Mark O'Connor

    In episode one we optimized Torch A3C performance on the new Intel Xeon Phi (Knights Landing) CPU. Arm MAP and Performance Reports identified bottlenecks in our framework and sped up model training by 7x.

    To get further gains we found areas of the…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 1: Optimizing DeepMind's A3C on Torch

    Mark O'Connor

    Torch

    In February, a new paper from Google's DeepMind team appeared on arXiv. This one was interesting – they showed dramatically improved performance and training time of their Atari-playing Deep Q-Learning network. The training speedup was so great that…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Profiling and Tuning Linpack: A Step-by-Step Guide

    Mark O'Connor

    xhpl is compute-bound

    This year we're proud to be sponsoring the Student Cluster Competition at SC15. One of the key codes teams will have to optimize for their systems is the classic Linpack benchmark. I decided to have a go on one of our test systems to see what the students…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Tuning bowtie2 for better performance

    Mark O'Connor

    Faster sequence alignment with Arm Performance Reports

    Recently we've been running bowtie2 on a 16 CPU server with 32 GB RAM. I've tried using the “-p” flag to use more cores but it doesn't seem to make a lot of difference after 8 or so.…

    • over 5 years ago
    • High Performance Computing
    • HPC blog
  • Boosting OpenFOAM behavior with Arm Performance Reports

    Florent Lebeau

    OpenFOAM, developed by ESI-OpenCFD, is one of the most popular tools for developing CFD (Computational Fluid Dynamics) applications, along with ANSYS Fluent and CD-Adapco Star-CCM+.

    Most modules of OpenFOAM are heavily optimized and offer little room for…

    • over 5 years ago
    • High Performance Computing
    • HPC blog