  • Deep Learning Episode 2: Scaling TensorFlow over multiple EC2 GPU nodes

    Mark O'Connor

    In episode one we optimized Torch A3C performance on the new Intel Xeon Phi (Knights Landing) CPU. Arm MAP and Performance Reports identified bottlenecks in our framework and sped up model training by 7x.

    To get further gains we found areas of the…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 3: Supercomputer vs Pong

    Mark O'Connor

    I’ve always enjoyed playing games, but the buzz from writing programs that play games has repeatedly claimed months of my conscious thought at a time. I’m not sure that writing programs that write programs that play games is the perfect solution, but…

    • over 3 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 1: Optimizing DeepMind's A3C on Torch

    Mark O'Connor

    In February, a new paper from Google's DeepMind team appeared on arXiv. This one was interesting – they showed dramatically improved performance and training time of their Atari-playing Deep Q-Learning network. The training speedup was so great that…

    • over 4 years ago
    • High Performance Computing
    • HPC blog
  • Deep Learning Episode 4: Supercomputer vs Pong II

    Mark O'Connor

    In the previous post we parallelized Andrej Karpathy's policy gradient code to see whether a very simple implementation coupled with supercomputer speeds could learn to play Atari Pong faster than the state-of-the-art (DeepMind's A3C at time of…

    • over 3 years ago
    • High Performance Computing
    • HPC blog