  • Deep Learning Episode 4: Supercomputer vs Pong II
    In the previous post we parallelized Andrej Karpathy's policy gradient code to see whether a very simple implementation coupled with supercomputer speeds could learn to play Atari Pong faster than the...
  • Deep Learning Episode 3: Supercomputer vs Pong
    I’ve always enjoyed playing games, but the buzz from writing programs that play games has repeatedly claimed months of my conscious thought at a time. I’m not sure that writing programs that write programs...
  • Deep Learning Episode 1: Optimizing DeepMind's A3C on Torch
    In February, a new paper from Google's DeepMind team appeared on arXiv. This one was interesting – they showed dramatically improved performance and training time of their Atari-playing Deep Q-Learning...
  • Optimizing Discovar - Part 2: Running in the cloud on Amazon EC2
    The Story So Far: In Part 1, I ran Discovar, a life sciences genome assembly code, on one of our internal systems and optimized it to run the benchmark code 7% faster. Of course, physical hardware often...
  • Profiling and Tuning Linpack: A Step-by-Step Guide
    This year we're proud to be sponsoring the Student Cluster Competition at SC15. One of the key codes teams will have to optimize for their systems is the classic Linpack benchmark. I decided to have a...