Who broke my sparklines?

February 26, 2012

It's January 2012 and I'm sitting on a cross-Atlantic flight. Sweat is beading on my brow and it's nothing to do with the cabin temperature. I am not a happy bunny. I'm a very unhappy bunny and somebody is going to pay.

On this fateful day I'm on my way to Chicago to run an Arm DDT training seminar at NCSA. Having exhausted the list of second-rate in-flight movies, I've started working through our training material in preparation.

It's all been going swimmingly, when suddenly I spot a problem - a major problem. It looks like there's a bug in our just-hit-the-website 3.1 release. The one I'm going to demo at a hands-on workshop. Tomorrow!

The problem is this: in 3.1 we added a fancy new feature called sparklines, which draws a tiny graph next to each variable in the interface, instantly comparing its value across all the processes. Normally this is really useful, but today it looks... wrong:

The graphs are all corrupt! The graph next to my_rank should be a nice diagonal line, showing that process 0 has a rank of 0 and process 9 has a rank of 9! And p is the size of the job, so it should be the same across all processes, but there's some kind of peak in it!
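For anyone who wants those two expectations spelled out, here is a minimal Python sketch of the sanity check my eyes were doing on the sparklines. It is not how Arm DDT draws or checks anything; the function name and the sample values are made up for illustration, and only the junk value 1126236160 comes from the session itself.

```python
JUNK = 1126236160  # the out-of-range value reported by the sparkline

def suspect_processes(my_rank_values, p_values):
    """Return the processes whose my_rank or p breaks the expected pattern.

    Expected: my_rank equals the process index (the diagonal line) and
    p equals the job size on every process (a flat line).
    """
    job_size = len(my_rank_values)
    return [
        proc
        for proc, (rank, p) in enumerate(zip(my_rank_values, p_values))
        if rank != proc or p != job_size
    ]

# Hypothetical 10-process job. Which processes hold junk is an assumption.
my_rank_values = [0, 1, 2, JUNK, 4, 5, JUNK, JUNK, 8, 9]
p_values = [10, 10, 10, JUNK, 10, 10, JUNK, JUNK, 10, 10]

print(suspect_processes(my_rank_values, p_values))  # [3, 6, 7]
```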

Somebody has broken the build. And tomorrow I'm going to be running a hands-on training session with it. Definitely. Not. Happy.

My first instinct is to raise a positively incandescent bug report. I draft one that starts with "WHO BROKE MY #$@! SPARKLINES?!?!!11", but there's no in-flight WiFi so submitting it has to wait. Instead, I anxiously poke around in the interface to find out how bad the damage is.

The first thing I do is hover my mouse over the sparkline to see the range of values reported:

Ok, so there's clearly some junk in there. 1126236160 is definitely not a valid process rank.

That raises the question as to what the values all actually are, so I click once on the sparkline, which brings up a quick cross-process comparison dialog that shows me the actual values across every process:

That's odd, why would just three processes have the same random value? Suddenly, this doesn't feel quite like a problem with Arm DDT any more. I right-click and make a group out of the three processes with the incorrect value and it all drops into place:
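The grouping step is easy to picture in code. What follows is only a rough Python sketch of the idea of bucketing processes by the value they hold, not the implementation behind the comparison dialog; the helper name and the sample data are hypothetical.

```python
from collections import defaultdict

def group_by_value(values):
    """Map each distinct value to the list of processes that hold it."""
    groups = defaultdict(list)
    for proc, value in enumerate(values):
        groups[value].append(proc)
    return dict(groups)

# Hypothetical my_rank values for a 10-process job (positions of the junk
# value are invented; the value itself is the one from the post).
values = [0, 1, 2, 1126236160, 4, 5, 1126236160, 1126236160, 8, 9]

# A rank should be unique, so any value shared by several processes is suspicious.
shared = {v: procs for v, procs in group_by_value(values).items() if len(procs) > 1}
print(shared)  # {1126236160: [3, 6, 7]}
```

Because every healthy process has a different rank, the only bucket with more than one member is the corrupted one, and that bucket is the group worth focusing on.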

I'm not looking at a bug in Arm DDT at all - I'm looking at a bug in the training program. All three of these processes are merrily looping around and around overwriting memory. The type of the tables array is shown underneath the variables list - it's just 12 by 12. Yet these processes are already writing to tables[0][112623621] and beyond! They've trashed the stack, including my_rank, p and a whole lot of other variables. It's a small miracle the program hasn't crashed yet!
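To see why a runaway index can clobber unrelated variables, here is a toy Python model of the situation. It is a sketch under loud assumptions: the flat list stands in for the stack, and placing my_rank and p directly after tables is invented for illustration (real stack layouts depend on the compiler). The training program's actual source is not shown here, so the loop below is a stand-in too.

```python
ROWS, COLS = 12, 12
TABLE_SLOTS = ROWS * COLS  # tables is 12 by 12, so 144 slots

def run_loop(iterations, my_rank=3, p=10):
    """Write `iterations` consecutive slots starting at tables[0][0].

    Returns (my_rank, p) as they read afterwards.
    """
    # Toy "stack": the 144 slots of tables, then my_rank, then p (assumed layout).
    memory = [0] * TABLE_SLOTS + [my_rank, p]
    for j in range(iterations):
        if j >= len(memory):
            break  # the toy model ends here; a real process would keep writing or crash
        # In row-major storage tables[0][j] is simply slot j, with no bounds check.
        memory[j] = j
    return memory[TABLE_SLOTS], memory[TABLE_SLOTS + 1]

print(run_loop(144))  # (3, 10): the loop stayed inside tables
print(run_loop(146))  # (144, 145): my_rank and p have been overwritten
```

The well-behaved run leaves both variables alone; two extra iterations are enough to replace them with whatever the loop happened to be writing, which is exactly the kind of nonsense value a sparkline makes obvious.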

I look back at the training material. Oh, yes, there we are. Exercise 1: why does the program crash or loop indefinitely when run with 10 processes?

Glancing around to see if anybody has noticed, I delete the outraged bug report from my drafts folder and insert a note into the training material:

"An excellent use of sparklines is spotting memory corruption, even with data on the stack or when memory debugging is turned off."

I glance back at the screen and somewhat grudgingly accept that it's actually pretty cool. The relief is palpable, but I still need a drink. Stewardess!
