Tips for Debugging Fortran

March 5, 2016

In Fortran and F90, as in every language, debugging is inevitable. This post looks at debugging tips for Fortran and F90 developers, showing why and how to use a debugger on some typical bugs.

Do it the right way, not the write way

For many developers, the F90 and Fortran write (or print) statement is wired into the brain as the way to debug, but it just doesn’t do the job. Debugging with write or print is iterative:

Insert a write statement before and after the location you think the problem is near.
Compile.
Run.
Did both write outputs appear?
If only the first appears, the crash lies between the two statements, so narrow the gap. If both appear, the crash is somewhere else, so widen the gap again and adjust.
Repeat until the gap is isolated to one line.
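
For reference, the bracketing step above might look like the following minimal sketch. The array x, the index idx and the "suspect" line are hypothetical and stand in for whatever code is under suspicion. The flush calls are an assumption about good practice: without them, buffered output can be lost when the program crashes, which makes the "did both outputs appear?" test misleading.

```fortran
program write_bracketing
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  real :: x(10)
  integer :: idx

  x = 1.0
  idx = 3                     ! hypothetical index; imagine it is computed elsewhere

  write(error_unit, *) 'DEBUG: before suspect line, idx =', idx
  flush(error_unit)           ! flush so the message survives a crash on the next line
  x(idx) = x(idx) * 2.0       ! the line under suspicion
  write(error_unit, *) 'DEBUG: after suspect line, x(idx) =', x(idx)
  flush(error_unit)
end program write_bracketing
```

Every move of the two write statements means another edit, compile and run cycle.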

This is time-consuming, and it's hard to understand why it is still used so often.

With a debugger, finding the line of a Fortran or F90 crash is instantaneous. All debuggers save time, but my favorite is, of course, Arm DDT.

The debugger will run until the program crashes. At that point, you see the line of the crash, with the source code alongside it.

With a debugger, the variables relevant to the current location are visible when the process crashes. You can look at the calling frames in the stack too. That lets you answer questions such as: why did the program call this function, how did it get here, and what are the values of the variables in each stack frame? DDT is a graphical debugger, which makes all that information much easier to see than in a command-line debugger.

In contrast, with the write statement, even once you have worked out where to put it, you’ve a lot more work to do if you want to know why the program crashed.

Watching the code path beats guessing it

We all have a horror story about a legacy code. The source file as long as a novel. The code with no comments.

The one true way to know what code is doing is to watch it as it progresses.

Debuggers let you step line by line through interesting bits of the Fortran or F90, step into a function or subroutine, or just set a breakpoint and let the code run until it reaches a function or line that you care about. You then know for sure how your program is being executed.

And whilst you’re doing that, you can still see the variables that explain why things are happening, whether scalar integers and floating-point values or large arrays, and you can stop as soon as something untoward happens. Again, that is not something your write statement is set up to do.

Pay attention to loop start and end cases

Inconsistent results and intermittent crashes often arise from accessing memory locations that either were not initialized, or worse, are fatal to touch.

When the i’th element is calculated from the (i-1)th or the (i+1)th element, be careful with the first and last iterations. Do not try to access elements outside of the array bounds.

Take a typical Fortran stencil code or, more precisely, some F90. The code looks sensible, but it's flawed.

foo.f90
allocate(b(1000,1000))
...
...
do i = 1, 1000
  do j = 1, 1000
      ...
      ! Flawed: at the edges this reads b(0, i), b(1001, i), b(j, 0) and b(j, 1001)
      a(j, i) = ( b(j - 1, i) + b(j + 1, i) + b(j, i - 1) + b(j, i + 1)  + b(j, i) ) * z / 5
  end do
end do

It reads both before the start and past the end of the allocation: when i or j is 1 it touches index 0, and when i or j is 1000 it touches index 1001. That has bad consequences.

Output is calculated from non-existent and undefined values: garbage in, garbage out.
Bad indexes can stray into the next page of memory (a block of usually 4,096 bytes), and that can cause a segmentation fault. Whether it does is down to luck, so the crash may happen one time in two or, harder to fix, one time in a thousand.

Arm DDT shows the size of an array, which is helpful for knowing which indexes are and are not in range. More powerfully, DDT automatically detects these errors for allocatable arrays, for both reads and writes. It is faster than typical compiler-implemented bounds protection, and all that’s needed is to tick a box to enable memory debugging in the DDT user interface.
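
Once the out-of-range reads have been found, one possible fix is to restrict the stencil to interior points. The following is a minimal sketch, not the original code: the value of z, the initial contents of b and the decision to leave the boundary cells of a at zero are all assumptions made for illustration. The comment about -fcheck=bounds refers to gfortran's run-time bounds checking, which is one example of compiler-implemented bounds protection.

```fortran
program stencil_interior
  implicit none
  integer, parameter :: n = 1000      ! matches the 1000x1000 allocation in the example
  real, parameter :: z = 1.0          ! hypothetical scale factor; the original does not define z
  real, allocatable :: a(:,:), b(:,:)
  integer :: i, j

  allocate(a(n, n), b(n, n))
  b = 1.0                             ! initialise every element before it is read
  a = 0.0                             ! assumption: boundary cells of a simply stay at zero

  ! Interior points only, so j-1, j+1, i-1 and i+1 all stay within 1..n.
  ! With gfortran, compiling with -g -fcheck=bounds would report any index
  ! that still strays outside the declared bounds at run time.
  do i = 2, n - 1
    do j = 2, n - 1
      a(j, i) = (b(j - 1, i) + b(j + 1, i) + b(j, i - 1) + b(j, i + 1) + b(j, i)) * z / 5
    end do
  end do

  print *, 'interior a(2,2) =', a(2, 2), ' boundary a(1,1) =', a(1, 1)
end program stencil_interior
```

Real stencil codes usually need a proper boundary treatment, such as halo cells or a separate edge formula, rather than skipping the edges; which one is right depends on the problem being solved.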

Use watchpoints to catch variable changes

Ever had a code that only works when using Fortran compiler A and fails with compiler B? Or that crashes when you add (or remove) something completely harmless like a simple write statement?

Often it is a bug that becomes visible because two compilers place variables in a different order in memory: the bug already exists, and the compiler does not create it. We saw that reading the wrong array indexes has consequences, but worse things happen when writing to elements beyond the end of arrays, because those writes can overwrite other good values. As two compilers can choose different memory orders and layouts, one compilation may create a layout where a variable is particularly vulnerable to a stray write, whereas another compiler's may be lucky and unaffected.

You can use DDT’s memory debugging if working with allocatable arrays to protect against stray writes. However, for the more general case, debuggers have neat support for “hardware watchpoints”. These let you track when a change happens to a given memory location, instantly. This uses a hardware feature present in most modern processors that allows a small handful of memory locations to be watched. On change, the processor instantly alerts the operating system.
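
To make the stray-write scenario concrete, here is a small, deliberately broken sketch. All names in it (state_t, counter, set_elem) are hypothetical. It is non-conforming Fortran, so its behaviour is undefined and depends on the compiler and its options. It assumes a memory layout in which the scalar counter sits directly after the array c, which the sequence attribute makes likely but does not guarantee in the presence of an out-of-bounds write.

```fortran
program stray_write
  implicit none
  type :: state_t
    sequence                  ! keep components in declaration order
    integer :: c(10)
    integer :: counter        ! likely to sit right after c(10) in memory
  end type state_t
  type(state_t) :: s
  integer :: k

  s%c = 0
  s%counter = 42

  do k = 1, 11                ! deliberate off-by-one: c has only 10 elements
    call set_elem(s%c, k)
  end do

  ! counter was never assigned after initialisation, yet it may no longer be 42.
  print *, 'counter =', s%counter

contains

  subroutine set_elem(arr, idx)
    integer, intent(inout) :: arr(*)   ! assumed size: the callee cannot see the true bounds
    integer, intent(in) :: idx
    arr(idx) = -1                      ! for idx = 11 this writes past the end of c
  end subroutine set_elem

end program stray_write
```

A watchpoint on s%counter would be expected to stop the program inside set_elem at the moment the stray write lands, which points straight at the culprit instead of at the place where the damaged value is later noticed.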

To add a watchpoint in DDT, just select a variable or memory location and right-click to select "Add Watchpoint".

The magic then happens. The process runs without interruption until the change.

Here, we can see that c(1,10) is about to be changed at line 80 of serial.f90, and we even see the old and new values.

Debugging is rarely fun, so it's important to take advantage of all the magic tools at your disposal! We've seen four good reasons to use a debugger to do things that cannot be done without one: discovering the exact crash location and context, inspecting the code paths actually executed, detecting stray memory accesses and detecting when variables change.
