Sometimes Hardware Details Matter in ARM Embedded Systems Programming

June 23, 2014

Last week, I received the call for papers for the 2015 Embedded World Conference. The list of topics is a good reminder of how broad the world of embedded systems is. It also reminded me how overloaded the term “embedded” has become. The term may invoke thoughts of a system made to perform a dedicated function, or visions of invisible processors and software hidden in a product like a car. When I think of embedded, I tend to think about the combination of hardware and software: learning how they work together, and the challenge of building and debugging a system whose software interacts with hardware. Some people call this hardware-dependent software, firmware, or device drivers. Whatever it is called, constructing and debugging hardware and software together, and finding out what the problems are, is always a challenge. One of the great things about working at Carbon is the variety of the latest ARM IP combined with a spectrum of different types of software. We commonly work with software ranging from small bare-metal C programs to Linux running on multiple ARM cores, and with a mix of cycle accurate and abstract models.

If you are interested in this area, I encourage you to learn as much as possible about the topics below. Amazingly, the most popular programming language for this work is still C, and being able to read assembly language also helps.

Cross Compilers and Debuggers
CPU Register Set
Instruction Pipeline
Cache
Interrupts and Interrupt Handlers
Timers
Co-Processors
Bus Protocols
Performance Monitors

I could write articles about how project X at company Y used Carbon products to optimize system performance or shrink time to market and lived happily ever after, but I prefer to write about what users can learn from virtual prototypes. Finding out new things through hands-on experience is what makes embedded systems exciting for me.

Today, I will provide two examples of what working with embedded systems is all about. The first demonstrates why embedded systems programming differs from general-purpose C programming: working with hardware requires paying attention to fine details. The second relates to a question people at Carbon are frequently asked: “Why are accurate models important?” Carbon has become the standard for simulation with accurate models of ARM IP, but it’s not always easy to see why or when the additional accuracy makes a difference, especially for software development. Since some software development tasks can be done with abstract models, I will share a situation where accuracy does make a difference. Both examples in this article looked perfectly fine on the surface but didn’t actually work.

GIC-400 Programming Example

Recently, I was working with some software that had been used on an ARM Cortex-A9 system. I ported it to a Cortex-A15 system, and then worked on running it on a new system that used the external GIC-400 instead of the A15’s internal GIC.

People who have worked with me know I have two rules for system debugging:

Nothing ever works the first time
When things don’t work, guessing is not allowed

When I ran the new system with the external GIC-400, the software failed to start up correctly. One of the challenges in debugging such problems is that once something goes wrong, the software jumps off to bad places, leaving little or no trail of where it went off the path. Normally, I use software breakpoints to close in on the problem. Another technique is to use the Carbon Analyzer to trace bus transactions and software execution to spot a wrong turn. In this case, I was able to spot an abort, and I traced it to a normal-looking access to one of the GIC-400 registers.

I was able to find the instruction that was causing the abort. The challenge was that it looked perfectly fine. It was a read of the GIC Distributor Control Register (GICD_CTLR) to see if the GIC was enabled. It’s one of the easiest things software can do, and it should work as long as the GIC is present in the system. The C function simply read this register and tested its enable bit.

The aborting load was the second instruction in the function, an LDRB (load register byte).
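To make the connection between C and the LDRB concrete, here is a minimal sketch. It is not the original function: the names `gicd_enabled_bytewise` and `gicd_base` are hypothetical, and it assumes GICD_CTLR at offset 0x000 of the Distributor with the enable bit in bit 0 (the GIC’s little-endian register layout puts that bit in the first byte). Because the register is read through a `volatile uint8_t *`, an ARM compiler emits a byte load (LDRB) for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GICD_CTLR_OFFSET 0x000u /* Distributor Control Register offset (assumed layout) */
#define GICD_CTLR_ENABLE 0x01u  /* bit 0: enable */

/* Hypothetical helper: tests the enable bit with a BYTE-wide read.
 * On an ARM target this volatile uint8_t access becomes an LDRB, the
 * access size that aborted on the GIC-400 in this example. */
static int gicd_enabled_bytewise(const volatile uint8_t *gicd_base)
{
    assert(gicd_base != NULL);
    return (gicd_base[GICD_CTLR_OFFSET] & GICD_CTLR_ENABLE) != 0;
}
```

On a host, an ordinary byte array can stand in for the register block; on the GIC-400 itself, this is exactly the access to avoid.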

The puzzling thing was that the instruction looked fine, and I was certain I had run this function on other systems containing the Cortex-A9 and Cortex-A15 internal GIC.

After some pondering, I recalled reading that the GIC-400 has restrictions on access size for specific registers. Sure enough, the aborting instruction was a byte load. It’s not easy to find a clear statement that a byte access to this register is not allowed, but I’m sure it’s in the documentation somewhere. I decided it was easier to just re-code the function to generate a word access and try again.

There are probably many ways to change the code to avoid the byte read, but since the enable bit is the only bit used in the register, I rewrote the function to read the whole 32-bit register and then test that bit.
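A word-access version might look like the sketch below. Again, this is an illustration under assumptions rather than the code from the article: `gicd_enabled` and `gicd_base` are hypothetical names, and GICD_CTLR is assumed at offset 0x000 with the enable bit in bit 0. Reading the register into a local `uint32_t` through a `volatile uint32_t *`, and only then masking, makes the access width explicit; in practice GCC and Clang keep the declared width of a volatile scalar access, so this typically becomes an LDR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GICD_CTLR_OFFSET 0x000u /* byte offset of GICD_CTLR in the Distributor (assumed) */
#define GICD_CTLR_ENABLE 0x1u   /* bit 0: enable */

/* Hypothetical helper: read GICD_CTLR as one 32-bit word, then test bit 0.
 * The volatile uint32_t access keeps the load word-sized (LDR on ARM). */
static int gicd_enabled(const volatile uint32_t *gicd_base)
{
    assert(gicd_base != NULL);
    uint32_t ctlr = gicd_base[GICD_CTLR_OFFSET / sizeof(uint32_t)]; /* single word load */
    return (ctlr & GICD_CTLR_ENABLE) != 0;                          /* mask after the load */
}
```

On the target, `gicd_base` would point at the Distributor’s address in the system memory map; on a host, an ordinary array stands in for the registers.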

Sure enough, the compiler now generated a load word instruction and it worked as expected.

This example demonstrates a few principles of embedded systems. The first is that being able to understand ARM assembly language is a big help in debugging, especially when tracing loads and stores to hardware such as the GIC-400. Another is that the code a C compiler generates sometimes matters. Most of the time when using C there is no need to look at the generated code, but here there is a direct connection between the C code and how the hardware responds to the generated instructions. Understanding how to modify the C code to generate different instructions was needed to solve the problem.

Mysterious Interrupt Handler

The next example demonstrates another situation where details matter. This was a bare-metal program that installed an interrupt handler for the Cortex-A15 nIRQ interrupt by putting a jump to the handler at address 0x18, the IRQ entry of the ARM exception vector table. During program startup, the code wrote an instruction into memory that jumps to the C function irq_handler to handle the interrupt; the vector base used by the code, VECTOR_BASE, was 0.
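As a minimal sketch of the idea, the code below encodes an A32 unconditional branch (`B`) from the IRQ vector to a handler address and stores it in a vector table. The article does not say which jump instruction was used, so the choice of `B` is an assumption, and `arm_encode_b` and `install_irq_vector` are hypothetical names. A `B` offset is relative to the branch address plus 8 and is stored as a signed 24-bit word count, so the handler must be within about ±32 MB of the vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IRQ_VECTOR_OFFSET 0x18u /* IRQ entry in the ARM exception vector table */

/* Encode "B target" for an instruction located at insn_addr (A32 encoding).
 * Returns false if the target is misaligned or out of the +/-32 MB range. */
static bool arm_encode_b(uint32_t insn_addr, uint32_t target, uint32_t *insn)
{
    int64_t off = (int64_t)target - ((int64_t)insn_addr + 8); /* PC reads 8 ahead */
    if (off % 4 != 0 || off < -(1LL << 25) || off > (1LL << 25) - 4)
        return false;
    uint32_t imm24 = (uint32_t)(off / 4) & 0x00FFFFFFu; /* signed word offset */
    *insn = 0xEA000000u | imm24;                        /* cond=AL, B (no link) */
    return true;
}

/* Hypothetical helper: write a branch to handler_addr into the IRQ slot of a
 * vector table that lives at address base_addr. */
static bool install_irq_vector(volatile uint32_t *vector_table, uint32_t base_addr,
                               uint32_t handler_addr)
{
    assert(vector_table != NULL);
    uint32_t insn;
    if (!arm_encode_b(base_addr + IRQ_VECTOR_OFFSET, handler_addr, &insn))
        return false;
    vector_table[IRQ_VECTOR_OFFSET / 4u] = insn; /* an ordinary data-side store */
    return true;
}
```

For example, a handler at 0x1000 gives 0xEA0003F8 at offset 0x18, and a branch to itself gives 0xEAFFFFFE (`b .`). Note that the store is an ordinary data write; nothing in it touches the instruction cache.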

The startup code looked perfectly fine and worked when simulated with abstract models, but didn’t work as expected in a cycle accurate simulation. Initially, it was very hard to tell why. The simulation would appear to hang, and when it was stopped, the CPU was sitting in strange places that didn’t look like code that should have been running. The instruction and transaction traces showed that an interrupt was occurring, but the program didn’t go to the interrupt handler as expected. To debug, I first placed a hardware breakpoint on a change of the interrupt signal, then placed a software breakpoint on address 0x18 so the simulation would stop when the first interrupt occurred. The expected instruction was there, but when I single-stepped, the PC just advanced one word to address 0x1c with no jump. Subsequent step commands just incremented the PC. There was no code at any address other than 0x18, so the CPU was executing words that were all 0 (in the ARM instruction set, 0x00000000 decodes as ANDEQ r0, r0, r0, which has no visible effect).

This problem was pretty mysterious considering the debugger showed the proper instruction at the right place, but it was as if it wasn’t there at all. Finally, it hit me that the only possible explanation was that the instruction really wasn’t there.

What if the cache line containing address 0x18 was already in the instruction cache when the startup code wrote the jump instruction? When the interrupt occurred, the PC would jump to 0x18, but the CPU would fetch the instruction from the instruction cache and never see the new value that had been written to memory.
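The reasoning above can be illustrated with a toy model. This is a deliberately simplified sketch, not a model of the Cortex-A15: it assumes an instruction cache that never evicts, a 64-byte line size chosen only for illustration, and data writes that update memory but not the instruction cache. The names `toy_fetch`, `toy_data_write`, and `toy_icache_invalidate_line` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u  /* assumed line size, for illustration only */
#define MEM_BYTES  256u /* tiny memory: four lines */
#define NUM_LINES  (MEM_BYTES / LINE_BYTES)
#define LINE_WORDS (LINE_BYTES / 4u)

/* Toy system: backing memory plus an instruction cache with one slot per
 * line (no conflicts, no eviction). Data writes update memory only. */
struct toy_system {
    uint32_t mem[MEM_BYTES / 4u];
    bool     icache_valid[NUM_LINES];
    uint32_t icache_data[NUM_LINES][LINE_WORDS];
};

static void toy_init(struct toy_system *s)
{
    memset(s, 0, sizeof *s); /* memory all zero, cache empty */
}

/* Instruction fetch: fill the whole line on a miss, then serve from cache. */
static uint32_t toy_fetch(struct toy_system *s, uint32_t addr)
{
    assert(addr % 4u == 0 && addr < MEM_BYTES);
    uint32_t line = addr / LINE_BYTES;
    if (!s->icache_valid[line]) {
        memcpy(s->icache_data[line], &s->mem[line * LINE_WORDS], LINE_BYTES);
        s->icache_valid[line] = true;
    }
    return s->icache_data[line][(addr % LINE_BYTES) / 4u]; /* may be stale */
}

/* Data-side store: memory changes, the instruction cache does not. */
static void toy_data_write(struct toy_system *s, uint32_t addr, uint32_t value)
{
    assert(addr % 4u == 0 && addr < MEM_BYTES);
    s->mem[addr / 4u] = value;
}

/* Invalidate the instruction cache line holding addr. */
static void toy_icache_invalidate_line(struct toy_system *s, uint32_t addr)
{
    assert(addr < MEM_BYTES);
    s->icache_valid[addr / LINE_BYTES] = false;
}
```

Fetching any word in the first line, for example at 0x10, caches the line that also holds 0x18. Writing the branch to 0x18 afterwards changes memory, yet a fetch from 0x18 still returns the old zero until the line is invalidated.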

The solution was to invalidate the instruction cache line after writing the instruction to memory, using a system control register (CP15) instruction with 0x18 in r0.
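A minimal sketch of that fix for an ARMv7-A core such as the Cortex-A15 follows, assuming GCC-style inline assembly and bare-metal code running in a privileged mode. The invalidate described above corresponds to ICIMVAU (`MCR p15, 0, r0, c7, c5, 1`). The sketch wraps it in the fuller ARMv7 sequence for self-modifying code: clean the data cache line first, in case the store is still sitting in the data cache, then invalidate the branch predictor entry and add DSB/ISB barriers. The names `sync_code_line` and `write_code_word` are hypothetical. On non-ARM builds the maintenance compiles away so the example still builds and runs on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: make one newly written instruction word visible to
 * instruction fetch on an ARMv7-A core (privileged, GCC-style inline asm). */
static inline void sync_code_line(uintptr_t addr)
{
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    __asm__ volatile(
        "mcr p15, 0, %0, c7, c11, 1\n\t" /* DCCMVAU: clean D-cache line to PoU */
        "dsb\n\t"                        /* wait for the clean to complete */
        "mcr p15, 0, %0, c7, c5, 1\n\t"  /* ICIMVAU: invalidate I-cache line to PoU */
        "mcr p15, 0, %0, c7, c5, 7\n\t"  /* BPIMVA: invalidate branch predictor entry */
        "dsb\n\t"                        /* wait for the invalidations */
        "isb"                            /* refetch subsequent instructions */
        :
        : "r"(addr)
        : "memory");
#else
    (void)addr; /* non-ARMv7 build (e.g. a host test): maintenance omitted */
#endif
}

/* Hypothetical helper: store an instruction word, then synchronize caches. */
static void write_code_word(volatile uint32_t *slot, uint32_t insn)
{
    assert(((uintptr_t)slot & 3u) == 0); /* A32 instructions are word-aligned */
    *slot = insn;                        /* data-side store of the instruction */
    sync_code_line((uintptr_t)slot);
}
```

On the target, `write_code_word` would be called with the address of the IRQ vector (0x18 here); the barriers ensure the maintenance completes before the CPU fetches from that address again.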

Although cache details are mostly handled automatically by hardware, and cache modelling is not always required for software development, this example shows that sometimes more detailed models are required to fully test software. In hindsight, an experienced engineer would recognize this as self-modifying code and know to pay attention to caching, but it does demonstrate a situation where using detailed models matters.

Summary

Although you may never encounter the exact problems described here, they demonstrate typical challenges embedded systems engineers face and remind us to keep watch for hardware details. These examples also point out another key principle of embedded software: old code lives forever. This often means that code which worked on one system won’t necessarily work on a new system, even if the two seem similar. If these examples sound familiar, it might be time to look into virtual prototypes for your embedded software development.

Jason Andrews

poojashree over 6 years ago

This information is very useful to me for building system programming skills. I'm a trainer providing in-plant training and internships to students, and I'd like to share this information with my students.
Chris Shore over 10 years ago

Thanks, Jason, for a really interesting post. People sometimes ask us why we continue to teach people about the ARM instruction set in quite some detail on our training courses. Your post illustrates beautifully one of the reasons why we believe it is so important.
Awareness of the memory architecture, particularly the behaviour and impact of the caches, is another subject on which we spend considerable time. The Architecture Reference Manual has quite a lot of rules about what are called "context changing operations" after which cache maintenance operations may be required. Understanding all of these and the different maintenance operations required is quite an involved business. For instance, if the data write you used to store an instruction to memory went through a write buffer or completed in the data cache, you would need to carry out some data-side maintenance operations to ensure that it had been pushed out far enough to be reloaded into the instruction cache properly. You have to understand quite a bit about the memory architecture of a particular product to work some of that out.
Looking forward to your next post!
Best wishes
Chris
