Does anyone know of an Idiot's Guide to this topic? In particular, how does a processor with no special I/O instructions issue a request, e.g. to a serial output device to output "Hello, World"? And how does Memory-Mapped I/O work in detail? Where is the Memory Map stored? What are the actual contents of Device Memory? And where do AMBA/AXI (or ACE) fit in? (I did say it had to be an Idiot's Guide!)
I like the idea very much.
We have quite a few employees, like myself, who have been STEM ambassadors. I have usually done drawings on a whiteboard, showing the path the information is following.
When you say a serial output device, do you consider that the device has a module dedicated to that (UART/RS232 type) or should the explanation just use a normal I/O to emulate what is usually done in a dedicated module?
On the right is an interestingly simple diagram which shows where the on-chip bus sits; you can click the image to see it full-size.
Chris, would you happen to have written (or know of) such document about processor basics within ARM?
The simpler the example, the better, Alban; if that means normal I/O, fine. Although UART is in itself an interesting example, having Synchronous versus Asynchronous versions (the latter being akin to what we used to call "Start/Stop Comms"). And, unfortunately, I didn't receive your diagram.
I am not aware of any simple documentation within ARM on this topic - mainly because it is not ARM-specific in any way. Memory-mapped I/O works in exactly the same way on almost any architecture or platform you care to mention.
It really works the way it sounds - control/status registers in the device are mapped (in the hardware, by mechanisms that you don't need to understand!) to memory addresses. You access the registers simply by accessing memory locations. Unless you are designing chips (or working with an extremely complex system), you don't really need to factor in any knowledge of AXI/AHB or the bus topology. The contents and behaviour of device memory are defined by the device which is mapped to those particular memory locations.
If you look for example code on pretty much any of hundreds of development boards available from the major silicon vendors, I am sure you will find simple examples of drivers for memory-mapped devices which will show you how it all works.
Hope this helps.
Chris
Chris Shore wrote:
Unless you are designing chips (or working with an extremely complex system), you don't really need to factor in any knowledge of AXI/AHB or the bus topology.
Well, knowing the topology and the characteristics of the various protocols can obviously also be quite relevant for performance optimization. And, on non-coherent multicore SoCs (and there are plenty of those) knowing the topology can actually be essential for reasoning about ordering of transactions (and therefore of software correctness). For example, on the one I'm using, if a core writes data to location A, performs a full data sync barrier, then writes &A to B, I'm pretty sure another core which reads B and dereferences it to read A is not always guaranteed to read the new data there (when A and B are in different memories).
Thank you, matthijs, for your all-conquering reply! I think we have a major opportunity here! Chris is right, there are examples of the use of MM I/O on other technologies: MIPS, M8051, IA32, and even the Apple IIe! Problem is, I don't know much about their respective Assembly languages. And, I'm bound to ask, if those guys can all do it, why can't ARM? Most ARM TRMs and Architecture manuals have, under "Memory Model", a section on Memory Types and Attributes. This would be an ideal place for a 2-3 page discussion, with ARM examples, on MM I/O, ARM-style. How about it, chaps?
The simple mental model (which is actually quite close to the truth on many modern chips) is that of a computer network: the cpu sends a "read n bytes" or "write this data" packet to some address, based on which the chip's infrastructure routes the packet to its destination. If genuine memory resides at that address, then such packets will do exactly what you'd expect them to do: a read packet makes the memory controller fetch the requested data from RAM and send it back in a reply packet, a write packet will modify RAM and optionally (if requested) reply with a confirmation.
If the address however gets routed to some peripheral such as a UART, then pretty much anything can happen. The peripheral just receives the "read" and "write" requests as if they are method calls, and is in no way obligated to behave even remotely like memory. Nevertheless, quite often at least part of the peripheral behaves like a piece of memory, containing configuration settings you can modify, or status information that the peripheral provides. In some cases you can pretend the peripheral is just another process with which you interact via shared memory, although, since this "memory" is in fact embedded in the peripheral, it knows when you write to it and can react instantly, and likewise it can provide data on-demand when you read it.
Quite often however there will also be at least a few registers which behave wildly un-memory-like. A UART will typically have a magic address where its data I/O "register" resides. Writing a byte to this location will in fact queue the byte into the transmit FIFO, while reading a byte from that same location will dequeue a byte from the receive FIFO and return that as the byte "read". Note that this violates even the most basic properties expected from memory, so both compiler and CPU need to be warned about that. C has the "volatile" qualifier for that purpose, by which you tell the compiler "when I read or write here, just do exactly what I asked, don't try to do any clever optimizations since there are magical things going on here that you just don't understand". On ARM processors, the OS will mark such regions as "device type" for exactly the same reason, to prevent the CPU from doing caching, coalescing, or other optimizations that rely on normal memory-like behaviour.
So, assuming you have first checked that the transmit FIFO has enough space to accept 13 bytes (or just don't care if characters get lost if you send too much too fast):
volatile struct Uart *uart = ...;
for (const char *p = "Hello, world!"; *p != 0; p++)
    uart->data = *p;
where struct Uart represents the various "registers" available within the address space of the uart. Sometimes the 'volatile' might not be applied to the whole thing but just individual fields (or even individual accesses), but it needs to be involved somehow, otherwise the compiler will just think you're a loony for writing to the same field over and over again and optimize the whole thing to just
uart->data = '!';
I've omitted how one obtains the pointer to the uart, since this will be very situation-dependent: the physical address may be hardcoded (especially in embedded systems), provided by the bootloader, or perhaps auto-detected in some way, and (unless you aren't using one) the OS will normally need to be involved to map it at some virtual address with suitable attributes before software (i.e. the uart driver) can access it.
(BTW, I should warn you that in my experience UARTs tend to have very confusing, crufty interfaces as a result of carrying a couple of decades of historical baggage. I would not recommend studying one if one is new to the concept of memory-mapped I/O.)
What are the actual contents of Device Memory?
So, there isn't really any such thing as "Device Memory" specifically. Devices/peripherals just sit next to memory in a single address space, and while peripherals typically have some memory, at least in the form of configuration registers, that's entirely internal to that peripheral. Some "registers" may not have or represent any memory at all, but perform some instant measurement when you read from them, or execute a command if you write to them.
Where is the Memory Map stored?
Well, hopefully in the technical reference manual of the device ;-)
And where do AMBA/AXI (or ACE) fit in?
AMBA is a family of protocols for creating such "networks" as I described at the beginning of this post, offering various trade-offs between simplicity and performance. AXI is on the "performance" end of the scale, while APB goes for simplicity. ACE is not really related to all this; it lets CPUs in a multicore system collaborate to create the illusion of having a single shared cache rather than private caches (which would otherwise make inter-process communication via shared memory rather problematic). Memory-mapped I/O is normally never marked cacheable, since you want your writes/reads to actually reach the peripheral rather than sit in / be served from some cache.