Bare Metal Input/Output - Documentation?

Does anyone know of an Idiot's Guide to this topic? In particular, how does a processor with no special I/O instructions issue a request, e.g. to a serial output device to output "Hello, World"? And how does Memory-Mapped I/O work in detail? Where is the Memory Map stored? What are the actual contents of Device Memory? And where do AMBA/AXI (or ACE) fit in? (I did say it had to be an Idiot's Guide!)

Parents
  • The simple mental model (which is actually quite close to the truth on many modern chips) is that of a computer network:  the cpu sends a "read n bytes" or "write this data" packet to some address, based on which the chip's infrastructure routes the packet to its destination.  If genuine memory resides at that address, then such packets will do exactly what you'd expect them to do: a read packet makes the memory controller fetch the requested data from RAM and send it back in a reply packet, a write packet will modify RAM and optionally (if requested) reply with a confirmation.

    If, however, the address gets routed to a peripheral such as a UART, then pretty much anything can happen. The peripheral just receives the "read" and "write" requests as if they were method calls, and is in no way obligated to behave even remotely like memory.  Nevertheless, quite often at least part of the peripheral behaves like a piece of memory, containing configuration settings you can modify, or status information that the peripheral provides.  In some cases you can pretend the peripheral is just another process with which you interact via shared memory, except that, since this "memory" is in fact embedded in the peripheral, it knows when you write to it and can react instantly, and likewise it can provide data on demand when you read it.
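    As a rough illustration of the routing idea, here is a toy software model of such an interconnect. It is not how any real chip is implemented: the address map, the names bus_read/bus_write and the "counter" peripheral are all made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical address map; real values come from the chip's manual. */
#define RAM_BASE      0x20000000u
#define RAM_WORDS     256u
#define COUNTER_BASE  0x40000000u

static uint32_t ram[RAM_WORDS];   /* genuine memory */
static uint32_t counter_state;    /* private state inside the pretend peripheral */

/* Peripheral "methods": invoked whenever a request is routed to the device. */
static uint32_t counter_on_read(void)
{
    return counter_state++;       /* every read has a side effect */
}

static void counter_on_write(uint32_t value)
{
    (void)value;                  /* the written value is ignored ... */
    counter_state = 0;            /* ... any write acts as a "reset" command */
}

static int is_ram(uint32_t addr)
{
    return addr >= RAM_BASE && addr < RAM_BASE + RAM_WORDS * 4u;
}

/* Route a 32-bit read request by address. */
uint32_t bus_read(uint32_t addr)
{
    if (is_ram(addr))
        return ram[(addr - RAM_BASE) / 4u];
    if (addr == COUNTER_BASE)
        return counter_on_read();
    return 0xFFFFFFFFu;           /* assumption: unmapped reads return all ones */
}

/* Route a 32-bit write request by address. */
void bus_write(uint32_t addr, uint32_t value)
{
    if (is_ram(addr))
        ram[(addr - RAM_BASE) / 4u] = value;
    else if (addr == COUNTER_BASE)
        counter_on_write(value);
    /* assumption: writes to unmapped addresses are silently dropped */
}
```

    Reading the RAM address twice gives the same value both times; reading the counter address twice does not, even though the CPU issued exactly the same kind of request.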

    Quite often, however, there will also be at least a few registers which behave wildly un-memory-like.  A UART will typically have a magic address where its data I/O "register" resides.  Writing a byte to this location will in fact queue the byte into the transmit FIFO, while reading a byte from that same location will dequeue a byte from the receive FIFO and return that as the byte "read".  Note that this violates even the most basic properties expected from memory (a read does not return what was last written, and the read itself has a side effect), so both compiler and CPU need to be warned about that. C has the "volatile" qualifier for that purpose, by which you tell the compiler "when I read or write here, just do exactly what I asked, don't try to do any clever optimizations since there are magical things going on here that you just don't understand". On ARM processors, the OS will mark such regions as "device type" for exactly the same reason, to prevent the CPU from doing caching, coalescing, or other optimizations that rely on "normal memory-like behaviour".

    So, assuming you have first checked that the transmit FIFO has enough space to accept 13 bytes (or just don't care if characters get lost if you send too much too fast):

    volatile struct Uart *uart = ...;
    for( const char *p = "Hello, world!"; *p != 0; p++ )
         uart->data = *p;
    
    

    where struct Uart represents the various "registers" available within the address space of the uart.  Sometimes the 'volatile' might not be applied to the whole thing but just individual fields (or even individual accesses), but it needs to be involved somehow, otherwise the compiler will just think you're a loony for writing to the same field over and over again and optimize the whole thing to just

    uart->data = '!';
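    To make the struct and the FIFO check a bit more concrete, here is one possible sketch. The two-register layout, the UART_STATUS_TX_FULL bit and the function name are assumptions for illustration only; real UARTs differ, so the offsets and bits have to come from the device's manual. In this variant 'volatile' is applied per field rather than to the pointer.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical register layout, not taken from any real UART. */
struct Uart {
    volatile uint32_t data;     /* write: queue a byte for transmit; read: dequeue a received byte */
    volatile uint32_t status;   /* status flags */
};

/* Assumed flag: set while the transmit FIFO has no room left. */
#define UART_STATUS_TX_FULL (1u << 0)

/* Compile-time check that the fields land on the intended offsets. */
static_assert(offsetof(struct Uart, data) == 0, "data register offset");
static_assert(offsetof(struct Uart, status) == 4, "status register offset");

/* Queue characters until the string ends or the FIFO reports full.
 * Returns the number of characters actually written to the data register. */
size_t uart_write_nonblocking(struct Uart *uart, const char *s)
{
    size_t sent = 0;
    while (s[sent] != '\0' && !(uart->status & UART_STATUS_TX_FULL)) {
        uart->data = (uint8_t)s[sent];   /* one store per character */
        sent++;
    }
    return sent;
}
```

    Because every access to a volatile field has to be performed, the loop issues one store per character instead of collapsing into a single write of '!'. If the function is pointed at an ordinary struct in RAM (handy for testing off-target), only the last byte remains visible afterwards, since nothing is draining a FIFO behind it.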
    
    

    I've omitted how one obtains the pointer to the uart, since this will be very situation-dependent: the physical address may be hardcoded (especially in embedded systems), provided by the bootloader, or perhaps auto-detected in some way, and (unless you are running without an OS) the OS will normally need to be involved to map it at some virtual address with suitable attributes before software (e.g. the uart driver) can access it.
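    For the simplest of those cases, a hardcoded physical address on a system with no OS or MMU in the way, the pointer is often just a cast. The sketch below assumes a made-up base address and a made-up register layout; neither comes from a real device.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
struct Uart {
    volatile uint32_t data;
    volatile uint32_t status;
};

/* Made-up physical address; the real one is listed in the chip's reference manual. */
#define UART0_BASE ((uintptr_t)0x40001000u)

/* Turn the fixed address into a typed pointer. Dereferencing it is only
 * meaningful on hardware (or under an OS mapping) where a UART really lives there. */
static inline struct Uart *uart0(void)
{
    return (struct Uart *)UART0_BASE;
}
```

    Under an OS the same struct can usually still be used, but the pointer would come from whatever mapping service that OS provides rather than from a constant.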

    (BTW, I should warn you that in my experience UARTs tend to have very confusing, crufty interfaces as a result of carrying a couple of decades of historical baggage.  I would not recommend studying one if one is new to the concept of memory-mapped I/O.)

    What are the actual contents of Device Memory?

    So, there isn't specifically such "Device Memory".  Devices/peripherals just sit next to memory in a single address space, and while peripherals typically have some memory at least in the form of configuration registers, that's entirely internal to that peripheral.  Some "registers" may not have or represent any memory at all but perform some instant measurement when you read from them, or execute a command if you write to them.
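    One way to see that a "register" need not be a storage cell is to model the UART's data register in ordinary software. The sketch below is a toy model under assumed names (UartModel, uart_data_write, uart_data_read) and an assumed FIFO depth of 16; it does not describe any particular UART.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16u   /* assumed depth */

struct Fifo {
    uint8_t buf[FIFO_DEPTH];
    size_t head;    /* index of the oldest byte */
    size_t count;   /* number of bytes currently queued */
};

/* Append a byte; returns 0 (and drops the byte) if the FIFO is full. */
static int fifo_push(struct Fifo *f, uint8_t b)
{
    if (f->count == FIFO_DEPTH)
        return 0;
    f->buf[(f->head + f->count) % FIFO_DEPTH] = b;
    f->count++;
    return 1;
}

/* Remove the oldest byte; returns 0 if the FIFO is empty. */
static int fifo_pop(struct Fifo *f, uint8_t *out)
{
    if (f->count == 0)
        return 0;
    *out = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* One "data register" address, two separate queues behind it. */
struct UartModel {
    struct Fifo tx;   /* bytes the CPU wrote, waiting to go out on the wire */
    struct Fifo rx;   /* bytes that arrived on the wire, waiting to be read */
};

/* What the peripheral does when a write request hits the data register. */
void uart_data_write(struct UartModel *u, uint8_t b)
{
    fifo_push(&u->tx, b);
}

/* What the peripheral does when a read request hits the data register. */
uint8_t uart_data_read(struct UartModel *u)
{
    uint8_t b = 0;    /* assumption: an empty receive FIFO reads as 0 */
    fifo_pop(&u->rx, &b);
    return b;
}
```

    Writing 'x' and then reading the same "register" returns whatever was at the front of the receive queue, not 'x', and reading it a second time returns something else again.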

    Where is the Memory Map stored?

    Well, hopefully in the technical reference manual of the device ;-)

    And where do AMBA/AXI (or ACE) fit in?

    AMBA is a family of protocols for creating such "networks" as I described at the beginning of this post, offering various trade-offs between simplicity and performance.  AXI is on the "performance" end of the scale, while APB goes for simplicity.  ACE is not really related to all this: it lets CPUs in a multicore system collaborate to create the illusion of having a single shared cache rather than private caches (which otherwise would make inter-process communication via shared memory rather problematic).  Memory-mapped I/O is normally not marked cacheable, since you want your writes/reads to actually reach the peripheral and not sit in / be served from some cache.

Children
  • Thank you, matthijs, for your all-conquering reply! I think we have a major opportunity here! Chris is right, there are examples of the use of MM I/O on other technologies: MIPS, M8051, IA32, and even Apple IIe! Problem is, I don't know much about their respective Assembly languages. And, I'm bound to ask, if those guys can all do it, why can't ARM? Most ARM TRMs and Architecture manuals have, under "Memory Model", a section on Memory Types and Attributes. This would be an ideal place for a 2-3 page discussion, with ARM examples, on MM I/O, ARM-style. How about it, chaps?