Address Space Identifier - ASID

For ARMv7-A/R systems, the MMU uses an ASID to distinguish between memory pages that have the same virtual address but belong to an individual task (i.e. a task that uses non-Global memory). The ASID is an eight-bit value, from 0 to 255, assigned by the operating system.


So does that mean that on a normal system, e.g. Linux, there can be no more than 256 non-Global tasks running at one time? Should we assume that a piece of multi-threaded code would have a unique task/process identifier for each thread (for scheduling purposes), but that all its threads would share the same ASID?


And what happens in a Type 1 virtualized system, with a hypervisor and, say, two guest OSes? Presumably each guest kernel has its own ASID? And which OS assigns the ASIDs to user tasks?

Parents
  • Hi Mike,

    Very simply:

    Linux uses a rollover mechanism for ASIDs. Only 256 ASIDs can be allocated in the system, and Linux keeps a bitmap of the ones currently in use. It allocates them per task until it runs out. When a task exits, its ASID is invalidated from the branch predictor, caches and TLBs and can be reallocated to another task. If all the ASIDs are in use, it invalidates the branch predictor, caches and TLBs and starts allocating new ASIDs again on a first-come, first-served basis (so tasks that never run don't waste ASIDs).
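
    As a rough illustration of the rollover idea, here is a minimal sketch in C. It is not the kernel's code: every name in it (`asid_allocator`, `address_space`, `asid_for_switch`, `flush_all_tlbs`) is hypothetical, ASID 0 is assumed to be reserved, and the "flush" is only a counter standing in for the real TLB and branch predictor maintenance. It also ignores the detail that tasks running on other cores at the moment of rollover need special handling.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ASID_BITS 8
#define NUM_ASIDS (1u << ASID_BITS)   /* 256 hardware ASIDs */

/* Hypothetical per-process record, shared by all threads of the process. */
struct address_space {
    uint32_t generation;  /* allocator generation in which the ASID was issued */
    uint8_t  asid;        /* only meaningful if generation is current */
};

struct asid_allocator {
    bool     in_use[NUM_ASIDS]; /* bitmap of ASIDs issued in this generation */
    uint32_t generation;        /* incremented on every rollover */
    unsigned tlb_flushes;       /* counts simulated full invalidations */
};

static void asid_allocator_init(struct asid_allocator *a)
{
    memset(a, 0, sizeof(*a));
    a->generation = 1;      /* generation 0 means "never had an ASID" */
    a->in_use[0] = true;    /* assumption: ASID 0 is reserved */
}

/* Stand-in for the real TLB / branch predictor invalidation. */
static void flush_all_tlbs(struct asid_allocator *a)
{
    a->tlb_flushes++;
}

/* Called when an address space is about to be switched in on a core. */
static uint8_t asid_for_switch(struct asid_allocator *a,
                               struct address_space *mm)
{
    /* Fast path: the ASID was issued in the current generation. */
    if (mm->generation == a->generation)
        return mm->asid;

    for (;;) {
        /* Hand out the first free ASID, first come, first served. */
        for (unsigned i = 1; i < NUM_ASIDS; i++) {
            if (!a->in_use[i]) {
                a->in_use[i] = true;
                mm->asid = (uint8_t)i;
                mm->generation = a->generation;
                return mm->asid;
            }
        }
        /* Rollover: all ASIDs are taken, so invalidate and start again.
         * Every ASID from the old generation becomes stale at once. */
        memset(a->in_use, 0, sizeof(a->in_use));
        a->in_use[0] = true;
        a->generation++;
        flush_all_tlbs(a);
    }
}
```

    In this sketch a task that is never scheduled never calls `asid_for_switch`, so it never consumes an ASID, and a task whose ASID went stale in a rollover simply picks up a fresh one the next time it is switched in.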

    This works because even on a system with 100,000 "running" tasks, you probably only have 4-8 cores, and only one task can run on each core at any time. Until you can put 256 cores in an ARM system, the current behaviour will work fine (most systems are pretty far from that and, by the time it is a problem, 16-bit ASIDs will be de rigueur).

    For threading (not just on Linux), since every "process" has an ASID, all "threads" within that process will share the ASID. I quote those because different operating systems call these things different names.
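
    One way to picture this is that the ASID lives with the address space, not with the thread. The sketch below uses made-up structure and function names (`address_space`, `thread`, `asid_of`) purely for illustration; real kernels lay this out differently.

```c
#include <stdint.h>

/* Hypothetical per-process state: one set of page tables, one ASID. */
struct address_space {
    uint32_t page_table_base; /* physical address of the translation table */
    uint8_t  asid;            /* tag applied to non-global TLB entries */
};

/* Hypothetical per-thread state: the scheduler sees each thread separately. */
struct thread {
    int                   tid; /* unique per thread, used for scheduling */
    struct address_space *mm;  /* shared by every thread of the process */
};

/* The ASID used on a context switch comes from the address space,
 * not from the thread, so sibling threads always yield the same value. */
static uint8_t asid_of(const struct thread *t)
{
    return t->mm->asid;
}
```

    Two threads created with the same `mm` pointer have different `tid` values but return the same result from `asid_of`, which is why switching between them needs no ASID change.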

    For virtual machines, there is a VMID: each guest OS has a VMID, and each guest OS has tasks with ASIDs. A TLB lookup must match both the VMID and the ASID to hit on a translation, so the ASID rollover code in one guest won't clobber entries belonging to another guest OS.
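
    The matching rule can be modelled in a few lines of C. This is a simplified software model, not a description of any particular core's TLB: the `tlb_entry` layout and the `tlb_entry_matches` function are invented for the example, and an 8-bit VMID is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one TLB entry (hypothetical layout). */
struct tlb_entry {
    bool     valid;
    bool     global; /* global entries match any ASID within their VMID */
    uint8_t  vmid;   /* identifies the guest that owns the translation */
    uint8_t  asid;   /* identifies the task within that guest */
    uint32_t vpn;    /* virtual page number */
    uint32_t ppn;    /* physical page number */
};

/* Returns true if the entry can be used for this lookup. */
static bool tlb_entry_matches(const struct tlb_entry *e,
                              uint8_t vmid, uint8_t asid, uint32_t vpn)
{
    if (!e->valid || e->vpn != vpn)
        return false;
    if (e->vmid != vmid)        /* another guest's entry never matches */
        return false;
    return e->global || e->asid == asid;
}
```

    With this rule, two guests can both use ASID 5 for different tasks without seeing each other's translations, because the VMID comparison fails first.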

    Ta,

    Matt Sealey

Children
  • Thanks, Matt.

    I take your point on the diverse definitions of tasks and threads. IBM had a definition for these terms, which still applies nowadays:

    A task is the smallest unit of code which can compete independently for system resources.


    A thread is an instance of such code, where the code must be either serially reusable or completely re-entrant. A thread is therefore a task in its own right. An OS such as Linux will use mutexes and spinlocks to enable serially reusable threads; fully re-entrant code is simply non-modifiable, as with ARM assembler .text sections.


    The only major change to these definitions is the Intel concept of Hyper-Threading, which involves hardware duplication and is designed to exploit the delays incurred by reads/writes to RAM (as opposed to external peripherals). The effect of advances in cache technology on the benefits of Hyper-Threading does not appear to be well documented, and would make for interesting reading! As far as I can tell, ARM does not appear to distinguish between the terms "multi-threading" and "hyper-threading" (or can somebody prove me wrong?)