Hello Community,
We are trying to run example code on the Arm Fast Models FVP (a Cortex-A720 model). It is a very simple assembly example that exercises the Arm Memory Tagging Extension (MTE).
The code is supposed to generate a "Synchronous MTE ABORT" exception due to a memory access using the wrong tag, but this never happens. Possibly we are missing some relevant register field setting.
The assembler code is as follows:
1 0x1000 TLBI ALLE3 // (TLBI ALLE3)
2 0x1004 DSB SY // (DSB)
3 0x1008 ISB // (ISB)
4 0x100c ERET // Trap to EL2
5 0x1020 ERET // Trap to EL1
6 0x1030 ERET // Trap to EL0 Load the tag from address 0x80000500 in a register to confirm it has been set
7 0x1040 IRG X8, X3[, X8] // Generate a random Memory Tag
8 0x1044 STG X8, [X8], #0 // Store the tag into address 81004500
9 0x1048 DSB SY // DSB
10 0x104c LDG x9,[x8] // Load the stored tag into a register - note that it renders a 0 in the register
11 0x1050 STR x2,[x5,#0] // Store data in 81004500 with a different tag - supposed to generate a synchronous tag check abort exception here
The code is executed in EL0 in non-secure mode. The Tarmac trace from the A720 FVP is also attached.
Before we execute the code snippet, we use debug writes to set the following registers and memory, and we also program the MMU for 2-stage translation with 3 levels each, as seen in the Tarmac trace:
write_register("MAIR_EL3", 0xf0f0f0f0f0f0f0f0)  # In EL3 all accesses are with Tagged Memory Attribute
write_register("MAIR_EL2", 0xf0f0f0f0f0f0f0f0)  # In EL2 all accesses are with Tagged Memory Attribute
write_register("MAIR_EL1", 0xf0f0f0f0f0f0f0f0)  # In EL1 all accesses are with Tagged Memory Attribute
write_register("SCR_EL3", 0x4000431)  # ATA = 1 bit 26 allocation tag access enabled
# EL3 configuration
# EL3 ERET returns to EL2 exception handler start address
write_register("ELR_EL3", 0x1020)
# Set the program state for this register to a known value
write_register("SPSR_EL3", 0x209)
# EL2 configuration DCT bit 57 0b1; ATA bit 56 0b1
write_register("HCR_EL2", "{} | 0x300000080000000".format(read_register("HCR_EL2")))
# EL2 ERET returns to EL1 exception handler start address
write_register("ELR_EL2", 0x1030)
# Set the program state for EL2 to a known value
write_register("SPSR_EL2", 0x205)
# EL1 configuration
write_register("ELR_EL1", 0x1040)
# Set the program state for this register to a known value
write_register("SPSR_EL1", 0x200)
# Base address for EL3, EL2, EL1 exceptions
exception_el3_base = 0xe000
write_register("VBAR_EL3", exception_el3_base)
write_register("VBAR_EL2", exception_el3_base)
write_register("VBAR_EL1", exception_el3_base)
# Base address for the translation tables
table_base = 0x80000000
write_register("TTBR0_EL3", table_base)  # EL3 and EL2 only do a 1 stage translation so use a different table
write_register("TTBR0_EL2", table_base)
table_s1_base = 0x80040000
write_register("TTBR0_EL1", table_s1_base)
table_s2_base = 0x80080000
write_register("VTTBR_EL2", table_s2_base)
# Enable EL2 by setting the non-secure bit in SCR_EL3
#write_register("SCR_EL3", 0x1)
# Enable Stage 2 translation
write_register("HCR_EL2", "{} | 0x1".format(read_register("HCR_EL2")))
# Set the TCR as:
# - top-byte ignored - bit 20 = b1
# - physical size 256 TB (48-bits) - bits 18:16 = b101
# - TTBR0 granule size 4KB - bits 15:14 = b00
# - Non-shareable - bits 13:12 = b00
# - Outer/Inner non-cacheable - bits 11:10 and 9:8 = b00
# - VTCR only SL0 - starting level for stage 2 translation is 1 - bits 7:6
# - T0SZ 16 (so 48 bit virtual size) 256 TB - bits 5:0 = 0x10
write_register("TCR_EL3", 0x80950010)
write_register("TCR_EL2", 0x80950010)
write_register("VTCR_EL2", 0x80050058)  # stage 2 translation
# - top-byte ignored TBI1 and TBI0 - bit 38, 37 = b11
# - Intermediate Physical Address Size - 48 bits - 256 TB - bits 34:32 = b101
# - TG1 granule size 4KB - bits 31:30 = b10
# - SH1/SH0 Non-shareable - bits 29:28 and 13:12 b00
# - ORGN1/IRGN1 Outer/Inner non-cacheable - bits 27:26 and 25:24 b0000
# - EPD1 - use TTBR1_EL1 for walks bit 23 =b0
# - T1SZ 16 (so 64 - 16 = 48 bit virtual address size) 256 TB - bits 21:16 = 0x10
# - TG0 granule size 4KB - bits 15:14 = 00
# - ORGN0/IRGN0 Outer/Inner non-cacheable - bits 11:10 and 9:8 b0000
# - TTBR0 granule size 64KB - bits 15:14 = b01
# - Non-shareable - bits 13:12 = b00
# - Outer/Inner non-cacheable - bits 11:10 and 9:8 = b0000
# - T0SZ 16 (so 64 - 16 = 48 bit virtual address size) 256 TB - bits 5:0 = 0x10
write_register("TCR_EL1", 0x6580100010)
# L0 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000443  # Valid first block entry for the instruction address
    else:
        descriptor = 0x0000000000000443  # Valid subsequent entries
    descriptor |= (table_base + 0x10000)
    write_memory(table_base + (entry * 8), descriptor, pack="<Q")
# L1 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000443  # Valid first block entry for the instruction address
    else:
        descriptor = 0x0000000000000443  # Valid subsequent entries
    descriptor |= (table_base + 0x20000)
    write_memory(table_base + 0x10000 + (entry * 8), descriptor, pack="<Q")
# L2 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000443  # Valid first block entry for the instruction address
    else:
        descriptor = 0x0000000000000443  # Valid subsequent entries
    descriptor |= (table_base + 0x30000)
    write_memory(table_base + 0x20000 + (entry * 8), descriptor, pack="<Q")
# L3 table
for entry in range(512):
    if entry in range(2):
        descriptor = 0x0000000000000403  # Valid first and second block entry for the instruction address
    else:
        descriptor = 0x0000000881000443  # Valid subsequent entries
    descriptor |= (0x1000 * entry)
    write_memory(table_base + 0x30000 + (entry * 8), descriptor, pack="<Q")
# Program the MMU for four levels (level 0-3) of table with 512 entries with a 4KB granule size
# The exact settings are:
# - Not secure in next table level - bit 63 = b0
# - Not execute never/privileged execute never (XN/PXN = 0) - bit 59,58 = b00
# - Contiguous - bit 52 = b0
# - Dirty bit (=0)
# - Not global - bit 11 nG= b0
# - Access flag (AF=1, to prevent the first time Access fault) - bit 10 = b1
# - Non-shareable (SH=00) - bit 9:8 = b00
# - Access permissions read/write (AP=01) bit 7:6
# - Non-secure (NS=0) - bit 5 = b0
# - Memory attributes TBD (=000 for now) bits 4:2 = b000
# - Table descriptor type valid (=11) - bit 0 = b1
# Stage 1 L0 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid first block entry for the instruction address
    else:
        descriptor = 0x0000000000000443  # Valid subsequent entries
    descriptor |= (table_s1_base + 0x10000)
    write_memory(table_s1_base + (entry * 8), descriptor, pack="<Q")
# Stage 1 L1 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid first block entry for the instruction address
    else:
        descriptor = 0x0000000000000443  # Valid subsequent entries
    descriptor |= (table_s1_base + 0x20000)
    write_memory(table_s1_base + 0x10000 + (entry * 8), descriptor, pack="<Q")
# Stage 1 L2 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid first block entry for the instruction address
    else:
        descriptor = 0x0000000000000443  # Valid subsequent entries
    descriptor |= (table_s1_base + 0x30000)
    write_memory(table_s1_base + 0x20000 + (entry * 8), descriptor, pack="<Q")
# Stage 1 L3 table
for entry in range(512):
    if entry in range(2):
        descriptor = 0x00000000000004c3  # Valid first and second block entry for the instruction address
    else:
        descriptor = 0x0000000800000443  # Valid subsequent entries
    descriptor |= (0x1000 * entry)
    write_memory(table_s1_base + 0x30000 + (entry * 8), descriptor, pack="<Q")
# Program the MMU for stage 2 for three levels (level 0-3) of table with 512 entries with a 4KB granule size
# Stage 2 L1 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000403  # Valid first block entry instructions address (execute permissions)
    else:
        descriptor = 0x0000000000000443  # Valid subsequent entries (read/write permissions)
    descriptor |= (table_s2_base + 0x10000)
    write_memory(table_s2_base + (entry * 8), descriptor, pack="<Q")
# Stage 2 L2 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid first block entry instructions address (execute permissions)
    else:
        descriptor = 0x00000000000004c3  # Valid subsequent entries (read/write permissions)
    descriptor |= (table_s2_base + 0x20000)
    write_memory(table_s2_base + 0x10000 + (entry * 8), descriptor, pack="<Q")
# Stage 2 L3 table
for entry in range(512):
    if entry in range(2):
        descriptor = 0x00000000000004c3  # Valid blocks entry instructions address (execute permissions) - used by executing instruction translation
    elif entry in range(5):
        descriptor = 0x00000088810004c3  # Valid subsequent entries (read/write permissions) - used for loads
    elif entry in range(128):
        descriptor = 0x00000000800004c3  # Valid blocks entry instructions address (execute permissions) - used by table descriptor translation, point back to the S1 tables
    else:
        descriptor = 0x00000088810004c3  # Valid subsequent entries (read/write permissions) - used for loads
    descriptor |= (0x1000 * entry)
    write_memory(table_s2_base + 0x20000 + (entry * 8), descriptor, pack="<Q")
# Just dump the first 8 descriptors of each table
print_memory(table_base, 64)
print_memory(table_base + 0x10000, 64)
print_memory(table_base + 0x20000, 64)
print_memory(table_base + 0x30000, 64)
print_memory(table_s1_base, 64)
print_memory(table_s1_base + 0x10000, 64)
print_memory(table_s1_base + 0x20000, 64)
print_memory(table_s1_base + 0x30000, 64)
print_memory(table_s2_base, 64)
print_memory(table_s2_base + 0x10000, 64)
print_memory(table_s2_base + 0x20000, 64)
# Write a value to the source register
write_register("X2", 0xdeadbeeffacefeed)
write_register("X4", 0xbadecadefade9876)
# Load a value to the address register, which should be in MMU entry 5
store_address = 0x0100000081004500
write_register("X3", store_address)
write_register("X5", 0x0700000081004500)
# Write values to memory using what will be the physical address that the store will try to overwrite
one_stage_physical_address = store_address + 0x800000000  # Used for loads from EL3 and EL2
two_stage_physical_address = store_address + 0x8800000000  # Used for loads from EL1 and EL0
write_memory(one_stage_physical_address, 0x1122334455667788, pack="<Q")
write_memory(one_stage_physical_address + 8, 0xf0e1d2c3b4a59687, pack="<Q")
write_memory(two_stage_physical_address + 16, 0x0123456789abcdef, pack="<Q")
write_memory(two_stage_physical_address + 24, 0xffeeddccbbaa9988, pack="<Q")
write_memory(two_stage_physical_address + 32, 0x00ff11ee22dd33cc, pack="<Q")
# Assert that the memory was updated
print_memory(one_stage_physical_address, 16)
print_memory(two_stage_physical_address + 16, 24)
# Turn on the MMU bit 0
# For all exception levels ATA bit 43 eq 0b1; ATA0 bit 42 (EL2, EL1) 0b1 TCF bits 41,40 synch fault - 0b01; and tag checking TCF0 bits 38,39 (EL2-EL0) 0b01; ITF bit 37 0b1
write_register("SCTLR_EL3", "{} | 0x92000000001".format(read_register("SCTLR_EL3")))
write_register("SCTLR_EL2", "{} | 0xD6000000001".format(read_register("SCTLR_EL2")))
write_register("SCTLR_EL1", "{} | 0xD6000000001".format(read_register("SCTLR_EL1")))
write_register("RGSR_EL1", 0x100)  # bit 8 to 11 SEED set to 0b1
write_register("GCR_EL1", 0x10000)  # bit 16 tag generation best algorithm
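As a sanity check on the SCTLR masks above, the following minimal sketch decodes the MTE-related fields from the values the script ORs in. The field positions are taken from the script's own comment (ATA bit 43, ATA0 bit 42, TCF bits 41:40, TCF0 bits 39:38, ITF bit 37) and are assumed to match the SCTLR_ELx layout. `SCTLR_MTE_FIELDS` and `decode_fields` are hypothetical helper names and are not part of the debugger API.

```python
# Hypothetical helper table: name -> (least significant bit, width).
# Positions follow the comment in the script (an assumption, not verified
# against the model).
SCTLR_MTE_FIELDS = {
    "M": (0, 1),      # MMU enable
    "ITF": (37, 1),
    "TCF0": (38, 2),
    "TCF": (40, 2),
    "ATA0": (42, 1),
    "ATA": (43, 1),
}


def decode_fields(value, fields):
    """Extract each named bit field from a register value."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in fields.items()
    }


# Masks copied from the write_register calls above.
sctlr_el3_bits = decode_fields(0x92000000001, SCTLR_MTE_FIELDS)
sctlr_el1_bits = decode_fields(0xD6000000001, SCTLR_MTE_FIELDS)

for name, bits in (("SCTLR_EL3", sctlr_el3_bits), ("SCTLR_EL2/EL1", sctlr_el1_bits)):
    print(name, bits)
```

Under these assumptions the EL2/EL1 mask sets ATA, ATA0, TCF=0b01, TCF0=0b01 and ITF, while the EL3 mask leaves ATA0 and TCF0 clear. This only checks the OR masks. The final register contents also depend on what read_register returns.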
We run the FVP with the following command - "/work/FVP_ARM_Std_Library/FVP_Base/FVP_Base_Cortex-A720 -C bp.secure_memory=false -C bp.dram_size=400 -C cluster0.NUM_CORES=0x1 -C bp.dram_metadata.is_enabled=1 --iris-connect tcpserver -p" # Linux command
Tarmac traces are enabled with the environment variable set to: "FM_TRACE_PLUGINS=|trace-file=./trace.log||trace_memory=true|/work/FVP_ARM_Std_Library/plugins/Linux64_GCC-9.3/TarmacTrace.so"
So essentially my question is what register setting is missing here in order to generate the synchronous tag check abort exception in line 11 of the code.
Your help would be very much appreciated. And thank you very much in advance.
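For reference, the tag check expected at line 11 compares the logical tag carried in address bits [59:56] with the allocation tag stored for that location. The sketch below extracts the logical tags from the X3 and X5 values used in the script. `logical_tag` and `untagged` are hypothetical helper names.

```python
TAG_SHIFT = 56   # MTE logical tag occupies address bits [59:56]
TAG_MASK = 0xF


def logical_tag(address):
    """Return the 4-bit logical tag held in bits [59:56] of an address."""
    return (address >> TAG_SHIFT) & TAG_MASK


def untagged(address):
    """Clear the whole top byte, leaving address bits [55:0]."""
    return address & ((1 << 56) - 1)


x3 = 0x0100000081004500  # value written to X3 in the script
x5 = 0x0700000081004500  # value written to X5 in the script

print(hex(logical_tag(x3)), hex(logical_tag(x5)), hex(untagged(x5)))
# prints: 0x1 0x7 0x81004500
```

IRG replaces the tag from X3 with a generated one, so the STR through X5 is only expected to mismatch when the stored allocation tag differs from 0x7. Since the LDG in line 10 reads back 0, comparing that value against the tag in X8 is a quick way to see whether the STG took effect.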
My apologies: the default value of cluster0.memory_tagging_support_level in FVP_A720 is 3, so please ignore my second suggestion below.
This isn't a direct answer to your question but I have two suggestions.
1. Create an AXF for the code sequence that you currently perform as Iris writes to the core's system registers and memory.
Setting stateless registers (e.g. general purpose registers R0/X0 etc.) via Iris/CADI should be okay, but I suspect that setting system registers via Iris/CADI is not, as some of them might have internal state held inside the model. Without advancing the simulation time or letting the core execute instructions, these registers won't update their internal state, so they should be updated by the core model *executing* instructions.
2. Specify "-C cluster0.memory_tagging_support_level=2" to enable FEAT_MTE. See the Fast Models Reference Guide below for further information about this parameter.
developer.arm.com/.../ARMCortexA720CT
I hope this helps.
Hopefully, I now have more useful information below. Speaking with my coworker on the TarmacTrace log, we've noticed that:
1. The TLB fill below looks wrong because the memory type is Device. It should be Normal, so there must be something wrong in the MMU configuration.
80000 ps TLB FILL FVP_Base_Cortex_A720.cluster0.cpu0.UTLB 4K 0x81004000_NS EL1_ns vmid=0:0x8881004000_NS Device-nGnRnE (StronglyOrdered) xn=0 pxn=0 ContiguousHint=0 xs=1 Assured=0 AMEC=0
2. HCR_EL2.DC doesn't seem to be set, but it should be.
I don't know if there is anything else wrong so these may not be sufficient.
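The Device memory type in the TLB fill can be cross-checked against the stage 2 L3 descriptor from the script. The sketch below is only an illustration: it assumes the stage 2 page descriptor layout without FEAT_S2FWB (MemAttr in bits [5:2], S2AP in bits [7:6], SH in bits [9:8], AF in bit 10) and that HCR_EL2.DC is bit 12. `decode_stage2_page` is a hypothetical helper.

```python
def decode_stage2_page(descriptor):
    """Split a stage 2 page descriptor into the fields of interest."""
    return {
        "valid": descriptor & 0x1,
        "mem_attr": (descriptor >> 2) & 0xF,   # MemAttr[3:0]
        "s2ap": (descriptor >> 6) & 0x3,       # stage 2 access permissions
        "sh": (descriptor >> 8) & 0x3,         # shareability
        "af": (descriptor >> 10) & 0x1,        # access flag
        "output_address": descriptor & 0x0000FFFFFFFFF000,
    }


# Stage 2 L3 entry 4 as built by the script: base value OR (0x1000 * entry).
entry = 4
page = decode_stage2_page(0x00000088810004C3 | (0x1000 * entry))
print({k: hex(v) for k, v in page.items()})

# Bits the script ORs into HCR_EL2 (the two write_register calls combined).
HCR_EL2_DC = 1 << 12  # assumed bit position of HCR_EL2.DC
script_hcr_mask = 0x300000080000000 | 0x1
dc_requested = bool(script_hcr_mask & HCR_EL2_DC)
print("HCR_EL2.DC requested by script:", dc_requested)
```

Under these assumptions MemAttr decodes to 0b0000, which corresponds to Device-nGnRnE at stage 2, and the output address 0x8881004000 matches the trace line above. The mask the script ORs into HCR_EL2 does not include the DC bit, so DC stays at whatever value the register held before.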
I've gotten another suggestion from another coworker - TC2 FVP (that includes A720 as a subcluster) uses CI-700 with MTE support.
Though the TC2 FVP uses a heterogeneous cluster (A520+A720+X4), referring to a working example is the best approach, I believe.
Thank you so much for your help Oishi San,
I will first try to fix up my MMU configuration so that it accesses Normal memory instead of Device memory, as you suggested, and also configure HCR_EL2.DC. If that does not work, I will use the TC2 FVP as you suggested. Hopefully those two experiments will make progress and fix the issues which have been tormenting me for a while :) I will let you know the outcome, and once again I am so grateful for your help, mate!
Cheers,
Tony