Arm FVP A720 MTE test code example does not generate tag mismatch exception.

Hello Community,

We are trying to run some example code on the Arm Fast Models FVP (a Cortex-A720 model).
It is a very simple assembly code example exercising the Arm Memory Tagging Extension (MTE).

The code is supposed to generate a "Synchronous MTE ABORT" exception because of a memory access using the wrong tag, but this never happens.
Possibly we are missing a relevant register field setting.

The assembler code is as follows (the Tarmac trace is attached as TestTarmacTrace (2).log):

1 0x1000 TLBI ALLE3 // Invalidate all EL3 TLB entries
2 0x1004 DSB SY // Wait for the TLB invalidation to complete
3 0x1008 ISB // Synchronize the instruction stream
4 0x100c ERET // Return to EL2 (ELR_EL3 = 0x1020)
5 0x1020 ERET // Return to EL1 (ELR_EL2 = 0x1030)
6 0x1030 ERET // Return to EL0 (ELR_EL1 = 0x1040)
7 0x1040 IRG X8, X3[, X8] // Insert a random allocation tag into the address taken from X3
8 0x1044 STG X8, [X8], #0 // Store the tag for the granule at 0x81004500
9 0x1048 DSB SY // Wait for the tag store to complete
10 0x104c LDG X9, [X8] // Load the stored tag into X9 to confirm it has been set - note that it reads back as 0
11 0x1050 STR X2, [X5, #0] // Store data to 0x81004500 with a different tag - this is supposed to generate a synchronous tag check fault here
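The check that line 11 is expected to fail can be modelled in a few lines of Python. This is a hypothetical sketch, not an Arm or FVP API: it assumes the logical tag sits in pointer bits [59:56] and that allocation tags are kept per 16-byte granule, and it deliberately ignores everything that decides whether an access is checked at all (memory type, SCTLR_ELx.TCF/TCF0, PSTATE.TCO).

```python
# Hypothetical model of an MTE tag check; names are illustrative, not an Arm API.
GRANULE_SIZE = 16          # MTE tags memory one 16-byte granule at a time
TAG_SHIFT = 56             # the logical tag lives in pointer bits [59:56]
ADDR_MASK = (1 << 56) - 1  # strip the top byte (TBI) to get the address


def logical_tag(ptr: int) -> int:
    """Return the 4-bit logical tag carried by a pointer."""
    return (ptr >> TAG_SHIFT) & 0xF


def granule(ptr: int) -> int:
    """Return the index of the 16-byte granule the pointer addresses."""
    return (ptr & ADDR_MASK) // GRANULE_SIZE


def tag_check_fails(ptr: int, allocation_tags: dict[int, int]) -> bool:
    """True if the pointer's logical tag differs from the stored allocation tag.

    Granules never written by STG are modelled as holding tag 0.
    """
    return logical_tag(ptr) != allocation_tags.get(granule(ptr), 0)


if __name__ == "__main__":
    x5 = 0x0700000081004500     # pointer used by the STR on line 11
    tags: dict[int, int] = {}   # LDG read back 0, so model the granule as tag 0
    print(hex(logical_tag(x5)), tag_check_fails(x5, tags))  # prints: 0x7 True
```

Under this model, the STR through X5 (tag 0x7) mismatches the 0 that LDG read back, so the tag values alone should fault. That suggests, as an inference rather than something the trace confirms, that the access is not being tag checked at all, for example because the memory is not treated as Tagged.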

The code executes at EL0 in Non-secure state.
The Tarmac trace from the A720 FVP is also attached.
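The EL0 starting point comes from the ERET chain on lines 4 to 6 together with the SPSR values that the setup script writes (0x209, 0x205, 0x200). A hypothetical decoder, assuming the AArch64 SPSR layout (M[3:0] in bits 3:0, D in bit 9, TCO in bit 25), can confirm that chain. TCO is worth checking because, as we understand it, PSTATE.TCO = 1 makes loads and stores unchecked.

```python
# Hypothetical decoder for AArch64 SPSR_ELx values (field positions per our reading of the Arm ARM).
MODES = {0b0000: "EL0t", 0b0100: "EL1t", 0b0101: "EL1h",
         0b1000: "EL2t", 0b1001: "EL2h", 0b1100: "EL3t", 0b1101: "EL3h"}


def decode_spsr(value: int) -> dict:
    """Split an SPSR value into the fields relevant to this test."""
    return {
        "mode": MODES.get(value & 0xF, "reserved"),  # target EL and stack pointer
        "D": (value >> 9) & 1,                        # debug exception mask
        "TCO": (value >> 25) & 1,                     # tag check override: 1 disables tag checks
    }


if __name__ == "__main__":
    for name, value in (("SPSR_EL3", 0x209), ("SPSR_EL2", 0x205), ("SPSR_EL1", 0x200)):
        print(name, decode_spsr(value))
```

All three values decode to TCO = 0, so under this reading the ERET into EL0 does not itself switch tag checking off.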

Before executing the code snippet, we use debugger writes to set the following registers and memory, and we
also program the MMU for two-stage translation (four levels at stage 1, three levels at stage 2), as seen in the Tarmac trace:

write_register("MAIR_EL3", 0xf0f0f0f0f0f0f0f0) # EL3: every MAIR attribute (Attr0-Attr7) is 0xF0, Tagged Normal memory
write_register("MAIR_EL2", 0xf0f0f0f0f0f0f0f0) # EL2: every MAIR attribute (Attr0-Attr7) is 0xF0, Tagged Normal memory
write_register("MAIR_EL1", 0xf0f0f0f0f0f0f0f0) # EL1&0: every MAIR attribute (Attr0-Attr7) is 0xF0, Tagged Normal memory
write_register("SCR_EL3", 0x4000431) # NS (bit 0), RES1 (bits 5:4), RW (bit 10) and ATA (bit 26, allocation tag access enabled) = 1

# EL3 configuration
# EL3 ERET returns to EL2 exception handler start address
write_register("ELR_EL3", 0x1020)
# SPSR_EL3 = 0x209: return to EL2h (M[3:0] = 0b1001) with debug exceptions masked (D, bit 9)
write_register("SPSR_EL3", 0x209)

# EL2 configuration: set HCR_EL2.DCT (bit 57), HCR_EL2.ATA (bit 56) and HCR_EL2.RW (bit 31, EL1 is AArch64)
write_register("HCR_EL2", "{} | 0x300000080000000".format(read_register("HCR_EL2")))
# EL2 ERET returns to EL1 exception handler start address
write_register("ELR_EL2", 0x1030)
# SPSR_EL2 = 0x205: return to EL1h (M[3:0] = 0b0101) with debug exceptions masked (D, bit 9)
write_register("SPSR_EL2", 0x205)

# EL1 configuration
write_register("ELR_EL1", 0x1040)
# SPSR_EL1 = 0x200: return to EL0t (M[3:0] = 0b0000) with debug exceptions masked (D, bit 9)
write_register("SPSR_EL1", 0x200)

# Base address for EL3, EL2, EL1 exceptions
exception_el3_base = 0xe000
write_register("VBAR_EL3", exception_el3_base)
write_register("VBAR_EL2", exception_el3_base)
write_register("VBAR_EL1", exception_el3_base)

# Base address for the translation tables
table_base = 0x80000000
write_register("TTBR0_EL3", table_base) # EL3 and EL2 use single-stage translation, so they share this table set, separate from the EL1&0 tables
write_register("TTBR0_EL2", table_base)
table_s1_base = 0x80040000
write_register("TTBR0_EL1", table_s1_base)
table_s2_base = 0x80080000
write_register("VTTBR_EL2", table_s2_base)

# Enable EL2 by setting the non-secure bit in SCR_EL3
#write_register("SCR_EL3", 0x1)

# Enable stage 2 translation (HCR_EL2.VM, bit 0)
write_register("HCR_EL2", "{} | 0x1".format(read_register("HCR_EL2")))

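# Set TCR_EL3 and TCR_EL2 (and VTCR_EL2 where noted) as:
# - RES1 - bits 31 and 23 = b1 (bit 31 only for VTCR_EL2)
# - top-byte ignored (TBI) - bit 20 = b1 (TCR_EL3 and TCR_EL2)
# - physical address size 256 TB (48 bits) - bits 18:16 = b101
# - TG0 granule size 4KB - bits 15:14 = b00
# - Non-shareable - bits 13:12 = b00
# - Outer/Inner non-cacheable table walks - bits 11:10 and 9:8 = b00
# - VTCR_EL2 only: SL0 = b01, so stage 2 translation starts at level 1 - bits 7:6
# - T0SZ 16 (64 - 16 = 48-bit virtual address size, 256 TB) - bits 5:0 = 0x10;
#   VTCR_EL2 uses T0SZ 24 (0x18), a 40-bit IPA size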
write_register("TCR_EL3", 0x80950010)
write_register("TCR_EL2", 0x80950010)
write_register("VTCR_EL2", 0x80050058) # stage 2 translation
# Set TCR_EL1 as:
# - top-byte ignored TBI1 and TBI0 - bits 38, 37 = b11
# - Intermediate Physical Address Size 48 bits (256 TB) - bits 34:32 = b101
# - TG1 granule size 4KB - bits 31:30 = b10
# - SH1/SH0 Non-shareable - bits 29:28 and 13:12 = b00
# - ORGN1/IRGN1 Outer/Inner non-cacheable - bits 27:26 and 25:24 = b0000
# - EPD1 = b0, so walks using TTBR1_EL1 are performed - bit 23
# - T1SZ 16 (64 - 16 = 48-bit virtual address size, 256 TB) - bits 21:16 = 0x10
# - TG0 granule size 4KB - bits 15:14 = b00
# - ORGN0/IRGN0 Outer/Inner non-cacheable - bits 11:10 and 9:8 = b0000
# - T0SZ 16 (64 - 16 = 48-bit virtual address size, 256 TB) - bits 5:0 = 0x10
write_register("TCR_EL1", 0x6580100010)

# L0 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000443  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x0000000000000443  # Valid table descriptors, subsequent entries
    descriptor |= (table_base + 0x10000)
    write_memory(table_base + (entry * 8), descriptor, pack="<Q")
# L1 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000443  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x0000000000000443  # Valid table descriptors, subsequent entries
    descriptor |= (table_base + 0x20000)
    write_memory(table_base + 0x10000 + (entry * 8), descriptor, pack="<Q")
# L2 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000443  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x0000000000000443  # Valid table descriptors, subsequent entries
    descriptor |= (table_base + 0x30000)
    write_memory(table_base + 0x20000 + (entry * 8), descriptor, pack="<Q")
# L3 table
for entry in range(512):
    if entry in range(2):
        descriptor = 0x0000000000000403  # Valid page descriptors, first and second entries (instruction addresses)
    else:
        descriptor = 0x0000000881000443  # Valid page descriptors, subsequent entries
    descriptor |= (0x1000 * entry)
    write_memory(table_base + 0x30000 + (entry * 8), descriptor, pack="<Q")

# Program the MMU for four levels (level 0-3) of table with 512 entries with a 4KB granule size
# The exact settings are:
# - NSTable, next-level table not secure - bit 63 = b0 (table descriptors)
# - UXN/PXN, not execute-never - bits 54, 53 = b00 (block/page descriptors)
# - Contiguous - bit 52 = b0
# - Dirty bit modifier (DBM) - bit 51 = b0
# - Not global - bit 11 nG = b0
# - Access flag (AF=1, to prevent the first time Access fault) - bit 10 = b1
# - Non-shareable (SH=00) - bits 9:8 = b00
# - Access permissions (AP) - bits 7:6; b01 = read/write at EL1 and EL0
# - Non-secure (NS=0) - bit 5 = b0
# - Memory attribute index (AttrIndx) - bits 4:2 = b000, selecting MAIR_ELx.Attr0 (0xF0, Tagged Normal memory)
# - Descriptor type, valid table or page descriptor - bits 1:0 = b11
# Stage 1 L0 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x0000000000000443  # Valid table descriptors, subsequent entries
    descriptor |= (table_s1_base + 0x10000)
    write_memory(table_s1_base + (entry * 8), descriptor, pack="<Q")
# Stage 1 L1 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x0000000000000443  # Valid table descriptors, subsequent entries
    descriptor |= (table_s1_base + 0x20000)
    write_memory(table_s1_base + 0x10000 + (entry * 8), descriptor, pack="<Q")
# Stage 1 L2 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x0000000000000443  # Valid table descriptors, subsequent entries
    descriptor |= (table_s1_base + 0x30000)
    write_memory(table_s1_base + 0x20000 + (entry * 8), descriptor, pack="<Q")
# Stage 1 L3 table
for entry in range(512):
    if entry in range(2):
        descriptor = 0x00000000000004c3  # Valid page descriptors, first and second entries (instruction addresses)
    else:
        descriptor = 0x0000000800000443  # Valid page descriptors, subsequent entries
    descriptor |= (0x1000 * entry)
    write_memory(table_s1_base + 0x30000 + (entry * 8), descriptor, pack="<Q")

# Program the MMU for stage 2 for three levels (levels 1-3) of table with 512 entries with a 4KB granule size
# Stage 2 L1 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x0000000000000403  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x0000000000000443  # Valid table descriptors, subsequent entries
    descriptor |= (table_s2_base + 0x10000)
    write_memory(table_s2_base + (entry * 8), descriptor, pack="<Q")
# Stage 2 L2 table
for entry in range(512):
    if entry == 0:
        descriptor = 0x00000000000004c3  # Valid table descriptor, first entry (instruction address)
    else:
        descriptor = 0x00000000000004c3  # Valid table descriptors, subsequent entries
    descriptor |= (table_s2_base + 0x20000)
    write_memory(table_s2_base + 0x10000 + (entry * 8), descriptor, pack="<Q")
# Stage 2 L3 table
for entry in range(512):
    if entry in range(2):
        descriptor = 0x00000000000004c3  # Valid page descriptors for the instruction addresses - used by instruction fetch translation
    elif entry in range(5):
        descriptor = 0x00000088810004c3  # Valid page descriptors (read/write) - used for loads
    elif entry in range(128):
        descriptor = 0x00000000800004c3  # Valid page descriptors - used by table walk translation, pointing back to the S1 tables
    else:
        descriptor = 0x00000088810004c3  # Valid page descriptors (read/write) - used for loads
    descriptor |= (0x1000 * entry)
    write_memory(table_s2_base + 0x20000 + (entry * 8), descriptor, pack="<Q")

# Just dump the first 8 descriptors of each table
print_memory(table_base, 64)
print_memory(table_base + 0x10000, 64)
print_memory(table_base + 0x20000, 64)
print_memory(table_base + 0x30000, 64)
print_memory(table_s1_base, 64)
print_memory(table_s1_base + 0x10000, 64)
print_memory(table_s1_base + 0x20000, 64)
print_memory(table_s1_base + 0x30000, 64)
print_memory(table_s2_base, 64)
print_memory(table_s2_base + 0x10000, 64)
print_memory(table_s2_base + 0x20000, 64)

# Write values to the source registers
write_register("X2", 0xdeadbeeffacefeed)
write_register("X4", 0xbadecadefade9876)

# Load the address registers; VA 0x81004500 selects L3 table entry index 4 (the fifth entry)
# X3 carries logical tag 0x1 and X5 carries logical tag 0x7 in bits 59:56
store_address = 0x0100000081004500
write_register("X3", store_address)
write_register("X5", 0x0700000081004500)

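# Write values to memory using what will be the physical address that the store will try to overwrite
# The logical tag in bits 63:56 is not part of a physical address, so strip it first
untagged_address = store_address & 0x00FFFFFFFFFFFFFF
one_stage_physical_address = untagged_address + 0x800000000 # Used for loads from EL3 and EL2
two_stage_physical_address = untagged_address + 0x8800000000 # Used for loads from EL1 and EL0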
write_memory(one_stage_physical_address, 0x1122334455667788, pack="<Q")
write_memory(one_stage_physical_address + 8, 0xf0e1d2c3b4a59687, pack="<Q")
write_memory(two_stage_physical_address + 16, 0x0123456789abcdef, pack="<Q")
write_memory(two_stage_physical_address + 24, 0xffeeddccbbaa9988, pack="<Q")
write_memory(two_stage_physical_address + 32, 0x00ff11ee22dd33cc, pack="<Q")

# Print the memory to check that it was updated
print_memory(one_stage_physical_address, 16)
print_memory(two_stage_physical_address + 16, 24)


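# Turn on the MMU (M, bit 0) and enable MTE:
# - ATA, bit 43 = 0b1 (all exception levels)
# - ATA0, bit 42 = 0b1 (SCTLR_EL2 and SCTLR_EL1)
# - TCF, bits 41:40 = 0b01, synchronous tag check faults (all exception levels)
# - TCF0, bits 39:38 = 0b01, synchronous tag check faults for EL0 accesses (SCTLR_EL2 and SCTLR_EL1)
# - ITFSB, bit 37 = 0b1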
write_register("SCTLR_EL3", "{} | 0x92000000001".format(read_register("SCTLR_EL3")))
write_register("SCTLR_EL2", "{} | 0xD6000000001".format(read_register("SCTLR_EL2")))
write_register("SCTLR_EL1", "{} | 0xD6000000001".format(read_register("SCTLR_EL1")))
write_register("RGSR_EL1", 0x100) # SEED (bits 23:8) = 0x1
write_register("GCR_EL1", 0x10000) # RRND (bit 16) = 1: IMPLEMENTATION DEFINED random tag generation; Exclude (bits 15:0) = 0
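Because most of the setup ORs hex masks into registers, decoding those masks back into named fields is a quick way to compare them with the comments. The sketch below is hypothetical helper code in plain Python (it does not use the FVP scripting calls); the (lsb, width) field positions are our reading of the Arm ARM and are worth double-checking there.

```python
# Hypothetical field decoder for the masks used in the setup script.
# Field positions as (lsb, width), following our reading of the Arm ARM.
FIELDS = {
    "SCR_EL3": {"NS": (0, 1), "RW": (10, 1), "ATA": (26, 1)},
    "HCR_EL2": {"VM": (0, 1), "RW": (31, 1), "ATA": (56, 1), "DCT": (57, 1)},
    "SCTLR_ELx": {"M": (0, 1), "ITFSB": (37, 1), "TCF0": (38, 2),
                  "TCF": (40, 2), "ATA0": (42, 1), "ATA": (43, 1)},
    "GCR_EL1": {"Exclude": (0, 16), "RRND": (16, 1)},
    "RGSR_EL1": {"TAG": (0, 4), "SEED": (8, 16)},
}


def decode(register: str, value: int) -> dict:
    """Return {field: value} for every field listed for the register."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in FIELDS[register].items()}


if __name__ == "__main__":
    print(decode("SCR_EL3", 0x4000431))
    print(decode("HCR_EL2", 0x300000080000000))
    print(decode("SCTLR_ELx", 0xD6000000001))  # mask ORed into SCTLR_EL2/SCTLR_EL1
    print(decode("SCTLR_ELx", 0x92000000001))  # mask ORed into SCTLR_EL3
    print(decode("GCR_EL1", 0x10000), decode("RGSR_EL1", 0x100))
```

With these positions the SCTLR_EL1 mask sets TCF0 = 0b01, which, as we read the Arm ARM, is what requests synchronous faults for EL0 accesses, so the SCTLR settings appear to match their comments. HCR_EL2.DCT, as far as we understand, only has an effect when HCR_EL2.DC is also set, which ties in with the HCR_EL2.DC suggestion in the reply.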


We run the FVP on Linux with the following command: "/work/FVP_ARM_Std_Library/FVP_Base/FVP_Base_Cortex-A720 -C bp.secure_memory=false -C bp.dram_size=400 -C cluster0.NUM_CORES=0x1 -C bp.dram_metadata.is_enabled=1 --iris-connect tcpserver -p"

Tarmac traces are enabled by setting the environment variable as follows: "FM_TRACE_PLUGINS=|trace-file=./trace.log||trace_memory=true|/work/FVP_ARM_Std_Library/plugins/Linux64_GCC-9.3/TarmacTrace.so"
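To cross-check the translations recorded in the Tarmac trace, the expected addresses for the STR on line 11 can be worked out by hand. The sketch below is a hypothetical model of only the tables written by the setup script: it relies on every level 0 to 2 entry pointing at the same next-level table (so only the level 3 index matters), reproduces the script's level 3 descriptor formulas, assumes a 4KB granule with the top byte ignored, and does not model the stage 2 translation of the stage 1 table walks themselves.

```python
# Hypothetical model of the two-stage walk set up by the script (4KB granule).
PAGE = 0x1000
TOP_BYTE = (1 << 56) - 1  # TBI: bits 63:56 are ignored for translation


def indices(addr: int, levels=range(4)) -> list[int]:
    """Table index used at each level: level n uses bits [47-9n:39-9n]."""
    return [(addr >> (39 - 9 * level)) & 0x1FF for level in levels]


def stage1_l3(entry: int) -> int:
    """Stage 1 L3 descriptor formula from the script."""
    base = 0x4C3 if entry < 2 else 0x800000443
    return base | (PAGE * entry)


def stage2_l3(entry: int) -> int:
    """Stage 2 L3 descriptor formula from the script."""
    base = 0x4C3 if entry < 2 else 0x800004C3 if 5 <= entry < 128 else 0x88810004C3
    return base | (PAGE * entry)


def output_address(descriptor: int) -> int:
    """Output address held in bits 47:12 of a page descriptor."""
    return descriptor & 0x0000FFFFFFFFF000


def translate(va: int) -> tuple[int, int]:
    """Return (IPA, PA) for an EL0 access through the script's tables."""
    va &= TOP_BYTE
    offset = va & (PAGE - 1)
    ipa = output_address(stage1_l3(indices(va)[3])) | offset
    pa = output_address(stage2_l3(indices(ipa)[3])) | offset
    return ipa, pa


if __name__ == "__main__":
    x5 = 0x0700000081004500
    print(indices(x5 & TOP_BYTE))  # prints: [0, 2, 8, 4]
    ipa, pa = translate(x5)
    print(hex(ipa), hex(pa))       # prints: 0x800004500 0x8881004500
```

Under these assumptions the STR lands at PA 0x8881004500, which is two_stage_physical_address from the script once the tag byte is removed, so the trace's translation of the access at 0x1050 can be compared against it.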

So, essentially, my question is: which register setting is missing to generate the synchronous tag check fault at line 11 of the code?

Your help would be very much appreciated, and thank you very much in advance.

  • Thank you so much for your help, Oishi San.

    I will first try to fix my MMU configuration so that it accesses Normal memory instead of Device memory, as you suggested, and also configure HCR_EL2.DC as you suggested. If that does not work, I will use the TC2 FVP as you suggested. Hopefully those two experiments will fix the issues, which have been tormenting me for a while :) I will let you know the outcome. Once again, I am so grateful for your help, mate!

    Cheers,
    Tony
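The reply mentions Oishi San's suggestion that the MMU configuration sends the access to Device rather than Normal memory. One way to check is to decode the stage 2 memory-type field of the descriptors the script writes. The sketch below is hypothetical helper code based on our reading of the Arm ARM for stage 2 with HCR_EL2.FWB = 0 (the script does not set FWB, so that is an assumption): MemAttr is descriptor bits 5:2, MemAttr[3:2] = 0b00 selects Device memory, and otherwise MemAttr[3:2] and MemAttr[1:0] give the outer and inner cacheability. As we understand it, a location is only treated as Tagged when stage 1 marks it Tagged (MAIR 0xF0 here) and stage 2 leaves it Normal Inner and Outer Write-Back (MemAttr = 0b1111); please confirm that rule in the Arm ARM rather than relying on this sketch.

```python
# Hypothetical decoder for stage 2 MemAttr (descriptor bits 5:2), assuming HCR_EL2.FWB = 0.
DEVICE = {0b00: "Device-nGnRnE", 0b01: "Device-nGnRE", 0b10: "Device-nGRE", 0b11: "Device-GRE"}
CACHE = {0b01: "Non-cacheable", 0b10: "Write-Through", 0b11: "Write-Back"}


def s2_memattr(descriptor: int) -> int:
    """Extract MemAttr[3:0] from a stage 2 block or page descriptor."""
    return (descriptor >> 2) & 0xF


def describe(memattr: int) -> str:
    """Name the memory type encoded by a stage 2 MemAttr value."""
    outer, inner = memattr >> 2, memattr & 0b11
    if outer == 0b00:
        return DEVICE[inner]
    return f"Normal, Outer {CACHE[outer]}, Inner {CACHE.get(inner, 'reserved')}"


def with_s2_memattr(descriptor: int, memattr: int) -> int:
    """Return the descriptor with MemAttr replaced (e.g. 0b1111 for Normal WB/WB)."""
    return (descriptor & ~(0xF << 2)) | ((memattr & 0xF) << 2)


if __name__ == "__main__":
    data_page = 0x88810004C3 | (0x1000 * 4)  # stage 2 L3 entry 4 as written by the script
    print(describe(s2_memattr(data_page)))    # prints: Device-nGnRnE
    print(hex(with_s2_memattr(data_page, 0b1111)))  # prints: 0x88810044ff
```

If this reading is right, every stage 2 descriptor in the script (0x...4c3) decodes as Device-nGnRnE, which would be consistent with LDG reading back 0 and with the STR not being tag checked.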
