Hello,
While testing various features of the GICv3 ITS, I came across some behavior regarding the ITS retry mechanism and wanted to ask a couple of questions.
After a command causes the ITS to stall, I observed that writing 1 to the GITS_CWRITER.Retry bit causes the GITS_CREADR value to catch up with GITS_CWRITER.
In my setup (kernel 5.10.y) with a dummy device, the ITS always stalls at the MAPD command during MSI setup.
As a result, the system fails to complete the MAPD and MAPTI commands during boot.
However, after issuing a retry, GITS_CREADR catches up with GITS_CWRITER, and when I trigger the MSI (via an INT command), the interrupt handler is correctly called.
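For reference, the sequence described above can be sketched roughly as follows. This is only an illustration, not the Linux driver's code: the helper names (`its_read64`, `its_retry`, `its_queue_drained`, and so on) are made up for this post, and `base` is assumed to point at the mapped ITS control frame. The register offsets and bit positions are taken from the GICv3 architecture specification, where whether the ITS stalls on a command error at all is implementation defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets within the ITS control frame (GICv3 architecture). */
#define GITS_CWRITER          0x0088u
#define GITS_CREADR           0x0090u

#define GITS_CREADR_STALLED   (1ull << 0)   /* queue stalled on a command error */
#define GITS_CWRITER_RETRY    (1ull << 0)   /* write 1 to restart a stalled queue */
#define GITS_CMD_OFFSET_MASK  0xFFFE0ull    /* Offset field, bits [19:5] (32-byte slots) */

/* Hypothetical MMIO accessors; a real driver would use its own helpers. */
static inline uint64_t its_read64(volatile void *base, uint32_t off)
{
    return *(volatile uint64_t *)((volatile uint8_t *)base + off);
}

static inline void its_write64(volatile void *base, uint32_t off, uint64_t val)
{
    *(volatile uint64_t *)((volatile uint8_t *)base + off) = val;
}

/* Extract the queue offset from a GITS_CREADR or GITS_CWRITER value. */
static inline uint64_t its_cmd_offset(uint64_t reg)
{
    return reg & GITS_CMD_OFFSET_MASK;
}

static inline bool its_queue_stalled(uint64_t creadr)
{
    return (creadr & GITS_CREADR_STALLED) != 0;
}

/* Ask the ITS to retry: keep the current write offset, set Retry. */
static inline void its_retry(volatile void *base)
{
    uint64_t cwriter = its_read64(base, GITS_CWRITER);

    its_write64(base, GITS_CWRITER,
                its_cmd_offset(cwriter) | GITS_CWRITER_RETRY);
}

/* True once the read pointer has caught up and the queue is not stalled. */
static inline bool its_queue_drained(uint64_t creadr, uint64_t cwriter)
{
    return !its_queue_stalled(creadr) &&
           its_cmd_offset(creadr) == its_cmd_offset(cwriter);
}
```

After calling `its_retry()`, one would poll GITS_CREADR (with a timeout) until `its_queue_drained()` returns true. Note that a drained queue only shows that the ITS consumed the commands, not why the first attempt failed.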
I have two questions:
1. Why does the ITS always stall at the MAPD command? (Could this be because I'm testing with a dummy device?)
2. After retrying the stalled command, everything appears to work normally. Is it safe to assume that the commands completed successfully without issues?
Thank you in advance!
steve jeong said: Why does the ITS always stall at the MAPD command? (Could this be because I'm testing with a dummy device?)
Interesting, looking at the MAPD command it's not obvious which of the listed command errors it would be.
My suspicion is that it's not the command exactly. Rather, are you certain that the writes to the command queue are actually visible to the ITS? If not, the ITS could be picking up whatever random values were in memory, and it happens (luckily) that those result in a command error. That could fit with your description, as the later retry simply gives more time for the writes to become visible. I'm not a Linux expert and don't know anything about how the Linux ITS driver functions, but I'd be checking how the PE has the command queue mapped (Device nE vs Device E vs Normal) and what cache ops/barriers are in place after the command is written.
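One way to start that check is to read back GITS_CBASER and see which memory attributes the ITS actually accepted for the command queue. The sketch below decodes the InnerCache and Shareability fields using the bit positions from the GICv3 architecture specification. The function names are hypothetical, and the `cmdq_needs_explicit_clean()` rule is a simplifying assumption for illustration, not a statement of what any particular driver does.

```c
#include <stdint.h>

/* GITS_CBASER field positions (GICv3 architecture). */
#define GITS_CBASER_VALID               (1ull << 63)
#define GITS_CBASER_INNERCACHE_SHIFT    59   /* bits [61:59] */
#define GITS_CBASER_SHAREABILITY_SHIFT  10   /* bits [11:10] */

static inline unsigned cbaser_inner_cache(uint64_t cbaser)
{
    return (unsigned)((cbaser >> GITS_CBASER_INNERCACHE_SHIFT) & 0x7);
}

static inline unsigned cbaser_shareability(uint64_t cbaser)
{
    return (unsigned)((cbaser >> GITS_CBASER_SHAREABILITY_SHIFT) & 0x3);
}

/* Human-readable name for the InnerCache encoding. */
static inline const char *cbaser_inner_cache_name(unsigned v)
{
    static const char *const names[8] = {
        "Device-nGnRnE",
        "Normal Inner Non-cacheable",
        "Normal Inner Cacheable RA WT",
        "Normal Inner Cacheable RA WB",
        "Normal Inner Cacheable WA WT",
        "Normal Inner Cacheable WA WB",
        "Normal Inner Cacheable RA WA WT",
        "Normal Inner Cacheable RA WA WB",
    };
    return names[v & 0x7];
}

/*
 * Assumption for this sketch: if the ITS reads the queue as Device or
 * Non-cacheable memory, or as Non-shareable (Shareability == 0), it will
 * not snoop the PE's caches. The PE then has to clean each command to
 * memory (or map the queue non-cacheable) and issue a barrier before
 * advancing GITS_CWRITER.
 */
static inline int cmdq_needs_explicit_clean(uint64_t cbaser)
{
    return cbaser_inner_cache(cbaser) <= 1 ||
           cbaser_shareability(cbaser) == 0;
}
```

The value has to be read back after programming the register, because the ITS is allowed to report different attributes from the ones software requested. If `cmdq_needs_explicit_clean()` returns nonzero while the PE maps the queue as cacheable Normal memory, a missing cache clean would be a plausible explanation for the ITS seeing stale data.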
steve jeong said: After retrying the stalled command, everything appears to work normally. Is it safe to assume that the commands completed successfully without issues?
If the INT command is working as expected, then it does seem to be correct. But in your place, I'd want to understand why the command error was there.
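To narrow down the command error, it can help to decode the 32-byte slot that GITS_CREADR points at while the queue is stalled and compare it with what the driver intended to write. Below is a minimal sketch of a MAPD decoder, assuming the four doublewords have already been read from memory as little-endian 64-bit values. The field layout follows the MAPD encoding in the GICv3 architecture specification; the struct and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define GITS_CMD_MAPD  0x08u   /* command number in DW0 bits [7:0] */

/* Decoded view of a MAPD command (names are illustrative only). */
struct mapd_fields {
    uint32_t device_id;  /* DW0 bits [63:32] */
    uint8_t  size;       /* DW1 bits [4:0]: number of EventID bits minus one */
    uint64_t itt_addr;   /* DW2 bits [51:8], kept as a byte address */
    bool     valid;      /* DW2 bit [63] */
};

/*
 * Decode one 32-byte command slot. Returns false if the slot does not
 * hold a MAPD command, which is itself a useful hint that the ITS may
 * be reading stale or unrelated memory.
 */
static inline bool its_decode_mapd(const uint64_t cmd[4],
                                   struct mapd_fields *out)
{
    if ((cmd[0] & 0xFFu) != GITS_CMD_MAPD)
        return false;

    out->device_id = (uint32_t)(cmd[0] >> 32);
    out->size      = (uint8_t)(cmd[1] & 0x1Fu);
    out->itt_addr  = cmd[2] & 0x000FFFFFFFFFFF00ull;
    out->valid     = ((cmd[2] >> 63) & 1u) != 0;
    return true;
}
```

If the decoded DeviceID, Size, and ITT address match what the driver programmed, the command itself is probably fine and the visibility of the queue writes becomes the more likely suspect. If the slot does not decode as MAPD at all, that points the same way.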
When I was performing the retry, I had the command queue region open in Trace32. I’ll try the retry again without viewing the command queue to see if that changes anything.
This issue doesn’t occur on kernel 6.12. I’ll also check if there are any differences in cache operations between kernel versions. Thank you!