I am currently working on enabling SMMU-v3 in a hypervisor, and I noticed that SMMU-v3 has several memory attribute configuration options:
a. TABLE_SH for Table access Shareability
b. TABLE_OC for Table access Outer Cacheability
c. TABLE_IC for Table access Inner Cacheability.
The same configuration options exist for queue access (QUEUE_SH, QUEUE_OC, QUEUE_IC):
/* CR1 (table and queue memory attributes) */
reg = (CR1_SH_ISH << CR1_TABLE_SH_SHIFT) |
(CR1_CACHE_WB << CR1_TABLE_OC_SHIFT) |
(CR1_CACHE_WB << CR1_TABLE_IC_SHIFT) |
(CR1_SH_ISH << CR1_QUEUE_SH_SHIFT) |
(CR1_CACHE_WB << CR1_QUEUE_OC_SHIFT) |
(CR1_CACHE_WB << CR1_QUEUE_IC_SHIFT);
writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
This code always configures the register with a fixed value, without considering the SMMU_IDR0.COHACC bit (coherent access supported to translations, structures and queues). If an SMMU has COHACC = 0, i.e. it does not support coherent access, it can only use non-cacheable memory, yet CR1 is still configured as cacheable. Does the SMMU work properly in that case?
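To make the question concrete, here is a minimal sketch of what a COHACC-aware CR1 setup could look like. The helper name smmu_cr1_attrs is hypothetical, the field positions and encodings are written down as I understand the SMMU_CR1 and SMMU_IDR0 layouts and should be checked against the spec, and choosing Outer Shareable with Non-cacheable for the non-coherent case is my assumption, not a confirmed requirement.

```c
#include <stdint.h>

/* Assumed layout: SMMU_IDR0.COHACC is bit 4. */
#define IDR0_COHACC         (1u << 4)

/* Assumed SMMU_CR1 field positions (2 bits each). */
#define CR1_TABLE_SH_SHIFT  10
#define CR1_TABLE_OC_SHIFT  8
#define CR1_TABLE_IC_SHIFT  6
#define CR1_QUEUE_SH_SHIFT  4
#define CR1_QUEUE_OC_SHIFT  2
#define CR1_QUEUE_IC_SHIFT  0

/* Assumed encodings for shareability and cacheability. */
#define CR1_SH_OSH          2u  /* Outer Shareable */
#define CR1_SH_ISH          3u  /* Inner Shareable */
#define CR1_CACHE_NC        0u  /* Non-cacheable */
#define CR1_CACHE_WB        1u  /* Write-Back cacheable */

/*
 * Hypothetical helper: pick the CR1 value from IDR0.
 * Coherent SMMU     -> Inner Shareable, Write-Back (the fixed value above).
 * Non-coherent SMMU -> Outer Shareable, Non-cacheable (assumption).
 */
static uint32_t smmu_cr1_attrs(uint32_t idr0)
{
    uint32_t sh, cache;

    if (idr0 & IDR0_COHACC) {
        sh = CR1_SH_ISH;
        cache = CR1_CACHE_WB;
    } else {
        sh = CR1_SH_OSH;
        cache = CR1_CACHE_NC;
    }

    return (sh << CR1_TABLE_SH_SHIFT) |
           (cache << CR1_TABLE_OC_SHIFT) |
           (cache << CR1_TABLE_IC_SHIFT) |
           (sh << CR1_QUEUE_SH_SHIFT) |
           (cache << CR1_QUEUE_OC_SHIFT) |
           (cache << CR1_QUEUE_IC_SHIFT);
}
```

The result would then be written to ARM_SMMU_CR1 in place of the fixed value. Whether a non-coherent SMMU actually needs this is exactly what I am asking.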
a. [169:168] S2IR0 for Inner region Cacheability for stage 2 translation table access.
b. [171:170] S2OR0 for Outer region Cacheability for stage 2 translation table access.
c. [173:172] S2SH0 for Shareability for stage 2 translation table access
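The bit positions listed above all fall into the third 64-bit word of the STE (bit 168 is bit 40 of word 2). A minimal sketch of setting them, assuming the STE is held as an array of eight 64-bit words; the helper names ste_set_field and ste_set_s2_walk_attrs are hypothetical, and the field values are passed through unchanged, so the caller has to supply the encodings from the spec:

```c
#include <stdint.h>

#define STE_DWORDS 8  /* a 64-byte STE viewed as eight 64-bit words */

/*
 * Hypothetical helper: write `val` into the field starting at absolute
 * STE bit `lsb` with the given width. The field must not cross a 64-bit
 * word boundary and width must be less than 64.
 */
static void ste_set_field(uint64_t ste[STE_DWORDS], unsigned lsb,
                          unsigned width, uint64_t val)
{
    unsigned word = lsb / 64;
    unsigned shift = lsb % 64;
    uint64_t mask = ((1ULL << width) - 1) << shift;

    ste[word] = (ste[word] & ~mask) | ((val << shift) & mask);
}

/* Set the stage 2 table walk attributes at the bit positions listed above. */
static void ste_set_s2_walk_attrs(uint64_t ste[STE_DWORDS], unsigned s2ir0,
                                  unsigned s2or0, unsigned s2sh0)
{
    ste_set_field(ste, 168, 2, s2ir0);  /* [169:168] S2IR0 */
    ste_set_field(ste, 170, 2, s2or0);  /* [171:170] S2OR0 */
    ste_set_field(ste, 172, 2, s2sh0);  /* [173:172] S2SH0 */
}
```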
It does sound as if the problem is down to memory coherency problems.
Have you double checked the attributes used on the CPU for the memory? One test would be to see whether adding a Clean to the PoC (followed by a DSB) for the memory housing the queue/ST/etc. makes a difference.
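For reference, a Clean to the PoC followed by a DSB over a buffer could be sketched as below. This is only an illustration: the function names are made up, the AArch64 path uses DC CVAC and reads the line size from CTR_EL0.DminLine, and the non-AArch64 path is a stand-in that only walks the range (assuming 64-byte lines) so the address arithmetic can be exercised elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest data cache line size in bytes. */
static size_t dcache_line_size(void)
{
#if defined(__aarch64__)
    uint64_t ctr;

    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    /* DminLine (bits [19:16]) is log2 of the line size in 4-byte words. */
    return (size_t)4 << ((ctr >> 16) & 0xf);
#else
    return 64;  /* assumption for non-AArch64 builds */
#endif
}

/*
 * Clean every cache line covering [start, start + len) to the Point of
 * Coherency, then issue a DSB. Returns the number of lines visited.
 */
static size_t clean_dcache_to_poc(const void *start, size_t len)
{
    size_t line = dcache_line_size();
    uintptr_t addr = (uintptr_t)start & ~(uintptr_t)(line - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t n = 0;

    if (len == 0)
        return 0;

    for (; addr < end; addr += line, n++) {
#if defined(__aarch64__)
        __asm__ volatile("dc cvac, %0" : : "r"(addr) : "memory");
#endif
    }

#if defined(__aarch64__)
    /* Wait for the clean operations to complete before continuing. */
    __asm__ volatile("dsb sy" : : : "memory");
#endif
    return n;
}
```

In the test suggested above, this would be called on the queue or table memory after the CPU has written it and before the SMMU is expected to read it.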
Thanks for your reply.
I checked the attributes used on the CPU for the memory: the MAIR value is Normal Non-cacheable (0x44), and the correct attribute index is set in the PTE.
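The check I did can be expressed roughly as follows. This is a sketch that assumes a stage 1 descriptor, where AttrIndx sits in bits [4:2] and selects one of the eight 8-bit fields of the MAIR register; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MAIR attribute byte: Normal memory, Inner and Outer Non-cacheable. */
#define MAIR_ATTR_NORMAL_NC  0x44u

/* Stage 1 descriptor: AttrIndx is bits [4:2]. */
#define PTE_ATTRINDX_SHIFT   2
#define PTE_ATTRINDX_MASK    0x7u

/* Return the 8-bit attribute stored in MAIR slot `idx` (0..7). */
static uint8_t mair_attr(uint64_t mair, unsigned idx)
{
    return (uint8_t)(mair >> (8 * idx));
}

/* Return the AttrIndx field of a stage 1 page table entry. */
static unsigned pte_attrindx(uint64_t pte)
{
    return (unsigned)((pte >> PTE_ATTRINDX_SHIFT) & PTE_ATTRINDX_MASK);
}

/* True if the PTE selects a Normal Non-cacheable attribute in MAIR. */
static bool pte_is_normal_nc(uint64_t mair, uint64_t pte)
{
    return mair_attr(mair, pte_attrindx(pte)) == MAIR_ATTR_NORMAL_NC;
}
```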
After I add a cache operation, the SMMU works, and the upstream device can access memory normally. All the listed tests give the same result on real silicon and on the FVP RevC platform. details as
Based on the tests above, it looks like the SMMU waits for the CPU cache operation before it does the real memory access (reading a PTE or writing the event queue), as if the SMMU uses the CPU cache operation as a synchronization signal.
Doing cache operations on non-cacheable memory looks weird. Can you share more detailed logic rules (similar to the pseudocode in the Arm architecture spec) for what happens inside the SMMU IP? And why does the SMMU follow those rules?