Slow DMA operations on ARM64

Hi,

I am using a vendor's custom driver and application for PCIe-to-SRIO (Serial RapidIO) on ARM64.

I perform remote DMA writes and reads between two ARM boards.

With the same setup on two PCs, there were no problems and the DMA operations were very fast.

On the boards, however:

1. When the SMMU is disabled and the physical address equals the bus address, the transfers work correctly, but the DMA operations are slow.

2. When the SMMU is enabled, I cannot use virt_to_phys(), so I cannot map the buffer's physical address to a user-space virtual address.

My questions are:

1. Does the SMMU state (enabled or disabled) affect DMA performance?

2. Could the slowness be caused by the fact that dma_cache_sync() cannot be used on ARM?

3. dma_alloc_coherent() gives me a bus address and a kernel virtual address. How do I obtain the physical address so that I can map the buffer to a user-space virtual address?

Here is our application snippet:

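// Allocate a DMA buffer; the driver returns its physical and bus addresses
status = AllocMemory(hDrv, local.memSize, &local.memPhysAdrs, &local.memBusAdrs);
printf("1. Physical Memory[DATA] : %llx [%llx] (0x%x)\n",
       (unsigned long long)local.memPhysAdrs, (unsigned long long)local.memBusAdrs,
       (unsigned int)local.memSize);

// Map the physical memory block into this process's virtual address space (non-cached)
status = MapMemory(hDrv, local.memPhysAdrs, local.memSize, (PVOID *)&local.hSharedMemory, MM_NONCACHED);
printf("Virtual Address[DATA] : %p\n", (void *)local.hSharedMemory);

// DMA-write the local buffer to the partner board's buffer and wait for completion
status = MemDmaWriteRaw(hDrv, DMA_WAIT_COMPLETION, partner.devId, Channel,
                        local.memBusSrc, partner.memBusAdrs, local.memSize, 0);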
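To put a number on "slow", the transfer can be timed. Below is a minimal sketch, assuming that MemDmaWriteRaw() called with DMA_WAIT_COMPLETION blocks until the transfer has finished (inferred from the flag name, not confirmed). The callback type `transfer_fn`, `struct copy_ctx`, and `memcpy_transfer` are hypothetical names introduced for illustration; `memcpy_transfer` is only a plain memcpy used to exercise the harness, not a DMA transfer.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback: performs one blocking transfer of `bytes` bytes
 * and returns 0 on success. In the application it would wrap the vendor's
 * MemDmaWriteRaw(..., DMA_WAIT_COMPLETION, ...) call (assumption). */
typedef int (*transfer_fn)(void *ctx, size_t bytes);

/* Elapsed time between two timespecs, in seconds. */
static double elapsed_sec(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Runs `iterations` transfers back to back and returns the average
 * throughput in MB/s (1 MB = 1e6 bytes), or -1.0 if a transfer fails
 * or no measurable time elapsed. */
static double measure_throughput(transfer_fn fn, void *ctx, size_t bytes, int iterations)
{
    struct timespec start, end;
    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        if (fn(ctx, bytes) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = elapsed_sec(&start, &end);
    if (secs <= 0.0)
        return -1.0;
    return (double)bytes * iterations / secs / 1e6;
}

/* Hypothetical stand-in transfer for checking the harness itself:
 * copies between two user-space buffers (NOT a DMA transfer). */
struct copy_ctx {
    char *src;
    char *dst;
};

static int memcpy_transfer(void *ctx, size_t bytes)
{
    struct copy_ctx *c = ctx;
    memcpy(c->dst, c->src, bytes);
    return 0;
}
```

In the application, the callback's context would carry hDrv, partner.devId, Channel and the bus addresses. Running the same measurement on the PCs and on the boards, with the SMMU both enabled and disabled, would turn "slow" into directly comparable figures.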

Thanks in advance!!