Why is the overhead of memcpy() higher in EL3 than in NS.EL1?

I measured the cost of memcpy() in both NS.EL1 (a Linux kernel module) and EL3 (Arm Trusted Firmware):

1. I allocated two contiguous physical memory buffers A and B via Linux's CMA allocator (specifically, via cma_alloc()).

2. In the NS.EL1 kernel module, I call memcpy(A, B, sizeof(A)) directly.

3. In EL3 BL31, I call memcpy(phys_addr(A), phys_addr(B), sizeof(A)). Note that I set up the EL3 page table (a flat region map) during bl31_setup and pass the two buffers' physical addresses directly, so no page fault is triggered.

4. To evaluate the overhead, I read the PMU counter register pmccntr: asm volatile("mrs %0, pmccntr_el0" : "=r" (r));

The cycle measurement looks like this:

```
start = getCycle();      /* read pmccntr_el0 before the copy */
memcpy(A, B, ...);
end = getCycle();        /* read pmccntr_el0 after the copy */
cycle = end - start;     /* elapsed cycles */
```
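A minimal sketch of how such a measurement could be wrapped, under stated assumptions. The names `get_cycle`, `time_memcpy_min` and the build switch `USE_PMCCNTR` are hypothetical and do not come from the post. The PMCCNTR_EL0 path is only compiled when `USE_PMCCNTR` is defined on AArch64, because the counter must already be enabled and accessible at the current exception level; otherwise the sketch falls back to the standard C `clock()`, which reports processor-time ticks rather than cycles. Taking the minimum over several runs is a design choice of this sketch, not something the post describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical switch: define USE_PMCCNTR only where PMCCNTR_EL0 is enabled
 * and readable at the current exception level. */
#if defined(__aarch64__) && defined(USE_PMCCNTR)
static inline uint64_t get_cycle(void)
{
    uint64_t r;
    /* The ISBs keep the counter read from drifting past nearby instructions. */
    __asm__ volatile("isb\n\tmrs %0, pmccntr_el0\n\tisb" : "=r"(r) : : "memory");
    return r;
}
#else
/* Portable stand-in: processor-time ticks from clock(), not CPU cycles. */
static inline uint64_t get_cycle(void)
{
    return (uint64_t)clock();
}
#endif

/* Time `runs` copies of `len` bytes and return the smallest delta observed. */
static uint64_t time_memcpy_min(void *dst, const void *src, size_t len,
                                unsigned runs)
{
    assert(dst != NULL && src != NULL && runs > 0);
    uint64_t best = UINT64_MAX;
    for (unsigned i = 0; i < runs; i++) {
        uint64_t start = get_cycle();
        memcpy(dst, src, len);
        /* Compiler barrier (GCC/Clang syntax) so the copy is not optimised away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        uint64_t end = get_cycle();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Reporting the minimum tends to filter out one-off effects such as a cold cache on the first run, which matters when two environments are being compared.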
I ran the evaluation on a Juno R2 board with buffer sizes from 4KB to 64MB. Only one CPU core was enabled during the evaluation. Here is a short summary of the results:

| memcpy size | Time in NS.EL1 kernel module | Time in EL3 BL31 (Arm Trusted Firmware) |
|---|---|---|
| 4KB | 1,324 cycles / 0.0015 ms | 20,785 cycles / 0.02 ms |
| 64KB | 22,412 cycles / 0.026 ms | 328,951 cycles / 0.39 ms |
| 1MB | 549,383 cycles / 0.66 ms | 5,446,983 cycles / 6.5 ms |
| 64MB | 38,262,713 cycles / 45.91 ms | 348,783,503 cycles / 418.5 ms |

Counterintuitively, memcpy() in EL3 is roughly 9x to 16x slower than in the NS.EL1 kernel module, depending on the buffer size. Are there any possible explanations? Could this be due to different cache and data coherence behaviour in EL1 and EL3?
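The slowdown per buffer size can be recomputed from the cycle counts in the table. The sketch below is only arithmetic on the posted numbers; the names `row`, `results`, `slowdown` and `implied_mhz` are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One row of the results table, cycle counts only. */
struct row {
    const char *size;
    uint64_t el1_cycles;
    uint64_t el3_cycles;
};

static const struct row results[] = {
    { "4KB",  1324ull,     20785ull },
    { "64KB", 22412ull,    328951ull },
    { "1MB",  549383ull,   5446983ull },
    { "64MB", 38262713ull, 348783503ull },
};

/* EL3 cycles divided by NS.EL1 cycles for the same buffer size. */
static double slowdown(const struct row *r)
{
    assert(r->el1_cycles > 0);
    return (double)r->el3_cycles / (double)r->el1_cycles;
}

/* Clock frequency in MHz implied by a (cycles, milliseconds) pair. */
static double implied_mhz(uint64_t cycles, double ms)
{
    assert(ms > 0.0);
    return (double)cycles / (ms * 1000.0);
}
```

With these numbers the ratio is about 15.7 at 4KB, 14.7 at 64KB, 9.9 at 1MB and 9.1 at 64MB, so the gap narrows as the buffer grows. The 64MB rows imply a counter rate of roughly 833 MHz in both environments, which suggests the cycle-to-millisecond conversion was done the same way on both sides.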

Top replies

Sudeep Holla over 2 years ago +2 verified

If you are using Linux in NS.EL1, the memory copy routine is a highly optimised one imported from https://github.com/ARM-software/optimized-routines. I am not sure the same is true of the software you are running in EL3.
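To illustrate the kind of difference an optimised routine can make, here is a minimal sketch contrasting a byte-at-a-time copy with one that moves 8 bytes per iteration. Both functions are hypothetical illustrations: whether the EL3 image actually uses a loop like `copy_bytewise` is an assumption the thread does not confirm, and `copy_wordwise` is far simpler than the routines in the optimized-routines repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy: one load and one store per byte. */
static void *copy_bytewise(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (len--)
        *d++ = *s++;
    return dst;
}

/* Copy 8 bytes per iteration, then finish the tail byte by byte. */
static void *copy_wordwise(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (len >= sizeof(uint64_t)) {
        uint64_t w;
        /* Fixed-size memcpy avoids unaligned-pointer casts; compilers
         * typically lower each call to a single load or store. */
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        len -= sizeof w;
    }
    while (len--)
        *d++ = *s++;
    return dst;
}
```

Both functions produce the same result as memcpy() for non-overlapping buffers; any timing difference between them depends on the compiler, optimisation level and target, so it would need to be measured on the board rather than assumed.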