Hi Friends, greetings of the day...
I am new to this Arm community and a beginner with Arm Cortex processors.
I am doing some investigation on a multi-core processor.
1. Can someone please tell me the power consumption of an Arm Cortex-A5 core?
There is a default core from which I am going to enable the A5 core and measure the current.
2. Should the power consumption change if the A5 is operating at different clock rates?
Hope my question is clear, and sorry if there is any mistake : )
Please reply... Thanks
Hello, ARM develops the architecture and licenses it to other companies, so the power consumption of a particular processor depends on how that company implemented the core design.
Find out who produces your ARM Cortex-A5 processor and get its datasheet, which should include information about power consumption.
Here is an example of two processors with the same core design but different implementations:
1) NXP i.MX 6 Series Applications Processors with ARM Cortex-A9 core;
2) Texas Instruments Sitara AM4x Processors with ARM Cortex-A9 core.
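On the second question (clock rate): as a rough first-order guide, not a figure for any specific chip, the dynamic power of CMOS logic is commonly modelled as below. The symbols are the usual textbook ones and are not taken from any datasheet: alpha is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f,
\qquad
P_{\text{total}} \approx P_{\text{dyn}} + P_{\text{static}}
```

So at a fixed supply voltage, the current drawn by the core would be expected to rise roughly in proportion to its clock rate, on top of a leakage component that does not depend on the clock.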
Hi Vanhealsing, thanks for the reply.
1. I am using the Halo MAC57D5xx from NXP. I have checked the datasheet of this chip but I don't see anything related to
the power consumption of the A5 core...
2. And I don't understand why the memcpy function takes more time when executed on the A5 core, which is running at a
higher frequency (320 MHz), compared to the M4 core running at a lower frequency (160 MHz)... Could you please comment on this, Vanhealsing?
I tried a lower frequency for the A5 core too, but the result was the same....
Thanks in advance.
Hope you understood my question, or
do you want me to elaborate?
1. If the datasheet doesn't provide information about power consumption, try looking at other documentation (for example application notes) about the low power modes of that particular processor. Producers of such Systems-on-Chip usually describe the power consumption of the chip in different power modes.
2. First, the Cortex-A5 and Cortex-M4 cores are different in nature, because they were designed for different goals:
the Cortex-A5 is an application processor core with an 8-stage pipeline,
the Cortex-M4 is a core for microcontrollers and has a 3-stage pipeline,
the Cortex-M4 executes only Thumb instructions, which give denser code than the 32-bit ARM instructions typically used on the Cortex-A5,
the Cortex-M4 operates on physical memory addresses,
the Cortex-A5 operates on virtual memory addresses, which involves address translation by the Memory Management Unit (you can disable the MMU and work with physical addresses as well, but then without an operating system),
Cortex-M4 microcontrollers might not have a cache,
Cortex-A5 processors usually have L1 caches and often an L2 cache.
Second, memcpy implements a copy operation, so you have to know which memory you read from and which you write to (for example, reading from internal SRAM and writing to external DRAM).
The same code can run at different speeds depending on the memory region it executes from.
Thanks again, Vanhealsing.
I tried different regions for executing memcpy,
for example copying 1 KB from internal SRAM to internal SRAM, internal flash to internal SRAM, etc. I ran the application
on both cores, M4 and A5, individually, and as I said, in all scenarios memcpy on the M4 executes faster than on the A5.
The A5 is operating at 320 MHz and the M4 at 160 MHz.
I was thinking that a core running at a higher frequency should perform faster, but that is not what I am observing.
What could be the possible explanation for this?
Thanks, Van, for your valuable time and information.
A core running at a higher frequency does not necessarily execute code faster: with slow memory, the core may sit in a pipeline stall waiting for a data load or data store to complete.
The Cortex-M4 has higher code density because of Thumb instructions.
The Cortex-M4's ICode and DCode buses allow the core to execute code faster (an instruction and data can be loaded in 1 cycle).
For long copy operations, try using the DMA controller in the SoC with the Cortex-A5 core and compare the performance.
This is the first time I have registered with a community to get help, and I am happy to see people like you spare time to help someone in need...
I truly appreciate that.
"Kindness is free, sprinkle that stuff everywhere".
I have measured the memcpy time with the A5 core running at 80 MHz and the M4 core at 160 MHz, and I still see the same, i.e. the A5 takes more time to copy 1 KB than the M4, even at the lower frequency.
Thinking that running the A5 core at 320 MHz might be stalling it, as you mentioned above, I tried a lower clock speed.
As you said, I can go for DMA, but I want to try with memcpy.
Could you please give me some suggestions on why the A5 core is taking more time?
Is it due to hardware design, bus architecture, or something else?
Please explain how you measured the execution time of the memcpy function on the Cortex-A5.
I have two boards, with Cortex-A5 and Cortex-M4, and I am trying to do the same operation on my boards now.
I have placed 2 breakpoints in the code, i.e. before and after the memcpy call, and enabled an on-chip timer module running at 1 MHz. When I run, execution stops at the 1st breakpoint; resuming executes memcpy and stops at the 2nd breakpoint. The timer module's counter register value gives me the number of ticks, from which I calculate the time taken to execute memcpy.
Could enabling the cache on the M4 increase the memcpy performance?
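As an alternative to the breakpoint method described above, the counter can be read in code immediately before and after the copy, so that no debugger halt sits between the two readings. The following is only a minimal sketch: `tick_source_t`, `host_ticks`, `copy_fn` and `measure_memcpy_ticks` are hypothetical names, and the standard `clock()` function stands in for the on-chip timer so the code can be tried on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define LENGTH 1024

/* Hypothetical tick source. On the target this would return the
   free-running counter register of the on-chip timer; here the
   standard clock() function stands in so the sketch runs anywhere. */
typedef uint32_t (*tick_source_t)(void);

static uint32_t host_ticks(void)
{
    return (uint32_t)clock();
}

static uint8_t source[LENGTH];
static uint8_t dest[LENGTH];

/* Calling memcpy through a volatile function pointer stops the
   compiler from inlining or removing the copies being timed. */
static void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

/* Average ticks per memcpy of LENGTH bytes over `runs` copies.
   Unsigned subtraction gives the right delta across a single
   wrap-around of a 32-bit counter. */
static double measure_memcpy_ticks(tick_source_t read_ticks, unsigned runs)
{
    assert(read_ticks != NULL && runs > 0);

    uint32_t start = read_ticks();
    for (unsigned i = 0; i < runs; i++) {
        copy_fn(dest, source, LENGTH);
    }
    uint32_t end = read_ticks();

    return (double)(uint32_t)(end - start) / (double)runs;
}
```

On the target, `host_ticks` would be replaced by a function that reads the timer counter register. With a 1 MHz timer one tick is 1 microsecond, so averaging over many copies, for example `measure_memcpy_ticks(host_ticks, 1000)`, gives finer resolution than timing a single 1 KB copy.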
Do you mean the instruction cache or the data cache of the Cortex-M4?
And how did you measure the time on the Cortex-M4 (by using the SysTick timer)?
There is no easy answer to your question))
You have to know the memory and bus architecture of both the Cortex-A5 and Cortex-M4 processors.
Both I-cache and D-cache.
I have used the FlexTimer module for both cores.
OK, I will try to understand the architecture.
One more measurement for the Cortex-M4:
#define LENGTH 1024
/* both arrays allocated in internal SRAM */
uint8_t source[LENGTH];
uint8_t dest[LENGTH];
Using memcpy from <string.h> without compiler optimisation I got a better result:
973 ticks, ~8 microseconds.
This result was obtained because of the Thumb LDM and STM (load multiple and store multiple) instructions used in the memcpy from <string.h>.
I gave you three similar examples, all for the Cortex-M4 running at the same frequency of 120 MHz.
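The conversion from ticks to time used in these results can be sketched as below. This assumes the tick counter runs at the 120 MHz core clock, which is inferred from 973 ticks being quoted as about 8 microseconds; `ticks_to_us` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to microseconds for a timer clocked at
   timer_hz (assumed here to equal the 120 MHz core clock). */
static double ticks_to_us(uint32_t ticks, uint32_t timer_hz)
{
    assert(timer_hz > 0);
    return (double)ticks * 1e6 / (double)timer_hz;
}
```

Under that assumption, `ticks_to_us(973, 120000000)` comes out at about 8.1 microseconds, in line with the figure above.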
Superb, Van, thank you very much.
Are you going to check the memcpy time for the Cortex-A5?
Not today, maybe tomorrow. I have to find out how the timer works and how to configure the proper frequency on the Microchip-Atmel board with the Cortex-A5, and which registers to program to get correct timing results for comparison.
So in conclusion, compiler optimisation can improve the execution speed of your function, and precompiled library functions use ARM instructions optimised for the particular case; I saw LDM and STM instructions in the disassembler view.
Tomorrow or the day after I will try to do the same measurement on the Cortex-A5 for comparison.
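To illustrate why load-multiple and store-multiple instructions help, here is a rough C sketch contrasting a byte-by-byte copy with a copy that moves four 32-bit words per iteration. It is not the library's memcpy; `copy_bytes` and `copy_words` are hypothetical names, and whether a compiler actually emits LDM/STM for the second loop depends on the toolchain and optimisation level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive copy: one load and one store per byte. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Word copy: four 32-bit loads followed by four 32-bit stores per
   iteration (16 bytes), the access pattern that LDM/STM can cover
   with one instruction each. Any leftover words are copied singly. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 4 <= nwords; i += 4) {
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];
        dst[i] = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
    for (; i < nwords; i++) {
        dst[i] = src[i];
    }
}
```

For a 1024-byte buffer the byte loop runs 1024 iterations while the word loop runs 64, which is the kind of difference the library routine exploits.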
I read briefly about the NXP MAC57D5xx, and if I understand correctly, you executed the same code from the same memory on two different cores of one system and got different results for the same library memcpy function?
I don't know exactly where memcpy comes from; I guess it is from string.h.
But when I stepped through memcpy in the debugger on both cores, I saw that the memcpy implementation (assembly code) is different for the M4 and the A5.
What is the need for the FPU in the memcpy function?
When I was running memcpy on the A5 there was an issue; on checking the memcpy implementation, I found a floating-point instruction was used and was causing the problem. After I changed the FPU setting in the A5 makefile, memcpy executed successfully.
Actually I am using already available applications and trying to measure the time taken by memcpy... so I don't have much knowledge.
Could you please suggest something?
Second test, on the Cortex-A5, Microchip Atmel SAMA5D2 Xplained board:
Processor clock = 498 MHz, Master clock = 83 MHz (clock for the bus),
PIT timer clock = 5,187,500 Hz (~192.8 nanoseconds per tick).
I used the same code, with buffers in internal SRAM.
Time of memcpy execution: ~19 ticks, or ~3.66 microseconds.
memcpy is implemented with ARM instructions, using LDM and STM.
As you can see, at first sight the same code executed more quickly, but the processor runs at a frequency about four times higher (498 MHz) than the Cortex-M4 (120 MHz).
All these measurements are not absolute, and if you want more determinism, try Cortex-R series processors or a Cortex-M with predictable timings.
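To put the two results on the same footing, the elapsed time can be converted into core clock cycles per byte copied. This is only a sketch with hypothetical helper names. It takes the Cortex-M4 figure as 973 ticks of an assumed 120 MHz counter (about 8.1 microseconds) and the Cortex-A5 figure as 3.66 microseconds at 498 MHz, both for 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Core clock cycles elapsed in time_us microseconds at core_mhz MHz
   (microseconds times cycles-per-microsecond). */
static double core_cycles(double time_us, double core_mhz)
{
    assert(time_us >= 0.0 && core_mhz > 0.0);
    return time_us * core_mhz;
}

/* Core clock cycles spent per byte copied. */
static double cycles_per_byte(double time_us, double core_mhz, size_t bytes)
{
    assert(bytes > 0);
    return core_cycles(time_us, core_mhz) / (double)bytes;
}
```

With those inputs, `cycles_per_byte(8.108, 120.0, 1024)` is roughly 0.95 for the Cortex-M4 and `cycles_per_byte(3.66, 498.0, 1024)` is roughly 1.78 for the Cortex-A5. So the A5 finishes sooner in absolute time but spends nearly twice as many of its own clock cycles on the copy. The A5 reading is only 19 timer ticks, so one tick either way shifts it by about 5 percent.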
Thank you so much, Van.
I will try, and if there is anything, I will get back to you.