Arm Community
Arm Development Studio forum
Why is my Cortex-M4 taking so many cycles?
Florian Augustin
over 12 years ago
Note: This was originally posted on 10th September 2012 at
http://forums.arm.com
Dear Arm experts,
I wanted to use the FPU of my STM32F4 (Cortex-M4). To check that it is working properly, I compared my results with this page:
http://www.micromouseonline.com/2011/10/26/stm32f4-the-first-taste-of-speed/?doing_wp_cron=1347294891.0981290340423583984375
He is using exactly the same processor and toolchain (with the GCC compiler).
Here is how long it takes with my settings:
Cycle counts (reference // my controller running from flash // my controller running from SRAM):
long lX, lY, lZ;
float fX, fY, fZ;
lX = 123L;       // 2 cycles   // 2 cycles    // 5 cycles
lY = 456L;       // 2 cycles   // 3 cycles    // 3 cycles
lZ = lX * lY;    // 5 cycles   // 7 cycles    // 9 cycles
fX = 123.456;    // 3 cycles   // 5 cycles    // 4 cycles
fY = 9.99;       // 3 cycles   // 5 cycles    // 4 cycles
fZ = fX * fY;    // 6 cycles   // 10 cycles   // 10 cycles
fZ = sqrt(fY);   // 20 cycles  // 2742 cycles // 3405 cycles
fZ = sin(1.23);  // 124 cycles // 1918 cycles // 2552 cycles
The settings are:
Arm architecture: v7EM
Arm core type: Cortex-M4
Arm FP ABI type: Soft-FP (or Hard; it doesn't make a big difference)
Arm FPU type: FPv4-SP-D16
GCC target: arm-unknown-eabi
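These IDE settings typically map to GCC options such as -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 with -mfloat-abi=softfp or -mfloat-abi=hard. As a hedged way to confirm that they actually reach the compiler, the sketch below inspects the Arm C Language Extensions (ACLE) predefined macros: __ARM_FP has bit 0x4 set when single-precision hardware floating point is available, and __ARM_PCS_VFP is defined for the hard-float calling convention. The function name is hypothetical; on a host that is not a 32-bit Arm target it simply says so.

```c
/* Report the floating-point configuration this file was compiled with,
   based only on ACLE predefined macros. */
static const char *float_config(void)
{
#if defined(__arm__)
#  if defined(__ARM_FP) && (__ARM_FP & 0x4)
#    if defined(__ARM_PCS_VFP)
    return "hardware single-precision FP, hard-float ABI";
#    else
    return "hardware single-precision FP, softfp ABI";
#    endif
#  else
    return "no hardware FP: float math is emulated in software";
#  endif
#else
    return "not a 32-bit Arm target";
#endif
}
```

Printing float_config() once at start-up (or moving the checks into #error directives) would show immediately whether the build silently fell back to software floating point.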
So not only is the floating-point arithmetic running slower, the integer arithmetic is too! And sin and sqrt are horrible!!
The overhead of my cycle measurement has already been subtracted.
CP10 and CP11 in the Coprocessor Access Control Register (CPACR) are set to 0b11, so the FPU should be enabled properly.
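For reference, the CP10 and CP11 fields live in CPACR (address 0xE000ED88 on the Cortex-M4) at bits [21:20] and [23:22]; 0b11 in both grants full access to the FPU. The sketch below sets and checks those fields through a pointer so the logic can be exercised off-target; the function names are hypothetical. On the device you would pass (volatile uint32_t *)0xE000ED88 and, following Arm's guidance, execute DSB and ISB before the first floating-point instruction (CMSIS provides __DSB() and __ISB()).

```c
#include <stdint.h>

#define CPACR_CP10_FULL (3u << 20)  /* CP10 = 0b11: full access */
#define CPACR_CP11_FULL (3u << 22)  /* CP11 = 0b11: full access */
#define CPACR_FPU_FULL  (CPACR_CP10_FULL | CPACR_CP11_FULL)

/* Grant full access to CP10 and CP11 without touching the other bits. */
static void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_FPU_FULL;
    /* On the target: follow with DSB and ISB before using the FPU. */
}

/* Non-zero when both CP10 and CP11 are set to full access. */
static int fpu_fully_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_FPU_FULL) == CPACR_FPU_FULL;
}
```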
Do you have any idea what is wrong with my settings, my toolchain, or anything else?
Thank you so much for your efforts!
Florian
Florian Augustin
over 12 years ago
Note: This was originally posted on 12th September 2012 at
http://forums.arm.com
I found some cycle-counting code for the Cortex-M3 by Joseph Yiu and added a step to subtract the measurement overhead. This is the result:
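unsigned int cyc[2], offset;   // start/stop counter values and measured overhead
float x;
long lX, lY, lZ;
volatile unsigned int *DWT_CYCCNT  = (volatile unsigned int *)0xE0001004; // DWT cycle counter register
volatile unsigned int *DWT_CONTROL = (volatile unsigned int *)0xE0001000; // DWT control register
volatile unsigned int *SCB_DEMCR   = (volatile unsigned int *)0xE000EDFC; // Debug Exception and Monitor Control register
// DWT_CONTROL and SCB_DEMCR are needed to enable the cycle counter before use (not shown here).

#define STOPWATCH_START { cyc[0] = *DWT_CYCCNT; }
#define STOPWATCH_STOP  { cyc[1] = *DWT_CYCCNT; cyc[1] = cyc[1] - cyc[0] - offset; }

// Calibration: time a single NOP; everything beyond its 1 cycle is measurement overhead.
STOPWATCH_START
__asm volatile("nop");
cyc[1] = *DWT_CYCCNT; cyc[1] = cyc[1] - cyc[0];
offset = cyc[1] - 1;

// Measurement (STOPWATCH_STOP subtracts the overhead)
STOPWATCH_START
lX = 123L;      // 2 cycles
lY = 456L;      // 2 cycles
lZ = lX * lY;   // 5 cycles
STOPWATCH_STOP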
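The snippet declares DWT_CONTROL and SCB_DEMCR but does not show how the counter is switched on. A minimal sketch, assuming the standard ARMv7-M debug register layout: set TRCENA (bit 24) in DEMCR to enable the DWT unit, then CYCCNTENA (bit 0) in DWT_CTRL to start CYCCNT. The registers are reached through a small struct of pointers, a hypothetical helper (not a CMSIS type), so the logic can also be exercised off-target. Unsigned subtraction keeps a measurement correct across a single counter wrap-around.

```c
#include <stdint.h>

#define DEMCR_TRCENA        (1u << 24)  /* DEMCR.TRCENA: enable DWT and ITM */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* DWT_CTRL.CYCCNTENA: enable CYCCNT */

/* Hypothetical helper bundling the three registers used in the post. */
typedef struct {
    volatile uint32_t *demcr;     /* 0xE000EDFC */
    volatile uint32_t *dwt_ctrl;  /* 0xE0001000 */
    volatile uint32_t *cyccnt;    /* 0xE0001004 */
} cycle_counter;

/* Enable trace, reset the counter to zero, then start it. */
static void cycle_counter_start(const cycle_counter *cc)
{
    *cc->demcr |= DEMCR_TRCENA;
    *cc->cyccnt = 0;
    *cc->dwt_ctrl |= DWT_CTRL_CYCCNTENA;
}

/* Elapsed cycles between two CYCCNT readings; uint32_t arithmetic is
   modulo 2^32, so one wrap-around between the readings is handled. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t stop)
{
    return stop - start;
}
```

On the STM32F4 itself the struct would be filled with the three addresses from the post, for example { (volatile uint32_t *)0xE000EDFC, (volatile uint32_t *)0xE0001000, (volatile uint32_t *)0xE0001004 }.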
I'm compiling with optimization level 0 (-O0). If I switch to level 3 (-O3), I save 1 cycle on the three integer operations but lose 4 cycles on the three float operations...
This is all very strange!!!
Thank you very much, Sim!
Quote from Sim: "How are you attempting to time the execution of each of the instructions?"
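One hedged explanation for the -O0 versus -O3 behaviour described above: at -O3 the compiler is free to constant-fold lX * lY into 56088, drop stores to variables that are never read, and move ordinary arithmetic across the volatile reads of DWT_CYCCNT, so the counted window no longer contains exactly the statements written between STOPWATCH_START and STOPWATCH_STOP. A common mitigation, sketched below, is to make the operands volatile and to place a compiler barrier (a GCC/Clang inline-assembly idiom that emits no instruction) around the measured code. The function name is hypothetical, and on the target the counter reads would go where the comments indicate.

```c
/* Compiler barrier: the "memory" clobber stops GCC/Clang from caching or
   reordering memory accesses across this point; no instruction is emitted. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* volatile forces every load and store to happen, so even at -O3 the
   multiply cannot be folded into a compile-time constant. */
static volatile long lX, lY, lZ;

static long timed_integer_ops(void)
{
    /* On the target: STOPWATCH_START would go here. */
    COMPILER_BARRIER();
    lX = 123L;
    lY = 456L;
    lZ = lX * lY;
    COMPILER_BARRIER();
    /* On the target: STOPWATCH_STOP would go here. */
    return lZ;
}
```

The trade-off is that volatile adds its own loads and stores, so the count covers memory traffic as well as the arithmetic; comparing against the generated disassembly is the most reliable way to see what was actually timed.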