
Why is my Cortex-M4 taking too many cycles?

Note: This was originally posted on 10th September 2012 at http://forums.arm.com

Dear Arm-experts,

I want to use the FPU of my STM32F4 (Cortex-M4). To check that it is working properly, I compared my cycle counts with the ones on this page:
http://www.micromouseonline.com/2011/10/26/stm32f4-the-first-taste-of-speed/?doing_wp_cron=1347294891.0981290340423583984375

The author uses exactly the same processor and toolchain (with the GCC compiler).
Here is how long each statement takes with my settings:

Columns: Reference // My controller running from Flash // My controller running from SRAM

long lX, lY, lZ;
lX = 123L;        // 2 cycles   // 2 cycles    // 5 cycles
lY = 456L;        // 2 cycles   // 3 cycles    // 3 cycles
lZ = lX*lY;       // 5 cycles   // 7 cycles    // 9 cycles
fX = 123.456;     // 3 cycles   // 5 cycles    // 4 cycles
fY = 9.99;        // 3 cycles   // 5 cycles    // 4 cycles
fZ = fX * fY;     // 6 cycles   // 10 cycles   // 10 cycles
fZ = sqrt(fY);    // 20 cycles  // 2742 cycles // 3405 cycles
fZ = sin(1.23);   // 124 cycles // 1918 cycles // 2552 cycles

The settings are:
       Arm architecture: v7EM
       Arm core type: Cortex-M4
       Arm FP Abi Type: Soft-FP (or Hard, it doesn't make a big difference)
       Arm FPU Type: FPv4-SP-D16
       GCC target: arm-unknown-eabi

So not only is the floating-point arithmetic running slower, but the integer arithmetic is too! And sin and sqrt are horribly slow!
The overhead of my cycle measurement has already been subtracted from these numbers.
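For readers who want to reproduce the numbers, here is a minimal sketch of how a measurement overhead can be subtracted from two cycle-counter readings. The post does not say how the cycles were measured, so the helper name `net_cycles` and the idea of taking snapshots from a free-running 32-bit counter are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): net cycles between
 * two snapshots of a free-running 32-bit cycle counter, minus the fixed
 * overhead of taking the measurement itself. */
static uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    /* Unsigned subtraction is modulo 2^32, so the result stays correct
     * even if the counter wrapped once between the two snapshots. */
    uint32_t raw = end - start;

    /* Clamp at zero rather than underflowing if the overhead estimate
     * is larger than the measured interval. */
    return (raw > overhead) ? (raw - overhead) : 0u;
}
```

On a Cortex-M4 the two snapshots would typically be reads of the CMSIS register `DWT->CYCCNT`, and `overhead` would be found by timing an empty region the same way; both are assumptions about the setup rather than something the post states.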
CP10 and CP11 are both set to 0b11, so the FPU should be enabled properly.
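The CP10/CP11 check can be written down as a small sketch. It assumes the value comes from the Coprocessor Access Control Register (CPACR), where CP10 occupies bits [21:20] and CP11 bits [23:22]; the function name `fpu_full_access` is hypothetical and does not come from any library.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPACR_CP10_SHIFT  20u   /* CP10 access field, bits [21:20] */
#define CPACR_CP11_SHIFT  22u   /* CP11 access field, bits [23:22] */
#define CPACR_FIELD_MASK  0x3u
#define CPACR_FULL_ACCESS 0x3u  /* 0b11 = full access */

/* Hypothetical helper: returns true only if both CP10 and CP11 are set
 * to 0b11 in the given CPACR value. */
static bool fpu_full_access(uint32_t cpacr)
{
    uint32_t cp10 = (cpacr >> CPACR_CP10_SHIFT) & CPACR_FIELD_MASK;
    uint32_t cp11 = (cpacr >> CPACR_CP11_SHIFT) & CPACR_FIELD_MASK;
    return (cp10 == CPACR_FULL_ACCESS) && (cp11 == CPACR_FULL_ACCESS);
}
```

On the target, the argument would be the CMSIS register `SCB->CPACR`. Note that this only confirms the FPU is accessible; it says nothing about whether the compiler actually emits FPU instructions for a given expression.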

Do you have any idea what is wrong with my settings, my toolchain, or anything else?

Thank you so much for your efforts!

Florian