I believe that many of us are interested in the ARM Cortex-M7.
Recently, jyiu posted a status update, where I asked a couple of questions about the architecture.
A few questions on the subject were also asked in the Interview and Question Time with Joseph Yiu discussion.
As I think the information posted is important and relevant, I'm posting a shortened version here, so it's easier to find.
Links:
Cortex-M7 Processor - ARM
ARM Cortex-M7 Processor Technical Reference Manual
ARMv7-M Reference Manual (Issue E.b)
AnandTech | Cortex-M7 Launches: Embedded, IoT and Wearables
ARM Supercharges MCU Market with High Performance Cortex-M7 Processor
ARM gives Internet of Things a piece of its mind – the Cortex-M7
STM32F7 von STMicroelectronics: ARMs Cortex-M7 (this article is in German)
Freescale Plans Extreme Performance for Kinetis MCUs with ARM® Cortex®-M7 Core
ARM Cortex-M - Wikipedia
NEW App Note: Migrating Application Code from ARM Cortex-M4 to Cortex-M7 Processors
Meet the new ARM Cortex-M7 processor: supercharging embedded devices
Atmel launches new series ARM Cortex-M7 based MCUs
As you see, STMicroelectronics will be releasing their first Cortex-M7 soon; Microchip and Freescale are also close.
Q: The Cortex-M7 now has a Branch Predictor and a BTAC. Does this mean that branches use 1 clock cycle only (or perhaps even below) ?
A: Yes, if correctly predicted the branch instruction is only 1 cycle.
Q: Does the 6-stage pipeline mean that loads can be achieved in a single cycle as well ?
A: Load from TCM is pipelined with other operations, so essentially single cycle or even less due to dual issue.
Q: The Cortex-M7 should be able to run at speeds up to 400 MHz, is that correct ?
A: In terms of clock frequency, it depends on the semiconductor process node.
400 MHz is the estimate for a 40 nm low-power (LP) process. If using 28 nm (e.g. 28hpm) or 14 nm, the clock frequency can go much higher.
Q: From what I've heard, Interrupt latency is sometimes 12, sometimes 11 clock cycles; depending on the situation ?
A: The interrupt latency is a complex topic because it depends on how the memory system design looks.
The complete picture is fairly complex and I think we will need to create a separate document for that.
Q: Is it possible to move data directly between general purpose registers and floating point (single/double precision) registers without storing the data in memory first ?
A: The VMOV instruction (which exists on the Cortex-M4 already) allows data values to be transferred between general registers and floating point registers.
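As a rough illustration of the answer above, here is a minimal C sketch of moving the raw bits of a value between an integer and a single-precision float without any numeric conversion. The helper names `float_to_bits` and `bits_to_float` are made up for this example. The assumption (not confirmed in this thread) is that a compiler targeting a Cortex-M4F or Cortex-M7 with a hardware floating point ABI will usually turn the `memcpy` into a single VMOV between a core register and an FPU register; that depends on the compiler and its options, so check the disassembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bit pattern of a float into an integer.
 * No numeric conversion takes place; 1.0f becomes 0x3F800000. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* both are expected to be 32 bits wide */
    memcpy(&u, &f, sizeof u);       /* well-defined bit copy in standard C  */
    return u;
}

/* Hypothetical helper: the reverse direction, integer bits to float. */
float bits_to_float(uint32_t u)
{
    float f;
    assert(sizeof f == sizeof u);
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Using `memcpy` rather than a pointer cast keeps the code within the C aliasing rules, and optimising compilers normally remove the call entirely.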
According to Wikipedia, the Cortex-M7 supports the same instruction set as the Cortex-M4F. I do not know if there are any additions to the instruction set, but I would expect that in order to use double-precision floating point, and because of the enhanced DSP Extensions and BPU, there might be a few extra instructions (I'm only guessing here).
Personally, I am very much looking forward to using the Branch Predictor, the BTAC, the 6-stage superscalar pipeline, the dual integer pipe ALU, the higher speed, the double-precision floating point and the FPP.
If you have some technical information, I'd like to encourage you to post it here.
Hi Jens,
Thanks for putting information together.
Regarding your comments:
> As you see, STMicroelectronics will be releasing their first Cortex-M7 soon; Freescale is also close and I believe we will also see something from NXP soon.
Err.... normally we can't comment on our customers' developments. But please have a look at the following articles, which tell you who the lead partners are.
http://www.design-reuse.com/news/35489/arm-cortex-m7-processor.html
http://atmelcorporation.wordpress.com/2014/09/24/arm-unveils-32-bit-cortex-m7-processor-for-the-internet-of-things/
Regarding
> Q: The Cortex-M7 now has a BPU and a BTAC. Does this mean that branches use 1 clock cycle only (or perhaps even below) ?
> A: Yes, if correctly predicted the branch instruction is only 1 cycle.
Be careful with the acronym. In our documentation, BPU is the Breakpoint Unit. We don't have a separate acronym for the branch prediction logic in the Cortex-M7.
We have also enhanced the breakpoint unit in the Cortex-M7 design. But this has nothing to do with the performance.
>According to the Wikipedia, the Cortex-M7 supports the same instruction set as Cortex-M4F.
> I do not know if there are any additions to the instruction set, but I would expect that in order
>to use double-precision floating points and because of the enhanced DSP Extensions and BPU,
> there might be a few extra (I'm only guessing here).
Apart from the instruction set, we have added extra registers for cache control and memory configuration.
> the double-precision floating point and the FPP.
Sorry, is FPP a typo? I am not sure where this comes from. If it is FPU, then it is the Floating Point Unit.
>A: The VMOV instruction allows data value to be transferred between general registers and floating point registers.
This instruction exists in the Cortex-M4 already.
Note: 5 CoreMark/MHz is what we have got with the current version of the IAR compiler (v7.30). I am not sure if any compiler vendor will be able to push that even higher.
Please note the technical details of the Cortex-M7 will be covered in a presentation at ARM TechCon.
Once all this work is done we will publish additional materials to detail all technical features.
The Cortex-M7 Device Generic User Guide will soon be available
(to be honest I don't know when it will be released on our website, I need to check).
Currently I am also working with our engineering team to create additional documentation.
Given that the chips will not be in mass production for another couple of months (currently they are
engineering samples), I am sure we will have the document ready before the official production releases.
regards,
Joseph
Thank you, Joseph for the corrections; I've made a couple of changes, so the initial post becomes a little more correct.
The FPP was not a typo; I have taken it from the Anandtech Web-page, which mentions the Floating Point Pipeline.
BPU: Uhm.. Wouldn't it be more correct to use BPU for Branch Prediction Unit and then BU for Breakpoint Unit (or DBU for Debug Breakpoint Unit) ?
The Anandtech document mentions the Branch Prediction Unit as 'branch predictor'.
Hi,
I have a very simple question: how many clock cycles will be needed to compute one output sample of a fifty-tap 32-bit FIR filter ?
Will circular buffer addressing be supported in hardware ?
Regards
RomanR
> We have also enhanced the breakpoint unit in the Cortex-M7 design.
Is an updated v7m reference manual or other detailed documentation available regarding the enhanced debug features?
Due to business travel I am not able to do the test at the moment.
There is no change in the DSP/SIMD instruction set, nor is there any hardware-based circular buffer addressing support.
However, the CMSIS-DSP library has some optimizations to enable faster handling of arrays.
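Since circular buffer addressing has to be done in software, the following is a minimal plain C sketch of a fifty-tap FIR filter with 32-bit samples and a software-wrapped history buffer. It is only an illustration: the names `fir_state_t`, `fir_init` and `fir_step` are invented for this example, it is not the CMSIS-DSP implementation, and it says nothing about the cycle count on a Cortex-M7, which would have to be measured on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 50u   /* tap count taken from the question above */

/* Hypothetical filter state: the last FIR_TAPS input samples. */
typedef struct {
    int32_t history[FIR_TAPS];
    size_t  head;              /* index of the most recent sample */
} fir_state_t;

void fir_init(fir_state_t *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < FIR_TAPS; ++i) {
        s->history[i] = 0;
    }
    s->head = 0;
}

/* Push one input sample and return one output sample:
 * y[n] = sum over k of coeffs[k] * x[n - k], accumulated in 64 bits. */
int64_t fir_step(fir_state_t *s, const int32_t coeffs[FIR_TAPS], int32_t sample)
{
    assert(s != NULL && coeffs != NULL);

    /* Advance the write index, wrapping in software. */
    s->head = (s->head + 1u == FIR_TAPS) ? 0u : s->head + 1u;
    s->history[s->head] = sample;

    int64_t acc = 0;
    size_t idx = s->head;
    for (size_t k = 0; k < FIR_TAPS; ++k) {
        acc += (int64_t)coeffs[k] * s->history[idx];
        /* Step backwards through the history, wrapping at index 0. */
        idx = (idx == 0u) ? FIR_TAPS - 1u : idx - 1u;
    }
    return acc;
}
```

Feeding a single 1 followed by zeros returns the coefficients one by one, which is a quick way to check the indexing. The compare-and-reset wrap is used instead of `%` because 50 is not a power of two.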
There will be an updated version of the ARMv7-M Architecture Reference Manual, but I don't know the schedule for the release at the moment.
(Currently this document is still in beta).
Happy New Year!
Cortex-M7 Technical Reference Manual (TRM) is now online
ARMv7-M Architecture Reference Manual issue E has been released. (ARMv7-M Architecture Reference Manual)
Also, there is a useful application note posted by bobboys
Thanks, I've now added the links above.
Nice job Jens !
Jens,
Thanks for pulling together the useful list.
You may find some other useful links in the launch blog that includes some of the announcements and blogs from the Partners.
jensbauer,
You may also want to take a look at Microchip’s latest Cortex-M7 based MCU lineup, which was announced during CES 2015: Atmel launches new series ARM Cortex-M7 based MCUs
Hope this helps!
Artie
Thanks for the links lorikate and artiebeavis - I've added them to the list.
I'm still checking for Cortex-M7 availability at Farnell, RS, DigiKey and Mouser frequently.
If I come across the Cortex-M7 there, I'll make sure to let everyone know here.
Can anybody help me find information regarding stalls on the Cortex-M7? I am using the STM32F769NI Eval Board with the IAR toolchain. I wrote a simple piece of assembly code of 50 instructions, mostly VLDMIAs and VMLAs, but it takes around 170 cycles to execute these instructions.
Thanks in Advance.
Jaikanth.