Longer pipelines on Cortex R's vs real-time & performance

On Cortex-M parts you have 3-stage pipelines, while on Cortex-R, from the R4 up, you have 8 stages, and on the R7 even 11.

I don't understand: isn't it worse for real-time interrupt response to have longer pipelines? I did read that Cortex-R can interrupt long stores/loads and jump straight to the vector address with the IRQ number passed, which is great, but doesn't the pipeline get flushed on every hardware interrupt or exception/fault, and then need to be refilled with instructions? So more cycles are spent? And it's not as if the pipeline gets saved for the return from interrupt, is it?

0 42Bastian Schick over 3 years ago

As always: it depends on what you need/want as worst-case interrupt response. If you run at 300MHz and need an interrupt response of 1us, then flushing and refilling the pipeline likely does not matter.

0 d.ry over 3 years ago in reply to 42Bastian Schick

But a longer pipeline still takes longer to fill, no matter at what frequency you run the core. Ignore whether it's 1us or 100ms; let's say it needs N cycles to fill an N-stage pipeline.

On the other hand, there is something else I don't see through: with longer pipelines you get better overall throughput, and that works best when the pipeline is fully busy, with no bubbles, hazards, or flushes.

Now, if my chip has many interrupts to handle, and therefore many flushes, this overall throughput goes down. So I end up with a longer pipeline to fill before responding to ISRs, and a declining benefit from that long pipeline because of all these async events and pipeline flushes...

Is there some way to calculate how the ISR load impacts the pipeline benefit..?
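
One rough, first-order way to frame this question is to assume that every interrupt costs a fixed number of cycles for flushing and refilling the pipeline, and to express that cost as a fraction of all available cycles. The sketch below does only that. The function name, the 300 MHz clock, the interrupt rate, and the per-interrupt penalties are all hypothetical values chosen for illustration; real cores add other costs (register stacking, cache and bus effects) that this model ignores.

```python
def flush_overhead_fraction(clock_hz, irq_per_second, penalty_cycles):
    """Fraction of all core cycles lost to per-interrupt flush/refill penalties.

    Assumes every interrupt costs the same fixed number of cycles, which is
    a simplification and not a property of any particular core.
    """
    return irq_per_second * penalty_cycles / clock_hz


# Hypothetical workload: 100,000 interrupts per second on a 300 MHz core.
CLOCK_HZ = 300e6
IRQ_RATE = 100_000

# Assumed penalties: one cycle per pipeline stage (3-stage vs 8-stage).
short_pipe = flush_overhead_fraction(CLOCK_HZ, IRQ_RATE, penalty_cycles=3)
long_pipe = flush_overhead_fraction(CLOCK_HZ, IRQ_RATE, penalty_cycles=8)

print(f"3-stage pipeline: {short_pipe:.4%} of cycles spent refilling")
print(f"8-stage pipeline: {long_pipe:.4%} of cycles spent refilling")
```

Under these assumptions the refill cost stays well below one percent of the cycle budget in both cases, so in this simple model the flushes only eat noticeably into throughput at much higher interrupt rates or much larger per-interrupt penalties.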

0 42Bastian Schick over 3 years ago in reply to d.ry

Sure, the pipeline gets flushed and must be reloaded. But that does not mean the core stalls. Your worst-case interrupt latency depends on the instruction currently being executed and on the number of cycles the first new instruction needs to reach the final stage.

If this takes 10 cycles, then you need to see whether that fits your need at 30MHz, and if not, then maybe at 60MHz.

If you have a Cortex-A9 with a 12-stage pipeline, running it at 1GHz is sufficient for a 1us interrupt.

Only, and this might be a problem, the longer the pipeline, the larger the jitter you get.
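
To put numbers on the figures mentioned in this thread, here is a minimal sketch that converts a cycle count into time and a deadline into a cycle budget. The helper names are made up for illustration, and treating the A9's latency as equal to its 12 pipeline stages is an assumption for the sake of the arithmetic, not a measured value.

```python
def cycles_to_ns(cycles, clock_hz):
    """Time in nanoseconds that a given number of core cycles takes."""
    return cycles / clock_hz * 1e9


def cycle_budget(deadline_s, clock_hz):
    """Number of core cycles available before a deadline expires."""
    return deadline_s * clock_hz


# 10 cycles of latency at 30 MHz and at 60 MHz.
print(f"10 cycles @ 30 MHz: {cycles_to_ns(10, 30e6):.1f} ns")
print(f"10 cycles @ 60 MHz: {cycles_to_ns(10, 60e6):.1f} ns")

# Assumption: latency equals the pipeline depth (12 cycles) at 1 GHz.
print(f"12 cycles @ 1 GHz:  {cycles_to_ns(12, 1e9):.1f} ns")

# Cycle budget for a 1 us response requirement at 300 MHz.
print(f"1 us @ 300 MHz:     {cycle_budget(1e-6, 300e6):.0f} cycles")
```

Seen this way, a refill of around ten cycles is a small slice of a 300-cycle budget, which is the point about clock frequency; the jitter concern is separate and is not captured by this arithmetic.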

0 Tucker032 over 3 years ago

Cortex-R and cortex-M series is targeted for different requirements and for different applications. Performance Monitor Unit, Yes, No Performance Monitor Unit: This is the module which makes Cortex-R to be used for Real Time Applications. abort mask bit in a register and also because of number of pipeline stages.


0 d.ry over 3 years ago in reply to Tucker032

Tucker032

That they are differently targeted families (R for real-time), I wasn't doubting. My original question was specifically why the longer pipeline on R: you should start processing an ISR faster with a shorter pipeline (I would think..?).

Tucker032 said:
Performance Monitor Unit, Yes, No Performance Monitor Unit: This is the module which makes Cortex-R to be used for Real Time Applications.

This I didn't understand: what do you mean, and how does the PMU relate to the pipeline?

Also, from its description:

https://developer.arm.com/documentation/100026/0101/performance-monitor-unit/about-the-pmu?lang=en

"..These provide useful information about the behavior of the processor that you can use when debugging or profiling code.."

This does not make the R architecture real-time, nor does it explain how it would be. It is for monitoring performance.

EDIT:

Interestingly, the article at this link: https://www.design-reuse.com/articles/26106/cortex-r-versus-cortex-m.html also says "PMU is really what makes Cortex R be used for real-time apps: helps in profiling". OK, so I need to profile the code on the core after selecting it... a core which I should have selected knowing that its real-time behavior lets me meet my real-time requirements to begin with... hmmm.

Also, that same link interestingly says latency is higher for R than for M.. Whaaaaat. And not even for pipeline reasons, as suggested there.

0 Holmes2001 over 2 years ago

Cortex-R and cortex-M series is targeted for different requirements and for different applications. Performance Monitor Unit, Yes, No Performance Monitor Unit: This is the module which makes Cortex-R to be used for Real Time Applications. abort mask bit in a register and also because of number of pipeline stages.


0 42Bastian Schick over 2 years ago in reply to Holmes2001

Holmes2001 said:
Cortex-R and cortex-M series is targeted for different requirements and for different applications.

Nothing new so far.

Why should the PMU make a Cortex-R a "real-time" CPU? The Cortex-A and even an Intel Xeon have performance monitoring units, and one would not honestly call either of them a real-time CPU.

So why Arm calls it "realtime" is, and maybe forever will be, a mystery :-)

0 d.ry over 2 years ago in reply to 42Bastian Schick

Stumbled upon this post on a TI forum:

TI forum link R4 vs M4

Quoting the interesting part of the TI reply:

" ...

The M4 uses a simple 32b AHB interface to access peripherals. A round-trip access is possible in three clock cycles best case (not considering pipelining or device level architecture impact).

The R4 uses a more complex 64b AXI interface to access peripherals. A round-trip access is possible in around seven clock cycles best case. This interface is more optimized for bursting access, parallel access by multiple bus masters, and for cache operations. As a result, it can move quite a bit more data than the M4 on average, but it does so by sacrificing some latency in its design."
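
The cycle counts quoted above can be turned into a rough comparison in time. The sketch below is only an illustration: the 3-cycle and 7-cycle best-case figures come from the quoted TI reply, while the clock frequencies, the helper names, and the idea that the bus runs at the core clock are assumptions.

```python
M4_ROUND_TRIP_CYCLES = 3  # best case, from the quoted TI reply (AHB)
R4_ROUND_TRIP_CYCLES = 7  # best case, from the quoted TI reply (AXI)


def round_trip_ns(cycles, clock_hz):
    """Best-case peripheral round-trip time in nanoseconds."""
    return cycles / clock_hz * 1e9


def r4_break_even_clock_hz(m4_clock_hz):
    """R4 clock at which its round trip takes as long as the M4's."""
    return m4_clock_hz * R4_ROUND_TRIP_CYCLES / M4_ROUND_TRIP_CYCLES


# Hypothetical clocks: an M4 at 100 MHz and an R4 at 200 MHz.
M4_CLOCK_HZ = 100e6
R4_CLOCK_HZ = 200e6

m4_ns = round_trip_ns(M4_ROUND_TRIP_CYCLES, M4_CLOCK_HZ)
r4_ns = round_trip_ns(R4_ROUND_TRIP_CYCLES, R4_CLOCK_HZ)
break_even_mhz = r4_break_even_clock_hz(M4_CLOCK_HZ) / 1e6

print(f"M4 round trip: {m4_ns:.1f} ns")
print(f"R4 round trip: {r4_ns:.1f} ns")
print(f"R4 clock needed to match the M4: {break_even_mhz:.1f} MHz")
```

With these assumed clocks the R4 would need to run at roughly 7/3 of the M4's frequency just to match its best-case single-access time, which is the latency-for-bandwidth trade-off the TI reply describes.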

Which brings me right back to my question (or almost..): why is the R4 (or Cortex-R in general) "Real-Time", when you can get a faster response from an M4??

Most of the places where I'm using the R4/R5 now do not move large amounts of data. It's all small messages/packets, like CAN or sensor readings, and I think latency is much more important there than moving more data in bursts.

0 42Bastian Schick over 2 years ago in reply to d.ry

d.ry said:
Which brings me right back to my question (or almost..): why is the R4 (or Cortex-R in general) "Real-Time", when you can get a faster response from an M4??

I have seen one "explanation": The Cortex-R can accept interrupts during multi-cycle instructions (like STM,LDM and maybe xDIV).

But honestly: I never understood why the Cortex-M concept of automatically saving registers never made it to Cortex-R.

The term "real time" is something the application defines, never the CPU.

0 d.ry over 2 years ago in reply to 42Bastian Schick

42Bastian Schick said:
I have seen one "explanation": The Cortex-R can accept interrupts during multi-cycle instructions (like STM,LDM and maybe xDIV).

We already have that above, in my first post/question of the thread.