
System Bus in ARM Cortex-M4

In what situations will the separate data buses (D and S) of the ARM Cortex-M4 improve performance? Also, are there any benefits of von Neumann support along with the core's Harvard architecture?

  • Yasuhiko's answer is correct.

    In addition to the above, I'd like to give another example:

    Imagine that you are using the DMA to transfer data from SRAM to a peripheral, or from SRAM to SRAM (e.g. a copy operation at the highest possible speed). The DMA has been configured to move the data as quickly as possible; a rough sketch of such a setup follows below.
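
    As a minimal sketch only: this assumes an STM32F4 part and the ST HAL, where only DMA2 can do memory-to-memory transfers. The stream, channel and buffer names are assumptions chosen for the example.

    ```c
    #include "stm32f4xx_hal.h"

    #define BUF_WORDS 256
    static uint32_t src_buf[BUF_WORDS];   /* source buffer in SRAM      */
    static uint32_t dst_buf[BUF_WORDS];   /* destination buffer in SRAM */

    static DMA_HandleTypeDef hdma_m2m;

    /* Configure DMA2 stream 0 for a word-wide memory-to-memory copy. */
    void m2m_copy_start(void)
    {
        __HAL_RCC_DMA2_CLK_ENABLE();

        hdma_m2m.Instance                 = DMA2_Stream0;
        hdma_m2m.Init.Channel             = DMA_CHANNEL_0;
        hdma_m2m.Init.Direction           = DMA_MEMORY_TO_MEMORY;
        hdma_m2m.Init.PeriphInc           = DMA_PINC_ENABLE;  /* "peripheral" side is the source here */
        hdma_m2m.Init.MemInc              = DMA_MINC_ENABLE;
        hdma_m2m.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;
        hdma_m2m.Init.MemDataAlignment    = DMA_MDATAALIGN_WORD;
        hdma_m2m.Init.Mode                = DMA_NORMAL;
        hdma_m2m.Init.Priority            = DMA_PRIORITY_HIGH;
        hdma_m2m.Init.FIFOMode            = DMA_FIFOMODE_ENABLE;  /* FIFO is required for M2M */
        hdma_m2m.Init.FIFOThreshold       = DMA_FIFO_THRESHOLD_FULL;
        hdma_m2m.Init.MemBurst            = DMA_MBURST_SINGLE;
        hdma_m2m.Init.PeriphBurst         = DMA_PBURST_SINGLE;
        HAL_DMA_Init(&hdma_m2m);

        /* Kick off the copy; the CPU is free to keep executing code meanwhile. */
        HAL_DMA_Start(&hdma_m2m, (uint32_t)src_buf, (uint32_t)dst_buf, BUF_WORDS);
    }
    ```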

    You also have your code running in SRAM. You've chosen to place it there on this particular device in order to save power and to gain a little extra speed, because flash memory often requires wait states; one common way of doing this is sketched below.
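
    A sketch of placing a routine in SRAM with GCC (arm-none-eabi). The ".ramfunc" section name and the function itself are assumptions: the section must exist in your linker script and the startup code must copy it from flash to RAM.

    ```c
    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical compute-heavy routine placed in SRAM.
       ".ramfunc" must be defined in the linker script and copied to RAM at startup;
       long_call makes the branch from flash-resident code safe across the address gap. */
    __attribute__((section(".ramfunc"), long_call))
    void scale_samples(int32_t *buf, size_t n, int32_t gain)
    {
        /* Mostly register arithmetic with relatively few data accesses,
           so it competes only rarely with the DMA for the SRAM. */
        for (size_t i = 0; i < n; i++) {
            buf[i] = (buf[i] * gain) >> 8;
        }
    }
    ```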

    -But this means you could get a lot of bus collisions, because the core now fetches its instructions from the same SRAM that the DMA is reading and writing.

    Fortunately, you're performing a lot of calculations and making very few data memory accesses, so collisions on the SRAM are rare.

    The instruction cache then keeps the instructions for your calculations available to the core, while the DMA handles the actual data transfers.

    This would give you very good performance.

    -Of course, my example is only theoretical. It all depends on what you have available on your particular Cortex-M4 device.

    Some devices have extremely good flash accelerators (for instance, the ART Accelerator on STMicroelectronics' STM32F4xx devices); enabling it is sketched below.
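
    As an illustration, on an STM32F4 the ART accelerator's caches and prefetch are enabled through the FLASH access control register, roughly like this. The wait-state count depends on your clock and supply voltage, so treat the value below as an assumption.

    ```c
    #include "stm32f4xx.h"   /* CMSIS device header */

    /* Enable the ART accelerator: instruction cache, data cache and prefetch,
       and set the flash wait states (5 WS is typical for 168 MHz at ~3.3 V). */
    void art_accelerator_enable(void)
    {
        FLASH->ACR = FLASH_ACR_ICEN        /* instruction cache */
                   | FLASH_ACR_DCEN        /* data cache        */
                   | FLASH_ACR_PRFTEN      /* prefetch buffer   */
                   | FLASH_ACR_LATENCY_5WS;
    }
    ```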

    Unfortunately, I'm not qualified to explain the details and differences of the architectures, but I can say that the Cortex-M4 has a real performance advantage: most instructions execute in a single cycle.

    -And not to forget: the low interrupt latency and bit-banding (which is particularly useful for setting/clearing bits atomically in hardware registers); see the example below.
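
    To show the bit-band mapping: each bit in the peripheral region at 0x40000000 has a word-sized alias in the region starting at 0x42000000 (alias = 0x42000000 + byte_offset * 32 + bit * 4), so a single store to the alias sets or clears just that bit atomically. A small sketch follows; the GPIOA output register address is taken from an STM32F4 part and is an assumption for the example.

    ```c
    #include <stdint.h>

    /* Map a bit of a register in the peripheral region (0x40000000..0x400FFFFF)
       to its word alias in the bit-band region at 0x42000000. */
    #define BITBAND_PERIPH(addr, bit) \
        (*(volatile uint32_t *)(0x42000000u + \
            (((uint32_t)(addr) - 0x40000000u) * 32u) + ((bit) * 4u)))

    #define GPIOA_ODR  0x40020014u   /* GPIOA output data register on STM32F4 */

    void led_on(void)
    {
        /* Writing 1 to the alias word sets bit 5 of GPIOA_ODR atomically,
           without a read-modify-write of the whole register. */
        BITBAND_PERIPH(GPIOA_ODR, 5) = 1u;
    }
    ```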
