In what situations will the separate data buses (D and S) of the ARM Cortex-M4 improve performance? Also, are there any benefits to von Neumann support alongside the core's Harvard architecture?
Hello,
Separate buses are beneficial when load or store instructions are executed.
Because the core is pipelined, an instruction fetch and a data access can take place at the same time.
If the buses are separate, the two accesses do not interfere with each other.
However, if the instruction fetch and the data access target the same memory resource, they collide and performance drops.
Performance improves most when instructions are fetched from flash memory while data accesses go to SRAM.
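To make the flash/SRAM case concrete, here is a minimal sketch in C of how the Cortex-M4 picks a core bus from the address and the kind of access. The helper names (cm4_bus_for, cm4_same_bus) are made up for illustration. The sketch only models which core bus is used; it says nothing about arbitration inside a vendor's bus matrix or about flash wait states, which depend on the device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Core bus interfaces of the Cortex-M4. */
typedef enum { BUS_ICODE, BUS_DCODE, BUS_SYSTEM, BUS_PPB } cm4_bus_t;

/* Hypothetical helper: which core bus carries an access to addr?
 * is_fetch is true for an instruction fetch, false for a load/store. */
cm4_bus_t cm4_bus_for(uint32_t addr, bool is_fetch)
{
    /* Code region (0x00000000-0x1FFFFFFF): fetches use I-Code,
     * data accesses (e.g. literal loads) use D-Code. */
    if (addr < 0x20000000u)
        return is_fetch ? BUS_ICODE : BUS_DCODE;

    /* Private Peripheral Bus region (NVIC, SysTick, debug).
     * Data access only; code cannot be executed from here. */
    if (addr >= 0xE0000000u && addr <= 0xE00FFFFFu)
        return BUS_PPB;

    /* Everything else (SRAM, peripherals, external memory):
     * fetches and data accesses share the System bus. */
    return BUS_SYSTEM;
}

/* True when a fetch and a data access would need the same core bus. */
bool cm4_same_bus(uint32_t fetch_addr, uint32_t data_addr)
{
    return cm4_bus_for(fetch_addr, true) == cm4_bus_for(data_addr, false);
}
```

With code in flash at, say, 0x08000100 (an assumed address, typical of STM32 parts) and data in SRAM at 0x20000000, cm4_same_bus returns false, so the fetch and the data access can overlap. With both code and data in SRAM it returns true, and the two accesses have to share the System bus.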
Regarding the benefit of von Neumann support, I am not sure I understand what you are asking.
A von Neumann machine will perform worse than a non-von Neumann design, because its instructions reside in memory and have to be fetched, decoded and executed, whereas something like an FPGA performs the operation directly.
Best regards,
Yasuhiko Koumoto.
Yasuhiko's answer is correct.
In addition to the above, I'd like to give another example:
Imagine that you are using the DMA to transfer data from SRAM to a peripheral or from SRAM to SRAM (e.g. a copy operation at the highest possible speed). The DMA has been set up to transfer the data very quickly.
You have your code running in SRAM as well. You've chosen to have your code in SRAM on this particular device in order to save power and also to gain a little extra speed, because flash memory often has latency.
-But this means you would have a lot of collisions.
Fortunately, you're performing a lot of calculations and very few memory accesses.
Thus collisions with memory access are very rare.
The instruction cache is then used for caching the instructions that you're using for your calculations, while your DMA is actually handling data transfers.
This would give you a very good performance.
-Of course, my example is only theory. This all depends on what you have available on your Cortex-M4 device.
Some devices have extremely good flash accelerators (for instance STMicroelectronics STM32F4xx's ART accelerator).
Unfortunately, I'm not qualified to explain the details and differences of the architectures, but I can say that the Cortex-M4 has a very good performance advantage: Most instructions are single-cycle.
-And not to forget: the low interrupt latency and the bit-banding (which is particularly useful for setting/clearing bits atomically in hardware registers).
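As an illustration of the bit-banding mentioned above, here is a small sketch in C that computes the bit-band alias address for one bit of a word in the SRAM or peripheral bit-band region. The function name bitband_alias is made up for this example, and bit-banding is an optional feature, so check that your particular device implements it. The region addresses are the standard Cortex-M3/M4 ones.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE     0x20000000u  /* start of SRAM bit-band region       */
#define SRAM_BB_ALIAS    0x22000000u  /* start of SRAM bit-band alias        */
#define PERIPH_BB_BASE   0x40000000u  /* start of peripheral bit-band region */
#define PERIPH_BB_ALIAS  0x42000000u  /* start of peripheral bit-band alias  */
#define BB_REGION_SIZE   0x00100000u  /* each bit-band region is 1 MB        */

/* Hypothetical helper: alias address for bit `bit` (0..31) of the
 * word at word-aligned address `addr`.
 * Returns 0 if addr is outside both bit-band regions. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    assert((addr & 3u) == 0u);  /* expect a word-aligned address */
    assert(bit < 32u);

    /* alias = alias_base + byte_offset * 32 + bit * 4 */
    if (addr >= PERIPH_BB_BASE && addr < PERIPH_BB_BASE + BB_REGION_SIZE)
        return PERIPH_BB_ALIAS + (addr - PERIPH_BB_BASE) * 32u + bit * 4u;

    if (addr >= SRAM_BB_BASE && addr < SRAM_BB_BASE + BB_REGION_SIZE)
        return SRAM_BB_ALIAS + (addr - SRAM_BB_BASE) * 32u + bit * 4u;

    return 0u;
}
```

On the target, writing 1 or 0 through a volatile uint32_t pointer to the returned alias address sets or clears just that one bit, without a read-modify-write sequence in software. For example, bit 7 of the word at 0x20000000 is aliased at 0x2200001C.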
Thanks yasuhikokoumoto and jensbauer for your replies. Both of you explained the significance of a data bus alongside an instruction bus very well.
But I would like to know more about why there are two data buses instead of one. Is it entirely for DMA support in cases like the one jensbauer described, or are there other situations as well?
Regarding the von Neumann architecture, my point is that the ARM Cortex-M4 has a Harvard architecture,
but suppose we access both instructions and data in memory at 0x2000_0000 or above. What we get is then a von Neumann kind of architecture, in the sense that all instructions and data appear on a single bus (the System bus).
So my question is whether this feature is there for a particular architectural benefit.
Regarding the von Neumann architecture, it does not help performance, because the program is stored in memory and the CPU cannot execute an instruction until it has fetched it from memory. However, the von Neumann architecture was epoch-making because a program can dynamically modify a program and then execute it. This means a program can produce another program, which can be seen as an evolution of the original program.
Regarding the Harvard architecture, rather than thinking of it simply as having two buses, it is better to think of it as having one instruction bus and one data bus. As long as other bus masters do not target the same slave, each bus can work without disturbing the other.
You wrote: "Regarding the von Neumann architecture, my point is that the ARM Cortex-M4 has a Harvard architecture, but suppose we access instructions and data from memory above 2000_0000. What we get is a von Neumann kind of architecture, in the sense that all instructions and data appear on a single bus (the SYS bus)."
In this case, of course, there is no performance benefit. However, as jensbauer says, almost all MCUs are equipped with caches for instructions or data. In the usual case, there will be no collisions on the same SRAM.
Do you have any other concerns?
Best regards,
Yasuhiko Koumoto.