Cortex: support of unaligned data access

The Cortex-M3 should support unaligned data access, which saves RAM space without loss of performance. Is there a way to enable this feature for an entire project, or do I have to use the __packed attribute on each data structure?
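To make the RAM saving concrete, here is a minimal sketch of what packing does to one structure. It assumes the GCC/Clang spelling `__attribute__((packed))` so that it compiles on a host machine; the `__packed` qualifier mentioned above is toolchain-specific. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler inserts padding so every member is aligned. */
struct sample_aligned {
    uint8_t  tag;
    uint32_t value;
    uint16_t crc;
};

/* Packed layout (GCC/Clang syntax, an assumption here): no padding at all,
 * so `value` sits at offset 1 and every access to it is unaligned. */
struct __attribute__((packed)) sample_packed {
    uint8_t  tag;
    uint32_t value;
    uint16_t crc;
};

/* Bytes of RAM saved per instance by packing. */
size_t bytes_saved(void)
{
    /* Guard against unsigned underflow on an unusual ABI. */
    assert(sizeof(struct sample_aligned) >= sizeof(struct sample_packed));
    return sizeof(struct sample_aligned) - sizeof(struct sample_packed);
}
```

The packed variant is 7 bytes (1 + 4 + 2). The size of the unpacked variant depends on the ABI, so the saving should be measured with `sizeof` on the actual target rather than assumed.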

0 Andy Neil over 17 years ago in reply to Marcus Harnisch

Please be sure to update your other thread(s) on other forum(s) so that everybody gets to benefit, and people on one forum don't waste time repeating what's already been said on another forum.

You should always do this as a matter of courtesy when posting the same question on multiple forums - known as "cross-posting"

Providing clickable links between the threads is generally sufficient...

(This forum automatically makes URLs clickable; on the STM forum, you have to do it manually)

0 ImPer Westermark over 17 years ago in reply to Andy Neil

The x86 has always supported unaligned accesses by having the memory interface hide the dual-access.

This has given a lot of PC programmers bad habits: they get strange "bus error" messages when they move their code from the PC to other processors, and discover that it isn't OK to just typecast a void pointer into a short* or long* pointer and use it for multi-byte memory accesses.
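A portable alternative to the pointer cast is to copy the bytes out, or to assemble the value byte by byte. The following is only a sketch; the helper names `load_u32` and `load_u32_le` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an arbitrary, possibly unaligned, address.
 * memcpy leaves it to the compiler to pick byte accesses or a single
 * unaligned load, whichever the target allows. The result is in the
 * byte order of the machine running the code. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-order-independent variant: build a little-endian value from
 * four single-byte reads, which are always aligned. */
uint32_t load_u32_le(const void *p)
{
    const uint8_t *b = p;
    assert(p != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Typical use is parsing a received buffer, e.g. `uint32_t len = load_u32_le(buf + 1);`, instead of `*(uint32_t *)(buf + 1)`.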

But what you introduce when you use nonaligned data is either stall cycles or loss of memory bandwidth for other devices.

The most common stall is where the processor has to wait an extra cycle for the result of a read operation. The PC processor tries to mask this with huge cache memories, which make sure that:
1) the memory interface can do "read ahead" and always read larger chunks than 1, 2 or 4 bytes. The memory interface may be 8 or 16 bytes wide, or even wider.
2) once the cache holds both of the addresses needed to combine a nonaligned access, no memory read (or extra stall cycle) is needed to merge the data.

But you can also have stall cycles on writes, since a processor either performs memory writes synchronously, or supports a limited number of outstanding writes (waiting for the memory interface to be ready to accept one more address + write data). If the processor can't support multiple outstanding writes, then every nonaligned write will stall the processor. And a processor with delayed writes will be stalled if you do several unaligned writes in a row (unless, possibly, they are to a contiguous memory area, in which case the memory interface might be able to combine several unaligned writes into several aligned writes instead of performing "smaller" writes).

Another thing is that an aligned write to a memory interface of the same width is just a write. If you have an unaligned write, then the memory controller must do:
1) read of first word.
2) bit-and + bit-or of the part of the word that should be replaced
3) write of first word.
4) read of second word.
5) bit-and + bit-or of the part of the word that should be replaced.
6) write of second word.
Some of the above steps can be combined or reordered, but it should be obvious that any extra access costs time, unless you have a memory interface running at a higher clock speed than your processor (in the real world it is the reverse, unless the processor is "intentionally" slowed down to 1:1).
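The six steps can be sketched in software. This is a model only, not a description of any particular memory controller: it assumes a memory that accepts nothing but whole, aligned 32-bit words, with byte lane 0 in bits 0..7 (little-endian). The function name `store_u32_rmw` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value at byte address `addr` in a memory of `nwords`
 * aligned 32-bit words, using only whole-word reads and writes. */
void store_u32_rmw(uint32_t *mem, size_t nwords, size_t addr, uint32_t value)
{
    size_t   word  = addr / 4;                 /* first word touched */
    unsigned shift = (unsigned)(addr % 4) * 8; /* bit offset inside it */

    assert(addr + 4 <= nwords * 4);

    if (shift == 0) {          /* aligned case: just a write */
        mem[word] = value;
        return;
    }

    /* Steps 1-3: read the first word, replace its upper bytes, write back. */
    uint32_t lo_mask = 0xFFFFFFFFu << shift;
    mem[word] = (mem[word] & ~lo_mask) | (value << shift);

    /* Steps 4-6: read the second word, replace its lower bytes, write back. */
    uint32_t hi_mask = 0xFFFFFFFFu >> (32 - shift);
    mem[word + 1] = (mem[word + 1] & ~hi_mask) | (value >> (32 - shift));
}
```

The aligned path costs one memory access; the unaligned path costs four (two reads and two writes), which is where the extra time goes.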

The really big advantage (besides reduced code size) of a processor that handles read and write combining in the memory controller is that it saves the memory bandwidth that would otherwise be needed to fetch the extra op-codes.

0 Raphael Löffel over 17 years ago in reply to ImPer Westermark

Hi

Thanks to all for the feedback on this issue.

The link to the cross-post on the STM32 forum is:

www.st.com/.../forums-cat-7816-23.html

Regards

0 Marcus Harnisch over 17 years ago in reply to ImPer Westermark

> Another thing is that a aligned write to a memory
> interface of the same width is just a write. If you
> have an unaligned write, then the memory controler
> must do: [...]

Not with ARM processors that I am aware of. Unaligned access will be broken down into aligned accesses of smaller size.

E.g. a word access to address 0x55 will be accessing a byte at address 0x55, a half word at address 0x56 and another byte at address 0x58.

Since memory systems in ARM are required to support all access sizes, all of this is taken care of by byte enables. No read-modify-write (R-M-W) is needed.
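The breakdown in the 0x55 example can be reproduced with a small greedy splitter. This is only a sketch of the arithmetic; the exact sequence of bus transfers a given core issues is implementation-specific. The names `struct access` and `split_access` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct access {
    uint32_t addr;  /* start address of this transfer */
    unsigned size;  /* 1 = byte, 2 = half word, 4 = word */
};

/* Split an access of `size` bytes at `addr` into naturally aligned
 * transfers. Produces at most 3 entries for a word access; `out` has
 * room for 4. Returns the number of entries written. */
size_t split_access(uint32_t addr, unsigned size, struct access out[4])
{
    size_t n = 0;

    assert(size == 1 || size == 2 || size == 4);

    while (size > 0) {
        unsigned chunk = 4;
        /* Shrink until the chunk fits and starts on its own alignment. */
        while (chunk > size || (addr % chunk) != 0)
            chunk /= 2;
        out[n].addr = addr;
        out[n].size = chunk;
        n++;
        addr += chunk;
        size -= chunk;
    }
    return n;
}
```

For `split_access(0x55, 4, out)` this yields a byte at 0x55, a half word at 0x56 and a byte at 0x58, matching the example above. An aligned word access stays a single transfer.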

Regards
Marcus
http://www.doulos.com/arm/

0 ImPer Westermark over 17 years ago in reply to Marcus Harnisch

Yes, the ARM line of processors has this requirement for all memory interfaces. But this only applies to the interface to the core: you will not know whether the physical memory supports byte or half-word accesses, or whether this is done by glue logic that activates the nWAIT signal or stretches MCLK while performing a read-modify-write.

Your example shows another important thing relevant to the ARM core and unaligned accesses. Your unaligned write resulted in three writes, since the ARM can't signal a three-byte write or an unaligned two-byte write.