Hello to all,
I am trying to understand how current consumption and clock cycle count vary with the memory region and the address offset. In various experiments, I obtained the following results:
LDR R4,[R1,#0x0] (R1 = 0x00000000, Flash, offset 0): Current = 2.60 mA, Cycles = 2
LDR R4,[R1,#0x1] (R1 = 0x00000000, Flash, offset 1): Current = 2.07 mA, Cycles = 4
LDR R4,[R1,#0x2] (R1 = 0x00000000, Flash, offset 2): Current = 2.30 mA, Cycles = 3
LDR R4,[R1,#0x3] (R1 = 0x00000000, Flash, offset 3): Current = 2.08 mA, Cycles = 4
From offset 4 onward the pattern repeats: offset 4 behaves like offset 0, offset 5 like offset 1, offset 6 like offset 2, offset 7 like offset 3, offset 8 like offset 0, and so on.
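The period-4 repetition is consistent with the behaviour depending only on the low two address bits, i.e. where the address falls within a 32-bit word. A minimal Python sketch, assuming R1 holds a word-aligned base address as in these tests; `alignment_class` is a hypothetical helper name, not part of any toolchain.

```python
# Sketch: the word-alignment "class" of an LDR address is its low two bits.
# Assumption: R1 holds a word-aligned base (0x00000000 for flash,
# 0x04000000 for SRAMX), so only the offset changes the low bits.

def alignment_class(base: int, offset: int) -> int:
    """Return (base + offset) mod 4, i.e. the byte position within a 32-bit word."""
    return (base + offset) & 0x3

for offset in range(9):
    cls = alignment_class(0x00000000, offset)
    print(f"offset {offset}: behaves like offset {cls}")
```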
Similarly, when accessing SRAMX:
LDR R4,[R1,#0x0] (R1 = 0x04000000, SRAMX, offset 0): Current = 2.88 mA, Cycles = 1
LDR R4,[R1,#0x1] (R1 = 0x04000000, SRAMX, offset 1): Current = 2.30 mA, Cycles = 3
LDR R4,[R1,#0x2] (R1 = 0x04000000, SRAMX, offset 2): Current = 2.65 mA, Cycles = 2
LDR R4,[R1,#0x3] (R1 = 0x04000000, SRAMX, offset 3): Current = 2.29 mA, Cycles = 3
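Current alone does not say how much charge each load costs, because the unaligned loads also take more cycles. A rough Python sketch that multiplies each posted current by the access time (cycles / 12 MHz), assuming the measured current is the average drawn while the LDR executes and ignoring loop overhead; the post does not say how the current was measured, so treat the result as indicative only. Energy would additionally need the supply voltage, which is not given.

```python
# Measurements from the post: (offset, current in mA, cycles), taken at 12 MHz.
CLOCK_HZ = 12_000_000

MEASUREMENTS = {
    "Flash (0x00000000)": [(0, 2.60, 2), (1, 2.07, 4), (2, 2.30, 3), (3, 2.08, 4)],
    "SRAMX (0x04000000)": [(0, 2.88, 1), (1, 2.30, 3), (2, 2.65, 2), (3, 2.29, 3)],
}

def charge_per_access_nc(current_ma: float, cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Charge in nanocoulombs: average current times access time (cycles / f).

    Assumes the measured current is the average drawn while the LDR executes.
    """
    seconds = cycles / clock_hz
    return current_ma * 1e-3 * seconds * 1e9

for region, rows in MEASUREMENTS.items():
    for offset, current_ma, cycles in rows:
        q = charge_per_access_nc(current_ma, cycles)
        print(f"{region} offset {offset}: {q:.3f} nC per access")
```

Under that assumption the aligned load is the cheapest per access in both regions (about 0.43 nC from flash and 0.24 nC from SRAMX), even though it shows the highest current: the unaligned loads draw less current, but for more cycles.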
Based on these observations, I have three questions:
Kindly help me out with this. I am using the LPCXpresso 54114 board, which has an ARM Cortex-M4 processor, and all measurements were taken at 12 MHz.
Thanking you,
Regards,
Himanshu
The bus interface on the Cortex-M4 is based on AHB-Lite, and this protocol doesn't support unaligned transfers. So when you issue an unaligned access, the bus interface breaks it up into multiple aligned transfers. As a result:
LDR R4,[R1,#0x0] - This needs one 32-bit transfer, with all 4 byte lanes active
LDR R4,[R1,#0x1] - This needs three transfers:
- 1) 0x04000001 - byte size, 1 byte lane active
- 2) 0x04000002 - halfword size, 2 byte lanes active
- 3) 0x04000004 - byte size, 1 byte lane active
LDR R4,[R1,#0x2] - This needs two transfers:
- 1) 0x04000002 - halfword size, 2 byte lanes active
- 2) 0x04000004 - halfword size, 2 byte lanes active
When the offset is 4, the word access is aligned again, so it takes only one transfer.
So the number of clock cycles needed for the access depends on the address offset, and the power depends on how many byte lanes in the memory are active.
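The splitting described above can be modelled with a greedy rule: at each step, issue the largest transfer (word, halfword, byte) that is naturally aligned at the current address and does not read past the last requested byte. A Python sketch; the greedy rule is an assumption chosen because it reproduces the transfer lists given in this thread, not a description taken from Arm documentation, and `split_unaligned_word` is a hypothetical name.

```python
from typing import List, Tuple

# Transfer sizes allowed by AHB-Lite on a 32-bit bus, largest first (in bytes).
SIZES = (4, 2, 1)
SIZE_NAMES = {4: "word", 2: "halfword", 1: "byte"}

def split_unaligned_word(addr: int) -> List[Tuple[int, int]]:
    """Split a 4-byte read at `addr` into naturally aligned (address, size) transfers.

    Greedy rule (assumed): use the largest size that is aligned at the current
    address and does not run past the last requested byte.
    """
    transfers = []
    remaining = 4
    while remaining > 0:
        for size in SIZES:
            if addr % size == 0 and size <= remaining:
                transfers.append((addr, size))
                addr += size
                remaining -= size
                break
    return transfers

for offset in range(4):
    parts = split_unaligned_word(0x04000000 + offset)
    desc = ", ".join(f"{SIZE_NAMES[s]} @ 0x{a:08X}" for a, s in parts)
    print(f"offset {offset}: {len(parts)} transfer(s): {desc}")
```

For offsets 0 to 3 this yields 1, 3, 2 and 3 transfers, the same ordering as the measured cycle counts in both memory regions.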
As for the variation in power when accessing different memory regions:
- when the test uses a flash address, only the flash memory macro is active and the SRAM is idle, so it takes less power.
- when the test uses a data SRAM address, the flash memory macro and the SRAM are used at the same time. (Note: instruction fetch and data access can then happen concurrently, so the access takes one fewer clock cycle.)
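The cycle counts in the question fit a simple model: one cycle per bus transfer, plus one extra cycle when the data is in flash (attributed above to the data read not being able to overlap with instruction fetch). A Python sketch that checks this against the eight posted measurements; it is an empirical fit to this thread's numbers, not a documented LPC54114 timing figure, and the names `predicted_cycles` and `FLASH_PENALTY` are hypothetical.

```python
# Number of AHB-Lite transfers per word load, indexed by address & 3,
# taken from the transfer lists in this thread.
TRANSFERS_BY_ALIGNMENT = {0: 1, 1: 3, 2: 2, 3: 3}

# Assumed extra cycle when the data is in flash, because the data read and
# the instruction fetch cannot proceed concurrently on the same memory.
FLASH_PENALTY = 1

# Cycle counts posted earlier in the thread, indexed by offset 0..3.
MEASURED_CYCLES = {
    "flash": [2, 4, 3, 4],
    "sramx": [1, 3, 2, 3],
}

def predicted_cycles(region: str, addr: int) -> int:
    """Predict the posted cycle count: one per bus transfer, plus a flash penalty."""
    transfers = TRANSFERS_BY_ALIGNMENT[addr & 0x3]
    return transfers + (FLASH_PENALTY if region == "flash" else 0)

for region, measured in MEASURED_CYCLES.items():
    predicted = [predicted_cycles(region, offset) for offset in range(4)]
    status = "matches" if predicted == measured else "differs from"
    print(f"{region}: model {predicted} {status} measured {measured}")
```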
regards,
Joseph
Dear Mr. Yiu,
Thank you very much for your detailed reply. Most of my questions have been answered. Thank you once again.
However, I still have a few doubts about your reply:
1.) LDR R4,[R1,#0x1] needs 3 transfers (or 3 clock cycles), but I do not understand how. All I provide is an offset, which is added to the register value to form the memory address that is loaded into the target register.
So if offset #0x1 means only one byte lane is active, shouldn't it take 4 cycles to transfer 32 bits of data? Am I right?
2.) Also, in the case of offset #0x3, will only one byte lane be active? Is there a reason for that? For offsets #0x1 and #0x2, we saw 1 and 2 byte lanes become active.
Sorry for troubling you once again.
Thanking you and regards,
Hi Himanshu,
1) That is done by the processor hardware automatically.
The bus interface is 32-bit, and supports 32-bit, 16-bit and 8-bit transfers.
The memory (conceptually) is designed to be 32 bits wide, with 4 byte lanes. Assuming it is little endian, the memory looks like the following to the processor:
...
Byte[0xB], Byte[0xA], Byte[9], Byte[8].
Byte[7], Byte[6], Byte[5], Byte[4].
Byte[3], Byte[2], Byte[1], Byte[0].
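All four byte lanes share the same row address, so each memory access can only reach bytes in the same row. To satisfy the AHB protocol, the transfer size can only be 8, 16 or 32 bits, and each transfer must be aligned to its own size (a halfword on an even address, a word on a multiple of 4).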
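The row/lane picture can be expressed in a few lines. A Python sketch of that conceptual model: the row is the address divided by 4 and the lane is the address modulo 4, so any word access that is not 4-byte aligned touches two rows and cannot be completed in a single transfer. The memory contents (each byte equal to its address) and the helper names are hypothetical.

```python
# Conceptual model of the 32-bit, little-endian memory described above:
# each row holds 4 bytes, and byte lane n holds the byte at address row*4 + n.

def row_and_lane(addr: int) -> tuple:
    """Return (row, byte lane) for a byte address in a 32-bit-wide memory."""
    return addr >> 2, addr & 0x3

def rows_touched(addr: int, size: int = 4) -> set:
    """Rows that a `size`-byte access starting at `addr` must touch."""
    return {row_and_lane(a)[0] for a in range(addr, addr + size)}

# Example memory where each byte's value equals its address (hypothetical data).
memory = bytes(range(16))

def load_word_le(mem: bytes, addr: int) -> int:
    """Value an unaligned LDR would return: bytes addr..addr+3, little endian."""
    return int.from_bytes(mem[addr:addr + 4], "little")

for addr in range(4):
    lanes = [row_and_lane(a) for a in range(addr, addr + 4)]
    print(f"LDR from 0x{addr:X}: (row, lane) = {lanes}, "
          f"rows touched = {sorted(rows_touched(addr))}, "
          f"value = 0x{load_word_le(memory, addr):08X}")
```

For example, a load from 0x1 gathers Byte[1] to Byte[4], which span rows 0 and 1, and returns 0x04030201.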
When the processor bus interface detects a word-size transfer to 0x00000001, the bytes that need to be accessed are Byte[1], Byte[2], Byte[3], and Byte[4].
The bus transfers generated would be:
- Byte[1] - 8-bit transfer to 0x00000001
- Byte[3] and Byte[2] - 16-bit transfer to 0x00000002 (covering 0x00000002 and 0x00000003)
- Byte[4] - 8-bit transfer to 0x00000004
When the processor bus interface detects a word-size transfer to 0x00000003, the bytes that need to be accessed are Byte[3], Byte[4], Byte[5], and Byte[6]:
- Byte[3] - 8-bit transfer to 0x00000003
- Byte[5] and Byte[4] - 16-bit transfer to 0x00000004 (covering 0x00000004 and 0x00000005)
- Byte[6] - 8-bit transfer to 0x00000006
When the processor bus interface detects a word-size transfer to 0x00000002, the bytes that need to be accessed are Byte[2], Byte[3], Byte[4], and Byte[5]:
- Byte[3] and Byte[2] - 16-bit transfer to 0x00000002
- Byte[5] and Byte[4] - 16-bit transfer to 0x00000004
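Which byte lanes each transfer uses (the subject of the second question) follows from the transfer's address and size. A Python sketch, assuming a transfer of `size` bytes enables `size` consecutive lanes starting at lane `addr & 3`; in AHB-Lite the lanes are implied by the transfer address and size rather than signalled separately. The transfer lists are the ones given in this thread, and `lane_mask` is a hypothetical helper.

```python
# Transfer lists from this thread: (address, size in bytes) for a word load
# from addresses 0x1, 0x2 and 0x3.
TRANSFERS = {
    0x1: [(0x1, 1), (0x2, 2), (0x4, 1)],
    0x2: [(0x2, 2), (0x4, 2)],
    0x3: [(0x3, 1), (0x4, 2), (0x6, 1)],
}

def lane_mask(addr: int, size: int) -> int:
    """4-bit mask of active byte lanes (bit n = lane n) for one aligned transfer."""
    if addr % size != 0:
        raise ValueError("AHB-Lite transfers must be naturally aligned")
    return ((1 << size) - 1) << (addr & 0x3)

for start, transfers in TRANSFERS.items():
    masks = [f"{lane_mask(a, s):04b}" for a, s in transfers]
    total = sum(bin(lane_mask(a, s)).count("1") for a, s in transfers)
    print(f"word at 0x{start:X}: lane masks {masks}, {total} lanes in total")
```

For a word at 0x3, the first transfer uses only lane 3, the halfword uses lanes 0 and 1 of the next row, and the last byte uses lane 2: four bytes are still moved in total, just spread over three transfers.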
Hope this is clearer.
Thank you very much for such a deep explanation. I am really thankful, since this information is not documented in detail anywhere, and thanks to the example it is now much clearer to me. Your investment in time and energy was far beyond what I could have asked for.
Warm Regards,