I'm a complete beginner with ARM. I wrote a very simple program to find the sum of three values Q, R and S and store it in memory. However, it doesn't work. Can someone show me what my mistake is? Thanks for your help.
AREA Example3, CODE, READONLY
EXPORT SystemInit
EXPORT __main
ENTRY
SystemInit
__main
LDR R1,Q ;load r1 with Q
LDR r2,R ;load r2 with R
LDR r3,S ;load r3 with S
ADD r0,r1,r2;add Q to R
ADD r0,r3;add in S
LDR r4,=Q
STR r0,[R4] ;store result in Q
;Stop B Stop
P SPACE 4 ;save one word of storage
Q DCD 2 ;create variable Q with initial value 2
R DCD 4
S DCD 5
END
Hi crocodile1985,
Your original code looks a bit like this when disassembled:
__main
0x00008000: e59f1018 .... LDR r1,Q ; [0x8020] = 0x2
0x00008004: e59f2018 . .. LDR r2,[pc,#24] ; [0x8024] = 0x4
0x00008008: e59f3018 .0.. LDR r3,[pc,#24] ; [0x8028] = 0x5
0x0000800c: e0810002 .... ADD r0,r1,r2
0x00008010: e0800003 .... ADD r0,r0,r3
0x00008014: e59f4010 .@.. LDR r4,[pc,#16] ; [0x802c] = 0x8020
0x00008018: e5840000 .... STR r0,[r4,#0]
$d
0x0000801c: 00000000 .... DCD 0
Q
0x00008020: 00000002 .... DCD 2
0x00008024: 00000004 .... DCD 4
0x00008028: 00000005 .... DCD 5
0x0000802c: 00008020 ... DCD 32800
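As a cross-check on the listing above, here is a minimal C sketch (not part of the original thread) that decodes an A32 PC-relative LDR and computes the address it reads from. The function names are hypothetical; the only assumptions are the A32 literal-load encoding and the rule that in ARM state the PC reads as the instruction address plus 8.

```c
#include <assert.h>
#include <stdint.h>

/* Decode an A32 "LDR Rt, [pc, #+/-imm12]" (literal load) and compute the
 * address it reads from. Returns 1 on success, 0 if insn is not that form.
 * In ARM state the PC reads as the address of the instruction plus 8. */
int ldr_literal_target(uint32_t insn_addr, uint32_t insn, uint32_t *target)
{
    /* Fixed bits: 010, P=1, B=0, W=0, L=1, Rn=1111.
     * The condition, the U bit and Rt are masked out. */
    if ((insn & 0x0f7f0000u) != 0x051f0000u)
        return 0;

    uint32_t imm12 = insn & 0xfffu;
    uint32_t pc = insn_addr + 8u;

    /* The U bit (bit 23) selects whether the offset is added or subtracted. */
    *target = (insn & (1u << 23)) ? pc + imm12 : pc - imm12;
    return 1;
}

/* Usage: addresses and opcodes copied from the listing above. */
void check_listing_loads(void)
{
    uint32_t t = 0;
    assert(ldr_literal_target(0x8000u, 0xe59f1018u, &t) && t == 0x8020u); /* Q */
    assert(ldr_literal_target(0x8014u, 0xe59f4010u, &t) && t == 0x802cu); /* pool entry */
}
```

The last check shows that LDR r4,=Q reads the literal pool word at 0x802c, and that word in turn holds 0x8020, the address of Q.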
The simple mistake is using LDR r4,=Q, which instructs the assembler to use or create a literal pool entry and load the register from it. In this case r4 is loaded indirectly, from a literal pool entry that holds the location of Q (the DCD at 0x802c above). If you want to put the address of a label in a register, you should use ADR.
ADR r4, Q
This is a pseudo-instruction which will create an address as a PC-relative offset from the current position in the code.
__main
0x00008000: e59f1018 .... LDR r1,[pc,#24] ; [0x8020] = 0x2
0x00008004: e59f2018 . .. LDR r2,[pc,#24] ; [0x8024] = 0x4
0x00008008: e59f3018 .0.. LDR r3,[pc,#24] ; [0x8028] = 0x5
0x0000800c: e0810002 .... ADD r0,r1,r2
0x00008010: e0800003 .... ADD r0,r0,r3
0x00008014: e28f4004 .@.. ADR r4,{pc}+0xc ; 0x8020
0x00008018: e5840000 .... STR r0,[r4,#0]
$d
0x0000801c: 00000000 .... DCD 0
0x00008020: 00000002 .... DCD 2
0x00008024: 00000004 .... DCD 4
0x00008028: 00000005 .... DCD 5
As you can see now, r4 will contain the value 0x8020 which is the address of Q.
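To see where 0x8020 comes from, the following C sketch (an illustration added here, with hypothetical function names) decodes the A32 form of ADR, which is an ADD or SUB of a rotated 8-bit immediate to the PC. It assumes ARM state, where the PC reads as the instruction address plus 8.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Decode an A32 ADR, i.e. "ADD Rd, pc, #const" or "SUB Rd, pc, #const",
 * and compute the address placed in Rd. Returns 0 for any other opcode. */
int adr_target(uint32_t insn_addr, uint32_t insn, uint32_t *target)
{
    uint32_t op = insn & 0x0fff0000u;       /* opcode bits plus Rn, cond masked */
    if (op != 0x028f0000u && op != 0x024f0000u)
        return 0;

    /* Modified immediate: imm8 rotated right by twice the 4-bit rotate field. */
    uint32_t imm = ror32(insn & 0xffu, 2u * ((insn >> 8) & 0xfu));
    uint32_t pc = insn_addr + 8u;           /* PC reads as instruction + 8 */

    *target = (op == 0x028f0000u) ? pc + imm : pc - imm;
    return 1;
}

/* Usage: the ADR line from the listing above. */
void check_listing_adr(void)
{
    uint32_t t = 0;
    assert(adr_target(0x8014u, 0xe28f4004u, &t) && t == 0x8020u);
}
```

The disassembler prints the same result as {pc}+0xc because it counts from the address of the instruction itself: 0x8014 + 0xc = 0x8020.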
Thanks a lot, but it's still not working. With your help I get the correct address in R4, but the sum in R0 is still not stored at Q. Can you check it again for me?
Uhm, I think that there's nothing wrong with this instruction:
ldr r4,=Q
This could generate a literal pool, yes, but it should store the address of the 'Q' label in the literal pool, not the constant 2.
Normally, though, the assembler would change LDR R4,=Q to ADR r4,Q automatically.
I think the problem is that Q is in read-only memory, e.g. Flash memory, and thus cannot be written.
To store in read/write memory, try pointing to an address you know is in RAM, either by using a SECTION BSS (or just BSS) directive or pointing directly to - for instance - 0x20000100 (if your microcontroller has RAM there). See the datasheet / User's Manual for your microcontroller, in order to find out more about where the RAM block(s) are.
Also, I'd recommend writing ADD r0,r0,r3 instead of ADD r0,r3 - just for clarity. Note: ADD r0,r3 is perfectly valid, but it might be confusing when you sometimes have two arguments to ADD and sometimes 3.
You are right. I found that Q is in the read-only area. Thanks for your help. I hope to have your help again.
Jens
I don't think that Q is stored in flash memory at execution time. The reason why it's read-only is that it's defined within a .text section (i.e. executable code, which is always read-only). To read and write variables, you need to put them in a .data section or, if you're feeling clever, a .bss section.
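The same split can be sketched in C (this is an analogy added for illustration, not the original assembly; the name sum_qrs is hypothetical). The inputs are const, so a toolchain is allowed to place them in a read-only section, while the result is an ordinary zero-initialised object that typically ends up in .bss. Where each object really lands depends on the compiler and linker script.

```c
#include <assert.h>
#include <stdint.h>

/* const objects: the toolchain may place these in a read-only section. */
static const uint32_t Q = 2, R = 4, S = 5;

/* Non-const, zero-initialised object: typically placed in .bss (RAM). */
static uint32_t result;

/* Sum the three inputs and store the sum in read/write storage,
 * instead of writing back over one of the inputs. */
uint32_t sum_qrs(void)
{
    result = Q + R + S;
    return result;
}

/* Usage: 2 + 4 + 5. */
void check_sum(void)
{
    assert(sum_qrs() == 11u);
}
```

The point of the design is that the store targets a separate read/write object, which is what moving the result out of the code area achieves in the assembly version.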
Mike, you're right in that 'text' section does not guarantee that the code is in flash memory or read-only memory.
On a Cortex-A microcontroller, it's very likely that the 'read-only' memory is protected.
It's also very likely that an operating system is installed on a Cortex-A.
However, on most Cortex-M microcontrollers you upload the program to flash-memory, not to RAM.
Storing the code in flash-memory is especially a good idea if you need to power-cycle the chip at some point.
It's normally the easiest way to program the microcontroller.
-But storing the code in the internal RAM is also very beneficial, the code executes faster and it also extends your flash memory life.
Let's assume that the code was uploaded to RAM on a Cortex-M device. The STR would fail writing to RAM only if the RAM is protected.
This would happen if you're running an operating system on your device, but on most Cortex-M microcontrollers this is unlikely.
-So yes, it's correct to pick a DATA or BSS section (or a custom section for that matter), where you store the data.
To add into mwsealey's answer:
in Cortex A Series Programmer's Guide:
A1.46 LDR (pseudo instruction)
syntax
LDR{cond}{.w} Rt, =expr
LDR{cond}{.w} Rt, label_expr
expr is a numeric value
label_expr is a label, optionally plus or minus a numeric value
Watch out not to get confused with these two.
[EDIT]
Example:
.section .vector
.align 4
ldr pc, =_start
ldr pc, =0 @ swi
ldr pc, =0 @ prefetch_abort
ldr pc, =0 @ data_abort
ldr pc, =0 @ not_used
ldr pc, =0 @ irq
ldr pc, =0 @ fiq
.text
_start:
...
00000000 <.vector>:
0: e59ff014 ldr pc, [pc, #20] ; 1c <gdb_check_breakpoint-0x8>
4: e3a0f000 mov pc, #0
8: e3a0f000 mov pc, #0
c: e3a0f000 mov pc, #0
10: e3a0f000 mov pc, #0
14: e3a0f000 mov pc, #0
18: e3a0f000 mov pc, #0
1c: 00001f04 andeq r1, r0, r4, lsl #30
00001f04 <_start>:
1f04: e59f408c ldr r4, [pc, #140] ; 1f98 <loop$+0x88>
1f08: e59f508c ldr r5, [pc, #140] ; 1f9c <loop$+0x8c>
1f0c: e3a0641f mov r6, #520093696 ; 0x1f000000
[/EDIT]
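In the listing above, ldr pc, =0 became mov pc, #0, while the address 0x1f04 was fetched from a literal pool. One reason a value can be moved directly is that it fits the A32 "modified immediate" form: an 8-bit value rotated right by an even amount. The C sketch below (added for illustration, with a hypothetical function name) tests only that MOV form. Real assemblers may also try other encodings such as MVN or MOVW, and a symbol whose address is only known at link time normally goes to the literal pool regardless.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Returns 1 if value can be written as an 8-bit constant rotated right
 * by an even amount (0, 2, ..., 30), otherwise 0. */
int is_a32_modified_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        /* Undo a rotate-right by rot; if the result fits in 8 bits,
         * this rotation is a valid encoding. */
        if ((rol32(value, rot) & ~0xffu) == 0)
            return 1;
    }
    return 0;
}

/* Usage: constants taken from the listing above. */
void check_listing_constants(void)
{
    assert(is_a32_modified_immediate(0x00000000u));  /* mov pc, #0         */
    assert(is_a32_modified_immediate(0x1f000000u));  /* mov r6, #0x1f000000 */
    assert(!is_a32_modified_immediate(0x00001f04u)); /* needs a literal pool */
}
```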
Fascinating! So, is it the case that for the Cortex-M series only, the assembler regime of TEXT/DATA/BSS only applies if the code is running under an OS, or does the same apply for Cortex-A and Cortex-R? Does the T/D/B regime apply automatically to an MMU-based processor, even in bare-metal mode?
And, does anyone know of any good documentation on bare-metal ARM Assembler programming?
Mike - well, it's actually quite simple; you may need to look at it from the opposite side of the road.
Imagine that you have a piece of code. This code is of course just binary data.
Let's assume these binary data are placed somewhere in RAM on a generic CPU (we don't know which architecture).
There is no intelligent loader on this CPU. The only thing it can do is to store a block of data into RAM and change register values (for instance the Program Counter).
The program is then stored in RAM, and PC is set to the beginning. This simple CPU does not have any read-only memory, so you can actually write data directly into your code (this could be used for self-modifying code, but self-modifying code isn't really necessary any longer; except for on-the-fly generated code).
A Microcontroller / Microprocessor is not required to have any RAM (or Flash memory) or a Memory Protection Unit.
This means the data in RAM can be modified as you wish. Even though it's marked "executable" in your linker-script.
-But these attributes are for the linker only. They may make it to the binary file in the form of attributes, but attributes may be ignored by what we call a "bootloader"; a small piece of code, which resides in ROM of many microcontrollers.
On some systems, for instance a computer running Mac OS X, Windows or Linux, there might be a memory protection unit and a part of the operating system, which loads the code from - say - a harddisk, relocates it and executes it.
Similarly, on a microcontroller such as a Cortex-M, you can make a "loader" subroutine which reads a file from an SD card, stores it in RAM and changes the Program Counter to point to the beginning of the code (after you've applied any relocation offset to all address pointers that need relocation).
Since your "loader" subroutine could write the binary data, your loaded code can also write to it, unless your loader subroutine instructs a Memory Protection Unit to protect the executable part of the code. Thus the .text section and .rodata sections may be writable in this case.
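The relocation step mentioned above can be sketched in C as follows. This is a minimal illustration under assumed conditions: the image format, the relocation table (a list of word indexes) and the function name apply_relocations are all hypothetical, and real object formats such as ELF describe relocations in more detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image format: the image was linked for base address 0, and
 * reloc_index lists the words that hold addresses. Each of those words gets
 * the actual load address added; all other words are left untouched. */
void apply_relocations(uint32_t *image, const size_t *reloc_index,
                       size_t reloc_count, uint32_t load_base)
{
    for (size_t i = 0; i < reloc_count; i++)
        image[reloc_index[i]] += load_base;
}

/* Usage: two address words patched for an image loaded at 0x20000000. */
void check_relocations(void)
{
    uint32_t image[4] = { 0xe59f4010u, 0x00000020u, 0x00000005u, 0x00000024u };
    const size_t relocs[2] = { 1, 3 };

    apply_relocations(image, relocs, 2, 0x20000000u);
    assert(image[1] == 0x20000020u && image[3] == 0x20000024u);
    assert(image[0] == 0xe59f4010u && image[2] == 5u);
}
```

Note that the loader writes into the same memory the code will later run from, which is why nothing prevents the loaded code from writing to itself unless a protection unit is set up afterwards.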
The "read-only" attribute does not really mean that the code/data is stored in read-only memory.
A more correct way of looking at it is to see it as "the code/data is allowed to be placed in read-only memory."
Thus ... the BSS section would not be allowed to be placed in read-only memory; there's no point in doing so. It *must* have a read/write location.
So even the following variable in C code will not be guaranteed to be "write-protected":
static const char __attribute__((section(".rodata"))) sHexTable[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
-Because what if it ends up in RAM on a device that does not have memory protection ?
However, the above C-example does help you, the programmer, to avoid writing to your lookup-table. It helps you to spot any mistakes that would result in a memory-protection error on those devices that actually place the code in read-only memory.
Again: Even if a C program "prevents" you from making these mistakes, you can send the pointer to an assembly routine, where you write to the data that the pointer points to. It may result in the data being changed, it may have no effect or it may result in an exception being triggered.
All that said: It's a good idea to split your code into .text, .data, .rodata and .bss sections as you wrote earlier.
In some cases, it may be a good idea to invent a few extra sections, for instance a ".table" section, where you would have sub-sections like ".table.pointer", ".table.word", ".table.halfword", ".table.byte", ".table.float" or ".table.wood"
-Having the table-sections will help saving padding for alignment. It may also be desirable to have some of the tables in special RAM (for instance CCMRAM on a STM32F4 device).
Uhm, I think that there's nothing wrong with this instruction: ldr r4,=Q This could generate a literal pool, yes, but it should store the address of the 'Q' label in the literal pool, not the constant 2.
Indeed, which is a waste of space, and not within the intent of the original code.
I think it is terribly documented in the Compiler & Assembler documentation, but for the edification of the OP, the net effect of the 'LDR =' pseudo-instruction is this.
LDR rD, =CONSTANT will place CONSTANT in a literal pool and load from there, alternatively if it is judged that it is possible to form it with MOV with immediates, that may be generated too.
LDR rD, [PC, #LITERALPOOL]
..
LITERALPOOL DCD CONSTANT
LDR rD, =LABEL will load the address of LABEL, but it will do it by using a literal pool, which gets generated something like this:
LDR rD, [PC, #LITERALPOOL]
..
LITERALPOOL DCD LABEL
As you can see, whether you specify a constant value or a label, it is turned into a PC-relative load from a literal pool and essentially is a way to place data in a register. This bears out from the code generation.
The ADR/ADRL (pseudo-)instruction(s) are to load PC-relative addresses into a register. It requires no literal pool but it MAY generate more than one instruction.
(In actual fact, it should work either way, but it's not the right way to do it).
Of course - I didn't get it at first.
The reason that the assembler is not optimizing the ldr r4,=q into for instance MOVW + MOVT or just a single MOV, is that the address needs to be calculated by the linker. Thus the linker needs to be able to patch the address, and that is too complicated for it, if the address is split into pieces (especially if it's only an 8-bit integer).
I think that the instruction set descriptions should be changed slightly too.
The ADR "instruction" isn't really an instruction; it's a pseudo-instruction.
The LDR instruction *is* an instruction, but if specifying =value, then it's not an instruction any longer, then it's a pseudo-instruction, which may choose one or two out of several instructions. The idea is good, but perhaps it should have been renamed to something like "LDRI" or "MOVI" instead.
Technically, I think that the name "MOVW" might be confusing too; it should probably have been named "MOVH".
Well, ADR is a pseudo-instruction, but it usually resolves to a single instruction (that is the point) of a very specific form. There are good reasons to use aliases and pseudo-instructions, but most of them have a time and a place to be useful. As long as you can guarantee you can meet the requirements, it uses less space in the code and doesn't generate literal pools, so it is quite ideal. And, of course, there is no such STR= equivalent.
It is essentially some form of arithmetic on the PC (r15), and anything that does that on the PC is decoded by disassemblers as above: you can see fromelf knows we coded an ADR. All pseudo-instructions have a preferred disassembly, as above, but some are indistinguishable from other instructions. For example, LDR= to get an address vs. a value is not distinguishable once you get to disassembly, and therefore it is just shown as a PC-relative load from a literal pool (i.e. it is always something close to LDR rD, [PC, #imm]). Your guess is as good as anyone's what that value actually means to the code.
In terms of readable output code and paucity of output, and especially to be executable without generating any kind of memory system access (as long as you are in range of arithmetic to the PC), ADR (and ADRL) is great. In terms of defining it as a pseudo-instruction, it is defined as such in the docs; however, you might never notice that it isn't a real instruction, since the preferred disassembly is easy to detect, and on ARMv8 AArch64 it is an actual instruction (along with ADRP, which is a 4KiB-scaled version).
This gets very, very important as the move to ARMv8 increases, since there are very few actual instructions and MOST of what you enter into an assembly file are aliases to very powerful forms of a single instruction type (the bitfield manipulation (insert, clear, etc.) and sign extension operations, for example, are all aliases to Bitfield Move (BFM)). If the disassembler just showed you arbitrary combinations of BFM you wouldn't be able to read the output of your own code. Luckily the ARM architecture is not just an opcode format and some behaviours, but dictates preferred behaviours for assemblers and disassemblers as well.
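As one concrete instance of such an alias, AArch64 UBFX Xd, Xn, #lsb, #width is an alias of UBFM Xd, Xn, #lsb, #(lsb+width-1). The C sketch below (added for illustration; the type and function names are hypothetical) maps the friendly operands to the UBFM operands and models UBFM only for the field-extract case where imms >= immr. It assumes width is at least 1 and lsb + width is at most 64.

```c
#include <assert.h>
#include <stdint.h>

/* The two immediates carried by a 64-bit UBFM instruction. */
typedef struct {
    unsigned immr;
    unsigned imms;
} bfm_operands;

/* UBFX Xd, Xn, #lsb, #width is an alias of
 * UBFM Xd, Xn, #lsb, #(lsb + width - 1). */
bfm_operands ubfx_as_ubfm(unsigned lsb, unsigned width)
{
    bfm_operands ops = { lsb, lsb + width - 1u };
    return ops;
}

/* Model of 64-bit UBFM for the imms >= immr case only: copy bits
 * immr..imms of src to the bottom of the result, zeroing the rest. */
uint64_t ubfm64_extract(uint64_t src, bfm_operands ops)
{
    unsigned width = ops.imms - ops.immr + 1u;
    uint64_t mask = (width >= 64u) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << width) - 1u);
    return (src >> ops.immr) & mask;
}

/* Usage: extract the 8-bit field starting at bit 4 of 0xABCD. */
void check_ubfx_alias(void)
{
    bfm_operands ops = ubfx_as_ubfm(4u, 8u);
    assert(ops.immr == 4u && ops.imms == 11u);
    assert(ubfm64_extract(UINT64_C(0xABCD), ops) == UINT64_C(0xBC));
}
```

A disassembler that printed only the raw UBFM operands (4 and 11) would hide the intent that the UBFX spelling (bit 4, width 8) makes obvious.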
LDR= is a pseudo-instruction in that it is still an LDR at the end of the day; it just allows you to use a different syntax to load values which are not immediate(ly) encodable in an easy way, and it implies generating a literal pool or potentially expanding to multiple instructions. This is super friendly for 16-bit Thumb code, and where you could never guarantee the positioning of a branch which is out of range (for instance to put it in a register for BX rN) or where you don't want to maintain and micromanage your own literal pool. For values you have specifically placed in a literal pool for specific reasons, ADR will give you the address so you can load or store to it (to make up for the non-existence of an equivalent store pseudo-instruction).