While reading the documentation on the MOV instruction (section A6.7.40) for the ARMv6-M architecture, I stumbled upon the following in the "Encoding T1" description: "ARMv6-M, ARMv7-M if <Rd> and <Rm> both from R0-R7. Otherwise all versions of the Thumb instruction set.". I have trouble interpreting this.
My first thought was that the T1 encoding can only be used on ARMv6-M, ARMv7-M if the Rd and Rm registers are both from R0-R7, but this doesn't hold as it is perfectly possible to assemble a MOV instruction for (at least) the ARMv6-M architecture using the T1 encoding with Rm and Rd being greater than R7.
I have tried contacting the ARM support, but it wasn't very helpful at all.
The encoding T1 of A6.7.40 in doc#1 is the same encoding as mov(3) of A7.1.44 in doc#2.
Call this encoding "mov3-T1-encoding". Furthermore, when the encoding is used with both Rd and Rm as low registers, call it "mov3-T1-low-encoding".
Summary:
The comment in A6.7.40 of doc#1 is not about whether mov3-T1-encoding can take high registers; it can, even on armv4t. It is about whether mov3-T1-low-encoding is permitted: it is not on architectures < armv6, and it is on architectures >= armv6.
Some details:
For architectures < armv6, the usage of mov3-T1-low-encoding is declared unpredictable by doc#2.
For architectures >= armv6, doc#1 says that only armv6-m and armv7-m can utilize mov3-T1-low-encoding, which is consistent with the above comment about the unpredictability on architectures < armv6.
However, doc#1 is silent about how to generate mov3-T1-low-encoding. The Operation section for mov3-T1-encoding in doc#1 /seems/ to imply that one can write "mov r1, r6" and the mov3-T1-low-encoding will be generated. But that reading is wrong, at least in the case of Arm's gcc toolchain.
doc#2 has more details on how to generate mov3-T1-low-encoding. It says that if "mov Rd, Rm" is given as source, with both Rd and Rm as low registers, the assembler should generate a flag-setting copy by emitting the encoding for "adds Rd, Rm, #0", which is different from mov3-T1-encoding/mov3-T1-low-encoding.
To actually generate mov3-T1-low-encoding as output, the source must use "cpy Rd, Rm", where cpy is a mnemonic specifically intended to be used for the purpose of generating a non-flag-setting copy between low registers. The mnemonic "cpy" is available on architectures >= armv6.
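To make the difference between the two encodings concrete, here is a minimal Python sketch that builds the 16-bit halfwords by hand. The function names are mine, for illustration only; the bit layouts are the Thumb "ADD immediate (3-bit)" form and the MOV register T1 form as I understand them from the manuals, so please check them against the docs before relying on this.

```python
def encode_adds_imm0(rd: int, rn: int) -> int:
    """Halfword for Thumb 'adds Rd, Rn, #0' (flag-setting, low registers only).

    Assumed layout: 0001110 imm3 Rn Rd, with imm3 = 0.
    """
    if not (0 <= rd <= 7 and 0 <= rn <= 7):
        raise ValueError("adds Rd, Rn, #imm3 accepts low registers only")
    return 0x1C00 | (rn << 3) | rd


def encode_mov_t1(rd: int, rm: int) -> int:
    """Halfword for mov3-T1-encoding (non-flag-setting, any of r0-r15).

    Assumed layout: 01000110 D Rm(4 bits) Rd(3 bits), where D is bit 3 of Rd.
    """
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register number out of range")
    d = rd >> 3  # high bit of the destination register
    return 0x4600 | (d << 7) | (rm << 3) | (rd & 7)


if __name__ == "__main__":
    # Same source operands (r1, r6), two different halfwords.
    print(f"adds r1, r6, #0 -> {encode_adds_imm0(1, 6):#06x}")
    print(f"cpy  r1, r6     -> {encode_mov_t1(1, 6):#06x}")
```

Comparing these values against the disassembly of your own object file is a quick way to see which of the two the assembler actually picked.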
So, there are two sides to the issue: The behaviour of the cpu when encountering a mov3-T1-low-encoding, and the generation of the said encoding.
If a cpu adhering to an architecture < armv6 encounters the mov3-T1-low-encoding, the results are declared unpredictable. Thus, a toolchain is not supposed to generate mov3-T1-low-encoding when building for architectures < armv6.
Evidently, Arm's gcc toolchain adheres to this rule. It emits encoding-for-"adds r1, r6, #0" when assembling "mov r1, r6" as thumb not only for armv4 and armv5, but also for armv6, armv7 (and possibly more that I did not test). That is, mov-between-low-registers in assembler source code is always treated as flag-setting.
It emits the mov3-T1-low-encoding (which is non-flag-setting) when assembling "cpy Rd, Rm" as thumb, for any high-low combination of legal register-arguments of cpy.
For the "mov Rd, Rm" thumb instruction in the source code:
If [architecture < armv6] and [both registers are low], then generate the [flag-setting encoding for "adds Rd, Rm, #0"]
If [architecture < armv6] and [at least one register is high], then generate the [non-flag-setting mov3-T1-encoding]
If [architecture >= armv6] and [both registers are low], then generate the [flag-setting encoding for "adds Rd, Rm, #0"]
If [architecture >= armv6] and [at least one register is high], then generate the [non-flag-setting mov3-T1-encoding]
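The four rules above, plus the cpy behaviour, can be sketched as a small Python model. This is only my summary of what I observed with Arm's gcc toolchain, not a description of the assembler's internals; the function names and the integer `arch` parameter (4, 5, 6, 7 for armv4t, armv5, armv6, armv7) are made up for illustration.

```python
ADDS_IMM0 = 'flag-setting encoding for "adds Rd, Rm, #0"'
MOV3_T1 = "non-flag-setting mov3-T1-encoding"


def is_low(reg: int) -> bool:
    """r0-r7 are the low registers in Thumb."""
    return 0 <= reg <= 7


def encoding_for_mov(arch: int, rd: int, rm: int) -> str:
    """Model of what 'mov Rd, Rm' assembles to as Thumb.

    `arch` is accepted only to show that it does not influence the result.
    """
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register number out of range")
    if is_low(rd) and is_low(rm):
        return ADDS_IMM0
    return MOV3_T1


def encoding_for_cpy(arch: int, rd: int, rm: int) -> str:
    """Model of what 'cpy Rd, Rm' assembles to as Thumb (armv6 and later)."""
    if arch < 6:
        raise ValueError("cpy is only available on architectures >= armv6")
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register number out of range")
    return MOV3_T1


if __name__ == "__main__":
    for arch in (4, 5, 6, 7):
        print(arch, "mov r1, r6 ->", encoding_for_mov(arch, 1, 6))
        print(arch, "mov r8, r1 ->", encoding_for_mov(arch, 8, 1))
    print(6, "cpy r1, r6 ->", encoding_for_cpy(6, 1, 6))
```

Note that in this model the architecture only matters for whether cpy is accepted at all; the choice made for mov depends purely on the registers.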
Edit: All of the above applies to Thumb instructions only.