I just noticed this recently... whenever I have an xdata DWORD (32 bits) and I try to set it to a constant value (or any value, I think), the assembly that gets generated seems all messed up for such a simple routine. See below. Writing a WORD (TotalSleepTime.m16.ab) works fine, but writing a DWORD (NextSleepTime) makes a call to some external routine which is excessively long, followed by a series of NOPs.
  77:  TotalSleepTime.m16.ab = 0;
C:0x3347  900181  MOV   DPTR,#TotalSleepTime(0x0181)
C:0x334A  F0      MOVX  @DPTR,A
C:0x334B  A3      INC   DPTR
C:0x334C  F0      MOVX  @DPTR,A
  78:  TotalSleepTime.m16.cd = 0;
  79:
C:0x334D  900183  MOV   DPTR,#0x0183
C:0x3350  F0      MOVX  @DPTR,A
C:0x3351  A3      INC   DPTR
C:0x3352  F0      MOVX  @DPTR,A
  80:  NextSleepTime = 1;
C:0x3353  900185  MOV   DPTR,#NextSleepTime(0x0185)
C:0x3356  120FB2  LCALL C?LSTKXDATA(C:0FB2)
C:0x3359  00      NOP
C:0x335A  00      NOP
C:0x335B  00      NOP
C:0x335C  0122    AJMP  C:3022
I noticed that something like
variable.m32.abcd = 0
would generate screwed-up code like this, so I thought it might have to do with the union not being accessed correctly. I tried changing a variable to just a plain DWORD (NextSleepTime, as seen above), but the same thing happens.
Is this behavior normal? I can't follow the undocumented assembly routine that gets called very well, but it definitely seems excessively complicated for such a simple statement. I just checked on a blank project with
void main(void)
{
    unsigned long xdata test;
    test = 0;
}
Same thing.
Firstly, the DWORD terminology for a 32-bit variable comes from FX2.h, which is supplied by Cypress (the processor manufacturer) for the FX2LP processor. I too thought it was weird that a "WORD" was defined as 16 bits for an 8-bit processor, but I stuck with their naming scheme anyway, which is why I use a "U32UNION" structure, as used in my post, that identifies and provides easy access to each individual BYTE or WORD, in the proper endian order and without needing fancy type-casts to get the compiler to do the most efficient thing.
I understand that the 8-bit core is extremely inefficient at working with 32-bit values - this is the only place where I even use a 32-bit C statement like this, and I'm trying to optimize all my C functions to produce the smallest amount of code (or rewriting them in assembly) because I am nearing my code-space limit. I'm interfacing to a 16-bit wireless transceiver that has 32-bit timestamps, and on the other end I'm dealing with data coming from the USB bus. The point is, I can't completely avoid 16-bit or 32-bit accesses, but mostly they are just for storing and moving data, with limited math operations performed.
By "messed up," I did not mean literally broken - obviously something as common as an unsigned long immediate store has been tested millions of times. I just know the default implementation is a bit complicated, or as I was quoted as saying, "excessively long," for something that should be doable in ~10 instructions. The library function includes at least 30 instructions with a bunch of sub-LCALLs. I'm not at my work PC, so I can't post it now - if you want to see the instructions, create an empty project and assign an unsigned long to 0.
Thank you Drew for posting the link on the meaning of the library functions. The "K" thing explains what the NOPs were doing - presumably they are really the bytes of the immediate constant placed inline after the LCALL, which the disassembler happens to show as NOPs - and now it makes more sense why the library function pops data off the stack at the start. The way I would expect it to be implemented is already in my initial post, in the first two lines of C code converted to assembly. For example...
unsigned long test;
test = 0;

    CLR  A
    MOV  DPTR,#test
    MOVX @DPTR,A
    INC  DPTR
    MOVX @DPTR,A
    INC  DPTR
    MOVX @DPTR,A
    INC  DPTR
    MOVX @DPTR,A
It wouldn't be much more complicated for any immediate value other than 0, either. I can understand that if I were using 32-bit variables all over my code, using the library routine would probably save a lot of code SIZE while compromising speed, but this is the ONLY place where I ever assign a 32-bit variable to an immediate value. I noticed that my code size jumped at least 50 bytes when I added this call, which is why I looked into it.
The simple solution is to just cast it to two 16-bit values and then do the move, so it's not like this is some pressing matter. I was simply looking for a good explanation of why the compiler chooses to use this routine.
I can understand that if I were using 32-bit variables all over my code, using the library routine would probably save a lot of code SIZE while compromising speed, but this is the ONLY place that I ever assign a 32 variable to an immediate value.
You pretty much answered your own question right there. You presumably asked for size-optimized code, and the tools do what is most likely to yield the smallest code in a typical situation. And your counter-example is biased: 0 is an atypically simple case. Once you generalize to an arbitrary immediate value, code following your pattern grows from 11 bytes to a whopping 18, compared to the compiler's 10. Which means the compiler wins as soon as there are about 5 of these operations in the entire program.
The only insight missing is that there's no way for the compiler to guess that this is going to be the single such operation in the whole program, because the compiler doesn't usually see the whole program. Only the linker sees the whole program, but it doesn't get to decide about micro-scale code generation.