24 bit address in 16 bit processor

Please do not say 'paging'; I cannot handle the overhead for all the other stuff that fits nicely within 64k.

I have a 2 Mbyte flash that is occasionally accessed, and timing there is not critical (the system pauses), as opposed to the other processes, which use only RAM.

Is there an elegant way to access structures in that 24-bit address space, or does it have to be bent, folded and mutilated to access?

Currently all addresses are specified with an 8-bit 'page' and a 16-bit 'address' and processed as such. I could gain some readability by having the whole in a long.

Also, this method requires the data to be stored so that no structure crosses a page boundary, and that limitation is a nuisance.
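
For illustration, the page/address pair and a single 32-bit value can be converted back and forth with a couple of helpers. This is only a minimal sketch in portable C: the names `flash_addr_t`, `make_linear` and `split_linear` are hypothetical, and the `U8`/`U16`/`U32` typedefs are assumed to map onto the fixed-width types shown.

```c
#include <stdint.h>

typedef uint8_t  U8;
typedef uint16_t U16;
typedef uint32_t U32;

/* Hypothetical pair type: 8-bit page plus 16-bit address within the page. */
typedef struct {
    U8  page;
    U16 offset;
} flash_addr_t;

/* Combine page and offset into one 24-bit linear address held in a U32. */
static U32 make_linear(U8 page, U16 offset)
{
    return ((U32)page << 16) | offset;
}

/* Split a linear address back into the page/offset pair. */
static flash_addr_t split_linear(U32 linear)
{
    flash_addr_t a;
    a.page   = (U8)((linear >> 16) & 0xFF);
    a.offset = (U16)(linear & 0xFFFF);
    return a;
}
```

With the address held as one value, adding an offset carries into the page automatically; the split is only needed at the moment of access.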

Erik

  • You may optimize XBANKING.A51 yourself to come up with a better solution. If you have found something better (which is thread-safe and works in all circumstances) please send your suggestion to:
    support.intl@keil.com.

  • OK, clarification.

    99.9% of the time 'flash does not exist' and time is EXTREMELY critical.

    Then, rarely, a routine is called (sketched here):

    U8 ReadFlash( U32 address)
    {
    U8 byte;
      P4 = (address >> 16) & 0xff;                 // high 8 bits select the 64k page
      SWITCH_IO_TO_FLASH;
      byte = *(U8 xdata *)(U16)(address & 0xffff); // low 16 bits address within the page
      SWITCH_IO_TO_RAM;
      return (byte);
    }

    This works like a champ EXCEPT in a case like this:
    data = ReadFlash (structx.offsety)

    where the sum of the structure address and the offset crosses a 64k boundary.

    So what do I want?

    A method where the structure address plus offset, when assigned to a long, is calculated correctly; nothing else.
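
    One possible way to get that, sketched in portable C rather than anything Keil-specific: keep the structure's base as a 32-bit flash address and add the field offset from `offsetof()` in 32-bit arithmetic. The macro name `FLASH_FIELD_ADDR` and the example `struct record` are hypothetical.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical layout of one record stored in flash. */
    struct record {
        uint8_t header[16];
        uint8_t tail[4];
    };

    /* Linear flash address of a field: base and offset are both widened to
       32 bits before the add, so a carry out of bit 15 reaches the page byte. */
    #define FLASH_FIELD_ADDR(base, type, field) \
        ((uint32_t)(base) + (uint32_t)offsetof(type, field))
    ```

    The result would then be passed straight to the routine above, e.g. `ReadFlash(FLASH_FIELD_ADDR(base, struct record, tail))`.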

    Erik

  • I'm not quite sure I understand the problem.

    ReadFlash (&structx.offsety)

    should only pass in one value, which already combines the structure address + offset. And since the routine only reads one byte, that one byte can't cross a boundary. If you were reading a longer word, then you'd have to change the bank register mid-stream. If you're passing in the address and offset separately, then you'll just have to add them before taking the address apart.

    If you use far pointers only for your flash addresses, and explicitly declare others as xdata*, then the compiler should only call the XBANKING routines for the far pointers, which is to say the flash accesses. Other, 16-bit, accesses would be performed as normal.

    The library routines seem to compare favorably to the example code with the shifts and masks, even with the bank overhead.

    LOAD_BANK	MACRO SaveAcc
    LOCAL lab
    		MOV	DPL,R1
    		MOV	DPH,R2
    
    		MOV	?C?XPAGE1SFR,R3
    		DEC	?C?XPAGE1SFR
    		ANL	?C?XPAGE1SFR,#07FH
    		CJNE	R3,#80H,lab		; test high bit of R3 to set carry
    lab:
    		ENDM
    
    RESTORE_BANK	MACRO SaveAcc
    		MOV	?C?XPAGE1SFR,#?C?XPAGE1RST ; Reset Page Register
    		ENDM
    
    
    
    ;-----------------------------------------------------------------------------
    ; CLDXPTR: Load   BYTE in A             via Address given in R1/R2/R3
    ; Registers which can be used without saving:  DPTR, CY, A
    ;
    ?C?CLDXPTR:
    		LOAD_BANK
            JNC	CLDCODE
    		MOVX	A,@DPTR
    		JMP CLDDONE
    CLDCODE:
    		CLR	A
    		MOVC	A,@A+DPTR
    CLDDONE:
    		RESTORE_BANK 1
    		RET
    
    

    If access to code space isn't a possibility, just cut out the JNC to MOVC half of the routine and eliminate the test from the LOAD_BANK macro.

    You could still shave a little time from those routines if you happened to know that you only access one segment at a time and have a series of flash accesses with no intervening xdata accesses.

    void ManualRead4 (U8 far* addr)
        {
        SetBankReg(addr);
    
        byte1 = *(U8 xdata*)addr++;
        byte2 = *(U8 xdata*)addr++;
        byte3 = *(U8 xdata*)addr++;
        byte4 = *(U8 xdata*)addr++;
    
        SetBankReg (0);  // sets to default value
        } // ManualRead4
    

    Instead of a U32, you might consider a union.

    /// provides access to words/bytes of a U32
    typedef union
        {
        U32 u32;
        U8  array[4];
        struct { MultiByte16 lsw; MultiByte16 msw; } words;
        } MultiByte32;
    

    You get a little better code from accessing MyVar.array[1] than (MyVar.u32 >> 16) & 0xff. The compiler could be better about strength reduction of shifts that are multiples of 8 bits.

  • The code in XBANKING.A51 does exactly what your ReadFlash routine does. But it is built into the compiler, so when you have pointers, you need not decide whether the address is a flash address or a RAM address. The overhead is this decision (which takes 5-6 CPU cycles).

    Of course you may implement your own way of doing it.

  • And since the routine only reads one byte, that one byte can't cross a boundary.
    No, but the address can.

    If the structure is located at fff0 and is 20 bytes long, the last 4 entries will be accessed at 0000, 0001, 0002 and 0003 with a 16-bit calculation.
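
    The wrap can be reproduced on any host compiler. A small sketch, with the hypothetical helper names `entry_addr16` and `entry_addr32`:

    ```c
    #include <stdint.h>

    /* Address of entry `index` computed in 16 bits: the carry is lost. */
    static uint16_t entry_addr16(uint16_t base, uint16_t index)
    {
        return (uint16_t)(base + index);
    }

    /* Same calculation in 32 bits: the carry lands in the page byte. */
    static uint32_t entry_addr32(uint32_t base, uint16_t index)
    {
        return base + index;
    }
    ```

    For base fff0 and index 16, the 16-bit version yields 0000 while the 32-bit version yields 10000, i.e. page 1, offset 0000.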

    Erik

  • The code in XBANKING.A51 does exactly what your ReadFlash routine does. But it is built into the compiler, so when you have pointers, you need not decide whether the address is a flash address or a RAM address. The overhead is this decision (which takes 5-6 CPU cycles).

    I DO NOT want the execution routines, since they assume all is 'banked', and when operating in "RAM mode" I cannot afford ANY overhead.

    ALL I WANT is a means of calculating the effective address in 32-bit mode.

    IF the address of an entry in a structure or array is assigned to a 32-bit entity, the calculation should be 32-bit.

    Erik

  • I DO NOT want the execution routines, since they assume all is 'banked', and when operating in "RAM mode" I cannot afford ANY overhead.

    The XBANKING routines do not add any overhead when accessing variables in the CODE, DATA, XDATA, IDATA, PDATA, or BIT memory areas. They are only invoked when you use far or const far pointers.

    IF the address of an entry in a structure or array is targeted at a 32 bit entity, the calculation should be 32 bit.

    Far memory types are limited to 64K in size and may not cross a 64K boundary. As such, the address calculations for far memory objects are performed using 16-bit arithmetic which reduces code size and increases execution speed. A limitation is that compiler-managed objects may not cross a 64K boundary.

    ALL I WANT is a means of the calculation of the effective address in 32 bit mode.

    You can do this using a far pointer with a long typed index but you'll have to do it manually and you'll have to read each byte individually. However, this is only required for those objects that straddle the 64K boundary. And, there are very few of those (only 1 if you're using 128K).
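
    As a rough illustration of the byte-at-a-time approach, here is a sketch in portable C rather than actual C51 far-pointer code. The names `sim_flash`, `read_flash_byte` and `read_flash_block` are hypothetical, and the flash is simulated with an array so the sketch runs on a host.

    ```c
    #include <stdint.h>

    /* Simulated 128K flash so the sketch runs on a host; on the target this
       array and read_flash_byte() would be replaced by the real banked read. */
    static uint8_t sim_flash[0x20000UL];

    /* Hypothetical single-byte read taking a full linear address. */
    static uint8_t read_flash_byte(uint32_t addr)
    {
        return sim_flash[addr & 0x1FFFFUL];
    }

    /* Copy `len` bytes starting at linear address `addr`. The address is
       incremented as a 32-bit value and every byte goes through
       read_flash_byte(), so the copy may run across a 64K boundary. */
    static void read_flash_block(uint32_t addr, uint8_t *dst, uint16_t len)
    {
        while (len--) {
            *dst++ = read_flash_byte(addr++);
        }
    }
    ```

    The cost is one bank setup per byte, which should only matter if flash reads were time-critical.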

    Jon

  • You can do this using a far pointer with a long typed index but you'll have to do it manually and you'll have to read each byte individually. However, this is only required for those objects that straddle the 64K boundary. And, there are very few of those (only 1 if you're using 128K).
    The problem here is that the data is variable and I do not know which units straddle.

    Anyhow, I think it has now reached the point where I have to go back to the people who make the software that generates the file I store in flash and say "make a hole in the file so no units straddle 64k". I know it will cost me a hefty fee, but oh well: if nothing else works, pay.

    Erik

  • The problem here is that the data is variable and I do not know which units straddle.

    Well, that complicates things a bit, but still, couldn't you look at the address and size of the object to determine if it straddles?
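
    Such a check could look roughly like this, assuming the address is held as a 32-bit linear value (the function name `straddles_64k` is hypothetical):

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* True if an object of `size` bytes starting at linear address `addr`
       has its first and last byte in different 64K segments. */
    static bool straddles_64k(uint32_t addr, uint16_t size)
    {
        if (size == 0) {
            return false;
        }
        return (addr >> 16) != ((addr + size - 1UL) >> 16);
    }
    ```

    For example, a 20-byte object at fff0 straddles, while a 16-byte object at the same address does not.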

    Jon

  • Well, that complicates things a bit, but still, couldn't you look at the address and size of the object to determine if it straddles?

    There is more to it: there are 32 copies of struct a, 64 copies of array b, etc.

    To process the same struct differently depending on its location would create a piece of code that would be a nightmare to debug.

    Anyhow, I'll see what the cost of requesting a gap would be and, if exorbitant, I'll try the suggestions here.

    Thanks all,

    Erik

  • Sounds like the offsetof() macro could come in handy.

        if ((U16)((U16)addr + offsetof(structType, fieldName)) < (U16)addr)
            { // field straddles 64k boundary
            }
        else
            { // field lies within one 64k segment
            }
    

    Assuming none of your fields are bigger than 32k, that is.

    But isn't such a test at runtime again going to be more expensive than just setting the bank register? I suppose you could figure out at initialization time where the break comes, and store that.

    Is this operation really so time-critical that saving a couple of instructions is worthwhile? Flash access is often slower than RAM access. If you're writing to the flash, it's many orders of magnitude slower than the access time.

  • Is this operation really so time-critical that saving a couple of instructions is worthwhile? Flash access is often slower than RAM access.
    When reading flash, timing is of no concern; when NOT reading flash, extremely so.
    Basically the unit runs in two modes:
    Haul @$$ (99.9% of the time)
    Work with flash

    Erik

  • As Jon mentioned, the extra instructions to set up the high 8 bits of the address apply only to far and const far data. Regular xdata access does not go through these routines, and will not be slowed down by the code in XBANKING.A51. These routines are essentially the "far access library". So long as you don't declare your normal xdata items far or access them via a far pointer, you should be safe.

    The remaining question seems to be whether or not the actual access pattern is such that detecting the segment boundary is worthwhile. If there are a whole lot of 1-byte reads in the same segment, then you could (in theory) optimize out most of the high-order byte setup, as in the ManualRead4 routine I posted above. It's just a matter of whether the time and code it takes to figure out whether you need to set the high byte is less than the time it takes you just to do it every time. Also, it's perhaps worth considering whether you need consistent execution time for every access, or whether it's okay for some of them to be much longer than others as long as the amortized total is less overall.

  • Also, it's perhaps worth considering whether you need consistent execution time for every access, or whether it's okay for some of them to be much longer than others as long as the amortized total is less overall.
    Varying execution time is irrelevant, but code that tries to read something by method a and something else by method b will be considered 'messy' by me and outlawed.

    Again, once the flash is in the loop, timing is totally non-critical.

    I will play a bit with far and !far and see what happens.

    Erik

  • One question re banking:
    can the 'main' bank be 64k? All I have seen says 'home bank' 32k, bank 1 32k.

    Erik