
Vector Table Offset LPC1788 (VTOR) - NOR Flash

Hello,

I'm using uVision 4.20 and an LPC1788 development board.

The development board has external NOR flash at CS0 (0x80000000). I am able to download code to the external NOR flash using an INI setup file, but I can't seem to write the VTOR register with 0x80000000.

The exact code and setup modified to run from iRAM works without issue (@ 0x10000000).

Does anyone have any ideas why I can't write VTOR with 0x80000000?

(Note: the code runs until the first interrupt occurs and then crashes.)

Thanks.

Mac

  • The Vector Table can only be placed in the Code & SRAM regions on Cortex-M cores (addresses 0x00000000 .. 0x3FFFFFFF).

    Bits 31..30 of the VTOR register are not implemented, so writing 0x80000000 will NOT set VTOR to 0x80000000.

    So you can't put the vector table in external memory.

  • Thanks for your reply.

    That's what I was afraid of.

    What is the general strategy, then, when the user code is larger than the device's (LPC1788) internal flash?

    Do I need to split my user application binary into two (using a scatter file), so that I can load one into iRAM and the other into SDRAM?

    Basically, I have a setup on an LPC2478 where I copy the entire user application into SDRAM and remap the ARM7 vectors using that device's MEMMAP register.

    I guess there is no way to use this same strategy on the LPC1788, so I'm wondering: what do people normally do?

    Thanks.

    M

  • I fail to see the problem. Why not have the vector table in the internal flash or the internal RAM?

  • Do you really want to _physically_ separate the vector table from the rest of the program? What if the NOR flash needs to be replaced (assuming it does not carry code)?

  • What if the NOR flash is somehow erased (assuming it does not carry code)? You're 100% toast, as is your program...

  • I guess that is what I'm asking.

    What is the best strategy? I have a bootloader that lives in internal flash. The bootloader (if not updating firmware) loads the user application from external flash into SDRAM.

    It loads the user application into SDRAM because it is too big to load into iRAM.

    I guess I have to split my user application into two binaries: one containing the vector table, which loads into iRAM, and another that loads into SDRAM. Does this sound right?
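    Such a split might be sketched in a scatter file roughly as below. All addresses and sizes are illustrative assumptions (Local SRAM at 0x10000000, SDRAM on the LPC1788's EMC dynamic chip select DYCS0 at 0xA0000000); the object-file names are placeholders.

```
; Hypothetical scatter file: vector table + startup in Local SRAM,
; bulk of the application in SDRAM.
LR_IRAM 0x10000000 0x10000 {          ; load region: Local SRAM, 64 KB
  ER_IRAM 0x10000000 0x10000 {
    *.o (RESET, +First)               ; vector table first -> VTOR = 0x10000000
    * (InRoot$$Sections)
    startup_*.o (+RO)
  }
}
LR_SDRAM 0xA0000000 0x2000000 {       ; load region: SDRAM at DYCS0, 32 MB
  ER_SDRAM 0xA0000000 0x2000000 {
    .ANY (+RO)                        ; remaining code/const runs from SDRAM
  }
  RW_SDRAM +0 {
    .ANY (+RW +ZI)                    ; data/bss also in SDRAM
  }
}
```

    The bootloader then only has to copy each load region to the matching execution address before setting VTOR to 0x10000000 and jumping to the reset handler.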

    Thanks.

    M

  • Why do you copy the application into RAM in the first place? It might surprise you, but as far as I can recall, on an LPC1788 that might actually slow your program down (for architectural reasons: the chip cannot take advantage of its Harvard architecture unless executing from internal flash). Why not run from internal flash? Is it too small?

  • I believe you are correct: execution is fastest out of internal flash, then iRAM, then SDRAM, then external flash.

    I cannot use the internal flash because 1) it is too small and 2) a bootloader already lives there.

  • I have a similar situation with an LPC2478. The program for that product is organized in the following way:
    1. a bootloader zone in the first 16 KB of internal flash.
    2. program components up to the end of internal flash.
    3. the rest of the program in a NOR flash.

    Note that you would need to:
    1. Configure uVision 4 to program the NOR flash so you can debug.
    2. Make sure your bootloader can program the application parts that reside in NOR flash.
    3. Adjust your scatter-loading files to house a separate load region for the NOR flash.
    4. Create a small PC application that stitches together the load-region binaries generated by the linker, so that the bootloader can program the result seamlessly.

  • TM>It might surprise you, but as far as I can recall, on a LPC1788 that might actually slow your program (because of architectural considerations - the chip cannot take advantage of its Harvard architecture if not executing from internal flash).

    Incorrect.

    LPC17xx on-chip RAM mapped at 0x20XXXXXX (AHB SRAM) is connected to the System bus, but on-chip RAM mapped at 0x10XXXXXX (Local SRAM) is connected to the I-Code & D-Code buses.

    Running code from Local SRAM will be fast (the chip can take advantage of its Harvard architecture).

  • Recently I have been developing something for the Fujitsu F2MC-16LX and Microchip PIC16 platforms.

    It is interesting that:

    Most PIC MCUs are Harvard-architecture MCUs, but the newest PIC32 (based on the MIPS32 M4K core) has a von Neumann architecture.

    Conversely, ARM7 MCUs are von Neumann-architecture MCUs, but the newer Cortex-M3 has a Harvard architecture.

    Different trends, with different reasons that I don't know/understand.

  • Harvard and von Neumann are basically theoretical concepts that only exist in pure form in older processors. Strictly following either architecture leads to far too many problems in real-world situations.

    In reality, most processors want to be von Neumann (to be general) but with internal Harvard optimizations for concurrency.

    On one hand, RAM is normally faster than flash, which is why faster processors normally run their programs from RAM and keep flash only as a kind of disk storage, or why there are special flash-acceleration modules that cache flash accesses over a very wide flash bus, much wider than the normal instruction size. But once both data and code are stored in RAM, it is a very short step to allowing any RAM to be used for both code and data, letting the developer decide how best to make use of the chip.

    As speeds go up, chips need special optimizations. Most faster chips need caches between the processor core and memory. With more advanced pipelines, it becomes advantageous to extend the capacity for concurrent access by having separate code and data caches. And the caches work better if the processor has one bus for filling the data cache and another for filling the code cache.

    Multi-port memories are very expensive and large, and with single-port memory all accesses must be strictly serialized. So it is advantageous to split the memory into multiple blocks, with one access bus per memory region, and then add a multiplexer that can switch accesses from the core's caches and from peripherals to the different memory regions.

    In the end, we get a situation where only small or old processors are purely von Neumann or Harvard, while all newer, faster processors have to be combinations.

    The PC has always had unified memory, but internally the x86 processors got separate code and data caches many years ago. The PIC chips have had separate memory regions simply because it was natural to separate the read-only code memory from the read/write RAM. But when the flash isn't fast enough, or the chip gets enough RAM that customers want to download software modules dynamically into RAM and run them, a strict "DATA"-only memory becomes counterproductive.

    Another thing here is that 8-bit processors only have room for very small opcodes (and often originated at a time when the transistor count had to be kept very low). So the opcodes are very specialized, and the separation between "code" and "data" opcodes determines what subset of memory addressing will be performed. A move to 32-bit processors means that the opcodes have enough bits to specify generic memory addressing: a processor may not only be able to load a register with data relative to another register plus an offset, or relative to two registers, but may just as well perform a code jump relative to PC + register + offset.

    In the end, we get general-purpose beasts, but with specialized hardware hidden inside. The developer can get just about anything to function, but can get the chips to run more efficiently with programs that take advantage of the internal optimizations. So we can place some variables in one RAM region for primary access by the processor core, while keeping other variables in a second RAM region where a DMA channel for Ethernet or USB can perform accesses with little interference with the core's simultaneous accesses to the first RAM region.

    In the end, what we see is really the same trend. We get amalgamated processors where the manufacturer may more or less arbitrarily decide to call the design von Neumann or Harvard, but the real design is both, or neither.

    And the trend isn't new. We have long had something called "modified Harvard", where the code memory allows data access, all because the pure Harvard architecture is too much of an academic construct: clean and elegant, but not practical. And the von Neumann architecture is practical, but raw brute force, with scaling problems.

  • Hi Per,

    Many thanks for your detailed and enlightening elaboration.

    I heard that, "the trend of MIPS is multi-thread, whereas the trend of ARM is multi-core, but they both learn something from each other."

    Just as what you said,

    "In the end, we get general-purpose beasts, but with specialized hardware hidden inside."

    "In the end, what we see is really the same trend. We get amalgamated processors where the manufacturer may randomly decide to call their design Von Neumann or Harvard but the real design really is both or neither."

  • Thanks for all the info.

    So, specifically for the LPC1788, what kind of performance difference should one expect when running code from SDRAM versus internal RAM (connected to the I-Code & D-Code buses)?

    Thanks.

    Mac