Setting up TCM Memory in ARM926EJ-S

Note: This was originally posted on 20th October 2008 at http://forums.arm.com

Hi all,

I am currently trying to turn on TCM in ARM926EJ-S where there is 64K for ITCM, DTCM and internal SRAM.
I have decided to use it as:
32K ITCM
16K DTCM
16K SRAM

I therefore assumed I could use the memory from 0x300000 to 0x304000 regardless, and set my stack pointer and my exception and interrupt vectors to this location.
But as soon as I turn on TCM, the data in this memory is gone.
Does this make sense, or am I doing something wrong?

Here is how I do my TCM initialization:
HW_UINT32 dtcm, itcm;
HW_UINT32 enable_memory_sharing;

#define HW_TCM_ENABLE     0x1
#define HW_ITCM_SIZE_32K  6
#define HW_DTCM_SIZE_16K  5

enable_memory_sharing = 0;

__asm__ __volatile__("MRC p15, 0, %0, c9, c1, 0" : "=r"(dtcm));
dtcm |= (0x10104000 | (HW_DTCM_SIZE_16K << 2) | HW_TCM_ENABLE);
__asm__ __volatile__("MCR p15, 0, %0, c9, c1, 0" :: "r"(dtcm));

enable_memory_sharing |= AT91C_CCFG_DTCM_SIZE_16KB;

__asm__ __volatile__("MRC p15, 0, %0, c9, c1, 1" : "=r"(itcm));
itcm |= (0x10108000 | (HW_ITCM_SIZE_32K << 2) | HW_TCM_ENABLE);
__asm__ __volatile__("MCR p15, 0, %0, c9, c1, 1" :: "r"(itcm));

enable_memory_sharing |= AT91C_CCFG_ITCM_SIZE_32KB;

AT91C_BASE_CCFG->CCFG_TCMR = enable_memory_sharing;


Regards,
Bekir
  • Note: This was originally posted on 22nd October 2008 at http://forums.arm.com

    Hi all,

    I finally got it working, but the performance figures suggest that I am still doing something wrong.
    Maybe you can give me a hint.

    I am setting my instruction TCM base to 0x10108000.
    Would it make sense for there to be a performance difference between these two accesses?
    1- via 0x100000
    2- via 0x10108000

    The reason I am trying to use TCM is that it is supposed to be as fast as the caches. But somehow my code (which is in ITCM) becomes faster when I turn on the cache at 0x100000.
    Is this in any way logical?

    Regards,
    Bekir
  • Note: This was originally posted on 20th October 2008 at http://forums.arm.com

    Why do you use dtcm "|=" for the configuration? I'm not sure what the reset values of the TCM configuration registers are, but you are creating a new configuration rather than updating an existing one. I think you can get rid of the MRC and do "dtcm =" rather than OR-ing the new data with the old configuration.

    Same comment applies to the ITCM.


    You are definitely right, they were totally unnecessary.
    But it does not work either way :D
    With 0x10104000 as the base address for the data TCM, I thought I should see the same content at 0x200000 (the internal data TCM memory) and at 0x10104000 (the given base address).
    But strangely, what I see at 0x200000 is the same content as at 0x0. I have checked the content of c9, c1 and it seems to hold the right address (0x10104000). I don't understand this behaviour. Is there anything missing?

    Regards,
    Bekir
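
The suggestion above (assign a complete value instead of reading the register and OR-ing into it) can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names `tcm_size_bytes`, `tcm_region_value`, `write_dtcm_region` and `write_itcm_region` are hypothetical, the bit layout (enable in bit 0, size code shifted left by 2) is taken from the defines in the original post, and the `512 << code` size rule is an assumption that matches the two codes used there (5 for 16K, 6 for 32K). The alignment check assumes the base must be a multiple of the region size.

```c
#include <assert.h>
#include <stdint.h>

#define TCM_ENABLE      0x1u  /* bit 0: region enable (as in the original post) */
#define TCM_SIZE_SHIFT  2     /* size code is shifted left by 2 (as in the post) */
#define TCM_SIZE_16K    5u
#define TCM_SIZE_32K    6u

/* Assumed rule for non-zero size codes: 5 -> 16 KB, 6 -> 32 KB. */
static uint32_t tcm_size_bytes(uint32_t size_code)
{
    return 512u << size_code;
}

/* Build the whole register value from scratch; no read-modify-write. */
static uint32_t tcm_region_value(uint32_t base, uint32_t size_code)
{
    /* Assumption: the base address must be aligned to the region size. */
    assert((base & (tcm_size_bytes(size_code) - 1u)) == 0u);
    return base | (size_code << TCM_SIZE_SHIFT) | TCM_ENABLE;
}

#if defined(__arm__) && defined(__GNUC__)
/* Target-only part: same MCR encodings as in the original post. */
static void write_dtcm_region(uint32_t value)
{
    __asm__ __volatile__("MCR p15, 0, %0, c9, c1, 0" :: "r"(value));
}

static void write_itcm_region(uint32_t value)
{
    __asm__ __volatile__("MCR p15, 0, %0, c9, c1, 1" :: "r"(value));
}
#endif
```

On the target this would be used as `write_dtcm_region(tcm_region_value(0x10104000u, TCM_SIZE_16K));` and likewise for the ITCM with 0x10108000 and `TCM_SIZE_32K`. Keeping the value computation separate from the MCR makes it possible to check the bit pattern on a host machine before running it on hardware.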
  • Note: This was originally posted on 22nd October 2008 at http://forums.arm.com

    TCM is only the same speed as cache *if* the SRAM provided in the ASIC is single cycle access, zero wait-state memory.

    If the design is using the bulk SRAM to provide the TCM, there is no reason why the TCM should suddenly be faster than the SRAM - the physical RAM hasn't changed. If the SRAM is (for example) three cycle access, I'd expect TCM to be three cycle access.

    Could this mean that the TCM is slower than the cache? Yes, certainly.
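
To make the argument above concrete, here is a rough cycle-count sketch. All numbers in the usage note are hypothetical and were not measured on any device; the function names are made up for illustration, and the model (one cycle per cache hit, a fixed penalty per miss, a fixed access time for the TCM) is a deliberate simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Total cycles for `accesses` fetches from a memory with a fixed access time,
 * e.g. a TCM backed by multi-cycle SRAM. */
static uint32_t fixed_latency_cycles(uint32_t accesses, uint32_t cycles_per_access)
{
    return accesses * cycles_per_access;
}

/* Total cycles for `accesses` fetches through a cache, assuming a hit costs
 * one cycle and each of the `misses` costs `miss_cycles`. */
static uint32_t cached_cycles(uint32_t accesses, uint32_t misses, uint32_t miss_cycles)
{
    return (accesses - misses) * 1u + misses * miss_cycles;
}
```

For example, with 1000 fetches, a three-cycle TCM costs `fixed_latency_cycles(1000, 3)` = 3000 cycles, while a cache with 50 misses at an assumed 20 cycles each costs `cached_cycles(1000, 50, 20)` = 1950 cycles. Under those assumptions the cached code would run faster than the code in TCM, and only a single-cycle TCM (1000 cycles) would beat it.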
  • Note: This was originally posted on 25th December 2008 at http://forums.arm.com

    I recently ran into the same problem.

    My question is: where should I put the TCM initialization code?

    In the Linux kernel or in ordinary user-space code?

    Can anyone give me a tip?

    Thank you in advance.
