weird issue in arm code called by C function

  • Note: This was originally posted on 20th March 2013 at http://forums.arm.com

    Lots of things could be happening here. It's hard to narrow it down without knowing your processor and your OS, but here are some possibilities, each of which adds latency to the first call:

    1) The function address isn't in the BTB yet, so branching to it causes a misprediction (~8-13 cycles)
    2) The code isn't in the L1 icache, so the fetch misses and goes to the L2 cache (~12-25 cycles)
    3) The code isn't in the L2 cache either, so the fetch goes to main memory (could be dozens to hundreds of cycles)
    4) The code region isn't in the ITLB, so the lookup falls through to the main TLB (~5-10 cycles)
    5) The code region isn't in the main TLB either, so it causes a page table walk (a few dozen cycles)
    6) The page tables aren't in cache and have to be fetched from main memory, which could involve two completely different memory locations (potentially hundreds of cycles)
    7) The code isn't even in main memory and has to be loaded from disk/flash/whatever. It's actually common OS procedure not to page anything in until it's first used. (could vary wildly, anywhere from thousands to hundreds of thousands of cycles)
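
One way to check whether the cost comes from the cold-start effects listed above is to time several back-to-back calls and compare the first sample with the later ones. The following is only a rough sketch, assuming a POSIX system with `clock_gettime`; `routine_under_test`, `time_calls` and `min_warm` are hypothetical names, and the stand-in routine should be replaced with the real ARM function. A nanosecond clock may be too coarse for differences of a few dozen cycles, so a hardware cycle counter would be preferable where one is accessible.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds (POSIX clock_gettime). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Counts calls so the stand-in below has an observable side effect. */
static volatile unsigned calls_made;

/* Hypothetical stand-in for the assembly routine being measured. */
static void routine_under_test(void)
{
    calls_made++;
}

/* Time `runs` consecutive calls to fn, one sample per call.
 * out[0] is the cold call; later samples should have warm caches,
 * TLBs and branch predictor state. */
static void time_calls(void (*fn)(void), uint64_t *out, size_t runs)
{
    for (size_t i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn();
        out[i] = now_ns() - t0;
    }
}

/* Smallest sample excluding the first (cold) one: a warm estimate. */
static uint64_t min_warm(const uint64_t *samples, size_t n)
{
    assert(n >= 2);
    uint64_t best = samples[1];
    for (size_t i = 2; i < n; i++) {
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}
```

If `out[0]` is much larger than `min_warm(out, runs)`, the extra time is most likely one or more of the first-call penalties rather than the routine itself.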

    I'd also need to know how long that memmove actually takes in order to get a feel for the comparison you made. Are you sure it's actually being performed and not optimized out by the compiler, given that you never use the result for anything?
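
To rule out the compiler dropping the copy, the destination buffer can be made observable after the `memmove`. This is a minimal sketch, not taken from the original code: `keep_alive` and `copy_observed` are hypothetical helpers, and the empty inline-asm barrier is a GCC/Clang extension, with a volatile read used as a weaker fallback on other compilers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fallback sink: a volatile store cannot be removed by the compiler. */
static volatile uint8_t sink;

/* Tell the compiler the memory behind p may be read from here on,
 * so stores into it (such as a preceding memmove) must really happen. */
static void keep_alive(const void *p)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Empty asm with a "memory" clobber: emits no instructions. */
    __asm__ volatile("" : : "r"(p) : "memory");
#else
    /* Weaker fallback: force a read of the first byte. */
    sink = *(const volatile uint8_t *)p;
#endif
}

/* Copy n bytes (overlap is allowed, as with memmove) and make the
 * result observable so the copy survives optimization. */
static void copy_observed(void *dst, const void *src, size_t n)
{
    memmove(dst, src, n);
    keep_alive(dst);
}
```

Timing `copy_observed` instead of a bare `memmove` should give a number that reflects a copy that was really executed. Checking the generated assembly is still the most direct confirmation.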