
Weird issue in ARM code called by C function

  • Note: This was originally posted on 21st March 2013 at http://forums.arm.com

    Wikipedia can give you a good overview on these topics:

    http://en.wikipedia.org/wiki/CPU_cache
    http://en.wikipedia....ookaside_buffer
    http://en.wikipedia....anch_prediction
    http://en.wikipedia....arget_predictor

    In a nutshell, these are on-chip buffers that hold recently used data, address translations and branch history, so the CPU can get at them much faster than by going off-chip. But when your program has only just started, these buffers are empty, so there's a big cost to filling them the first time.
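
    One way to see this warm-up cost is to time the same routine several times in a row and compare the first run with the later ones. The following is only a rough sketch: `routine_under_test` is a hypothetical stand-in (a small `memmove`) for your assembly function, `now_ns` and `time_calls` are names made up for illustration, and it assumes POSIX `clock_gettime` is available (on iOS of that era `mach_absolute_time` would be the more likely choice).

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static unsigned char buf[4096];
static volatile unsigned char sink;  /* keeps the compiler from dropping the copy */

/* Hypothetical stand-in for the assembly routine being measured. */
static void routine_under_test(void)
{
    memmove(buf + 16, buf, 1024);
    sink = buf[16];
}

/* Call fn `runs` times back to back and record each call's elapsed time. */
static void time_calls(void (*fn)(void), size_t runs, uint64_t *out_ns)
{
    assert(fn != NULL && out_ns != NULL);
    for (size_t i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn();
        out_ns[i] = now_ns() - t0;
    }
}
```

    Calling `time_calls(routine_under_test, 8, t)` from `main` and printing `t[0]` next to the other entries would typically show the first call standing out if cold caches, TLB misses or paging are involved. The exact numbers vary from run to run, so treat them as a hint rather than a measurement.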

    I don't know exactly how iOS works, but there's a good chance that new executables are demand paged through the virtual memory system. That could mean the function you're calling isn't even in memory yet. Its page is marked as inaccessible in the page table, so the first access raises an exception (a page fault). That traps into the OS kernel, which runs a lot of code to work out that the page lives on disk and has to be loaded. It then has to read the page from flash, probably via DMA, which can take long enough that the OS will schedule other programs to run in the meantime. There's also a good chance the OS will DMA in a fairly large region to lower the chances of having to do this again.
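
    The first-touch cost of a page can be seen with a small experiment. This is a sketch under assumptions, not a description of what iOS does: it maps fresh anonymous memory with POSIX `mmap`, so the faults are satisfied by zero-filled pages rather than by reading an executable from flash, and the real cost of paging in code would be considerably higher. `touch_pages` and `first_touch_demo` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS and clock_gettime on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Write one byte in every page of [base, base + len); return elapsed ns. */
static uint64_t touch_pages(volatile unsigned char *base, size_t len, size_t page)
{
    assert(base != NULL && page > 0);
    uint64_t t0 = now_ns();
    for (size_t off = 0; off < len; off += page)
        base[off] = 1;
    return now_ns() - t0;
}

/* Map `pages` untouched pages and touch them twice. The first pass takes
 * a page fault on each page; the second pass finds them already mapped.
 * Returns 0 on success, -1 if the page size query or the mapping fails. */
static int first_touch_demo(size_t pages, uint64_t *first_ns, uint64_t *second_ns)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t)ps;
    size_t len = pages * page;

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    *first_ns = touch_pages(mem, len, page);
    *second_ns = touch_pages(mem, len, page);
    munmap(mem, len);
    return 0;
}
```

    On most systems the first pass comes out far slower than the second, even though both execute exactly the same stores. That gap is the kernel's fault handling, and it is the same kind of hidden cost a first call into not-yet-loaded code would pay.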

    But I really have no idea whether this is the case. Even if the executable is demand paged, your function may already have been brought in along with other things that were loaded. As for the other buffers, I'm pretty confident they all need to be filled, so you'd be looking at hundreds of cycles at least for that. Depending on how the memmove is implemented and what the CPU is like (Apple hasn't released details on how their Swift processor works, so this is partly guesswork), the memmove could take as little as 100 or even 50 cycles. But if the first call to that function really takes 100 times more cycles than the memmove (and, as isogen says, the timing setup is not too reliable), then I think it must be due to demand paging. You can get a better idea of that mechanism here:

    http://en.wikipedia....i/Demand_paging
    http://en.wikipedia..../Virtual_memory