[ARM926EJS] improve write miss

Note: This was originally posted on 5th October 2010 at http://forums.arm.com

Hello experts,

    The platform I am using is the ARM926EJ-S. The cache policy is write-back with read-allocate only.
    According to the profiling results, the program I want to optimize has too many write misses (write buffer refills).
    Can anyone give me some guidelines or tricks to improve my program? Thanks.

BR,
Stanley
  • Note: This was originally posted on 8th November 2010 at http://forums.arm.com

    > Will STM make all store write to the same write buffer entry?
    The [url="http://infocenter.arm.com/help/topic/com.arm.doc.ddi0198e/I31031.html"]write buffer on the 926[/url] can queue up 16 data words at 4 addresses. An STR (or STRH or STRB) that misses the cache (or is uncacheable) will use one data word and one address.  An STM of N registers will use N data words and one address.
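
As a rough illustration of the capacity figures above, here is a minimal bookkeeping sketch in C. It only counts address entries and data words; it does not model draining, timing, or anything else about the real hardware. The names `wb_model`, `wb_push` and `wb_stores_until_full` are made up for this example.

```c
#include <stdbool.h>

/* Capacity figures quoted in the reply above for the 926 write buffer. */
enum { WB_MAX_ADDRS = 4, WB_MAX_WORDS = 16 };

/* Hypothetical bookkeeping type: counts entries in use, nothing more. */
typedef struct {
    int addrs; /* address entries occupied */
    int words; /* data words occupied */
} wb_model;

/* Try to queue one store of n_words (1 for STR/STRH/STRB, N for an STM of
 * N registers). Returns false if it would not fit in the model. */
static bool wb_push(wb_model *wb, int n_words)
{
    if (n_words < 1 ||
        wb->addrs + 1 > WB_MAX_ADDRS ||
        wb->words + n_words > WB_MAX_WORDS)
        return false;
    wb->addrs += 1;
    wb->words += n_words;
    return true;
}

/* Number of back-to-back stores of n_words each that fit in an empty buffer. */
static int wb_stores_until_full(int n_words)
{
    wb_model wb = {0, 0};
    int count = 0;
    while (wb_push(&wb, n_words))
        count++;
    return count;
}
```

With single-word stores the model runs out of address entries after 4 stores (4 words queued), while STMs of 4 registers also fit 4 times but queue all 16 words.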

    Depending on your memory system, there may also be some benefit to using STM of 4 or 8 registers since that will allow the 926 to use [url="http://infocenter.arm.com/help/topic/com.arm.doc.ddi0198e/Cacjgjec.html"]bursts on the external AHB bus[/url].
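
One way to encourage multi-word stores from C is to group the writes to adjacent addresses, as in the sketch below. This is only an assumption about what a compiler may do: whether it actually emits a single STM of 4 registers instead of four STRs depends on the compiler and optimization flags, so check the generated assembly. `fill_words` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill n_words 32-bit words at dst with value, four words per iteration.
 * Grouping four stores to adjacent addresses gives an ARM compiler the
 * chance to combine them into one STM; this is not guaranteed. */
static void fill_words(uint32_t *dst, uint32_t value, size_t n_words)
{
    size_t i = 0;
    for (; i + 4 <= n_words; i += 4) {
        dst[i + 0] = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }
    for (; i < n_words; i++) /* tail: fewer than four words left */
        dst[i] = value;
}
```

If the compiler does not combine the stores, hand-written assembly with STMIA is the fallback; the C version at least keeps the access pattern sequential.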

    > Does write order affect the performance if the data in the cache or not in the cache?
    I think I'm going to retract my "consecutive ascending addresses" comment.  I was imagining a difference between consecutive ascending addresses and consecutive descending addresses, but I'm not sure it makes any difference, especially without write-allocate (and maybe even with).  For writes that miss the cache, except for the STM comments above, I don't think it will make any difference on the 926 (since it's not merging writes).


    Thanks for your reply.
    One more question: is there any way to preload or load the cache line where a write miss is going to happen? As far as I know, ARM9 doesn't implement preload. What should I do to load the cache line in advance with minimal cost?
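
Since the cache policy described at the top of the thread is read-allocate, one possible approach is to issue an ordinary load from each line before writing to it, so that the later stores hit in the cache. The sketch below assumes a 32-byte (8-word) cache line and uses a hypothetical helper `touch_lines`. Each load still pays for a full line fill, so this is only likely to help when a line is written several times afterwards; measure before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u /* assumed line size: 8 words */

/* Read one byte from every cache line overlapped by [buf, buf + len).
 * The volatile qualifier keeps the compiler from dropping the loads.
 * Returns the number of loads issued (useful for checking the loop). */
static size_t touch_lines(const void *buf, size_t len)
{
    const volatile uint8_t *p = (const volatile uint8_t *)buf;
    size_t loads = 0;

    if (len == 0)
        return 0;
    for (size_t off = 0; off < len; off += CACHE_LINE_BYTES) {
        (void)p[off];
        loads++;
    }
    (void)p[len - 1]; /* covers the last line if buf is not line-aligned */
    return loads + 1;
}
```

For a buffer that is written once from start to end, grouping the stores into STMs as discussed above is probably cheaper than touching the lines first.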