Hello,
I am timing load and store instructions for a bare-metal program by stepping through execution using OpenOCD, reading the PMU cycle counter with single-cycle granularity. I am running the program on a single core of a Cortex-A9 on a Xilinx Zynq-7000 (I have a Zybo board).
I have tried several different cache configurations, and am now trying to make sense of the results.
First: All caches enabled
This histogram shows that the vast majority of LDR and STR instructions took 10 to 15 cycles (14 cycles in the raw data). I see this and think: okay, it takes about 14 cycles for an L1 cache hit.
Then I ran with the L1 caches disabled (so only the L2-cache):
Now a bunch of accesses have shifted to the right, taking ~35 cycles. Maybe that is how long an L2 hit takes? But why are there still so many ~14 cycle accesses? (From the raw data, these are both load and store instructions.)
This seems weird to me, so I dig through the docs and try to turn off features like pre-fetching and branch speculation... but that doesn't get rid of the 14 cycle accesses.
Any ideas as to what is going on? Is my "14 cycles = an L1 hit" assumption incorrect? Are there other cache options I should try turning off? Is my method of getting the instruction's cycle count flawed? 14 cycles seems like a lot to me, but I am assuming that this is a result of stepping through the program clearing the pipeline (instructions like ADD also take ~14 cycles).
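To make the binning I'm describing concrete, here is a small host-side sketch that buckets the per-instruction cycle counts into the rough latency classes above. The thresholds are assumptions taken from my own measurements, not documented Cortex-A9 figures:

```python
# Hypothetical helper: bucket stepped-instruction cycle counts into
# rough latency classes. Thresholds are guesses from the histograms
# in this post, not documented latencies.
from collections import Counter

def classify(cycles, l1_max=20, l2_max=60):
    """Return a rough label for one stepped-instruction cycle count."""
    if cycles <= l1_max:
        return "~14 cycles (L1 hit?)"
    if cycles <= l2_max:
        return "~35 cycles (L2 hit?)"
    return "slower (miss to DRAM?)"

def histogram(samples):
    """Count how many samples fall into each latency class."""
    return Counter(classify(c) for c in samples)

print(histogram([14, 14, 15, 35, 36, 120]))
```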
Thank you in advance for your help!
I have some concerns about your approach. I've not used OpenOCD before and don't know how it handles stepping; however, if it's similar to other debuggers, the process of stepping could be quite invasive, which would disrupt the timing of the instruction being stepped (compared to not stepping it).
The other concern is that the PMU isn't really designed to measure the timing of a single instruction in isolation, if for no other reason than that the time needed to take the measurements would be a significant portion of the measured time (if you did it from software). It also misses the interaction of an instruction with what's around it. It's more common to measure blocks rather than individual instructions.
Thanks for the response! Since I posted, I've confirmed that stepping the processor clears the pipeline, so the minimum cycle count for an instruction is 14 cycles when measured this way (I measured a short block as a whole vs. each instruction in the block individually). So, you are right about the invasive part!
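The check I did can be sketched like this: run a short block free-running and record the total cycles, then step the same block and sum the per-instruction counts; the difference per instruction estimates the overhead that stepping adds. The numbers below are illustrative, not my real measurements:

```python
# Sketch of the block-vs-stepped comparison described above.
def stepping_overhead(block_cycles, stepped_cycles):
    """Average extra cycles per instruction introduced by stepping.

    block_cycles:   total cycles for the block, free-running
    stepped_cycles: list of per-instruction counts when single-stepped
    """
    n = len(stepped_cycles)
    return (sum(stepped_cycles) - block_cycles) / n

# e.g. a 5-instruction block: 10 cycles free-running,
# 14 cycles per instruction when stepped
print(stepping_overhead(10, [14] * 5))  # 12.0 cycles of overhead/step
```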
So it seems like both an L1 hit and an L1 miss / L2 hit are under the 14-cycle floor that I can measure this way. I have tried to find the expected latencies in the Xilinx docs and have asked on their forums to confirm this, but don't have an answer yet.
I am reading the PMU from a second computer using OpenOCD, so I am not concerned about how much wall-clock time it takes to get the cycle counts: the device I'm testing is halted while I do this.
My goal is to be able to know the cache state at any point in a program's execution. I thought I could do this by executing the program one instruction at a time and watching the cycle counter on load instructions to see which ones result in a hit vs. a miss. It looks like this method will not work for the L1 caches (due to the pipeline-clearing issue), but I'm hoping it will still work for the L2 cache since the latencies are higher. I'm also recording the store instructions and can predict what will be in the cache based on previous loads and stores, but I really need to check this prediction against the hardware.
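The prediction side of this can be sketched as a small cache model fed with the recorded load/store addresses, whose hit/miss predictions you then check against the measured cycle counts. The geometry below is my reading of the Zynq-7000's L2 (512 KiB, 8-way, 32-byte lines), and LRU replacement is a simplifying assumption; the L2C-310's actual replacement policy (round-robin/pseudo-random) should be checked against the TRM before relying on this:

```python
# Minimal set-associative cache model (hedged sketch, see caveats above).
from collections import OrderedDict

class CacheModel:
    def __init__(self, size=512 * 1024, ways=8, line=32):
        self.line = line
        self.ways = ways
        self.sets = size // (ways * line)
        # one LRU-ordered dict of tags per set
        self.state = [OrderedDict() for _ in range(self.sets)]

    def access(self, addr):
        """Record a load/store; return True on a predicted hit."""
        line_addr = addr // self.line
        idx = line_addr % self.sets
        tag = line_addr // self.sets
        s = self.state[idx]
        if tag in s:
            s.move_to_end(tag)       # refresh LRU position
            return True
        if len(s) >= self.ways:      # evict least-recently-used way
            s.popitem(last=False)
        s[tag] = False               # value unused; keys are the tags
        return False

c = CacheModel()
print(c.access(0x1000))  # False: cold miss
print(c.access(0x1004))  # True: same 32-byte line
```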
Is there a better way? I don't see how to do it while measuring blocks instead of individual instruction cycle counts. I'm looking into the Level 2 Cache Controller (L2C-310) event counters. The PMU also has some counters related to the L1 cache, but I haven't had a chance to dig in yet. Is this a reasonable route?
Thanks!