Has anyone used the __pld() instruction intrinsic with the Keil compiler tools?
According to the ARM V7-M Architecture Reference Manual: "The effect of these memory system hints is IMPLEMENTATION DEFINED".
Please, no 'what are you using this for' questions. Right now this is just research on my part to determine whether or not I would need to use this intrinsic. Since it is implementation defined, I need to know whether it is even used within the Keil environment, and what the benefit (if any) would be of using it rather than, say, optimization level 3 or the volatile keyword.
Thanks.
You should probably read the relevant compiler reference manual section as well: www.keil.com/.../armccref_cjagadac.htm
Since it is implementation defined, I need to know whether it is even used within the Keil environment
If you use it explicitly, it will be used. I doubt the compiler emits such an instruction, though.
and what the benefit (if any) would be of using it rather than, say, optimization level 3 or the volatile keyword.
Some processors are equipped with data caches and prefetch units. If you are working with one of those, this intrinsic might help you optimize memory reads in your program. This is probably completely opaque to the compiler, so there is no interaction with the optimization level or with the volatile keyword.
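For reference, explicit use of the intrinsic looks roughly like the sketch below, going by the armcc documentation linked above. The buffer name is just an illustration, and on a core without a data cache or prefetch unit the hint may do nothing at all:

#include <stdint.h>

extern uint32_t sample_buf[256];   /* hypothetical data we expect to read soon */

void hint_upcoming_read(void)
{
    /* Ask the memory system to start fetching sample_buf ahead of use.
       No register is loaded; the effect is implementation defined and
       PLD may behave as a NOP on cores without a data cache. */
    __pld(sample_buf);

    /* ... other work; any prefetch overlaps with it ... */
}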
Thanks Mike.
According to your link: "This intrinsic inserts a data prefetch...".
Yet the ARM manual states: "Memory hints allow you to provide advance information to memory systems about future memory accesses, without actually loading or storing any data."
So when __pld() is used, does the Keil compiler actually execute a prefetch, i.e. actually load or store the data?
Not having used this feature, I cannot say anything about it. I am curious, though: will you share with us the circumstances where this might be beneficial for you? Just asking out of curiosity!
From the ARM manual:
"For example: In systems with a DMA that reads memory locations that are held in the data cache of a processor, a breakdown of coherency occurs when the processor has written new data in the data cache, but the DMA reads the old data held in memory. In a Harvard architecture of caches, a breakdown of coherency occurs when new instruction data has been written into the data cache and/or to memory, but the instruction cache still contains the old instruction data."
Just trying to understand myself.
Don't take it too literally. This intrinsic inserts the PLD instruction. What this instruction does should be explained in the microprocessor manual. That's it. Operation of a processor data cache shouldn't be specific to ARM. Google around for theory of operation.
It can be used to cut (or rather overlap) memory read wait states by having the information potentially moved from RAM or flash into the cache ahead of time. But there isn't any load into a register. I don't even think you are guaranteed to get any prefetch - the processor may just as well ignore the hint if it has enough other things to be busy with.
If you do a matrix multiply, you could give the processor a tip that you will need the next row of matrix data a number of clock cycles before the instructions that actually start to use it.
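A minimal sketch of that idea using the armcc __pld intrinsic. The matrix size N and the loop ordering are only illustrative, and on a part without a data cache the hint may be ignored entirely:

#define N 64

void matmul(const float a[N][N], const float b[N][N], float c[N][N])
{
    int i, j, k;

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++)
            c[i][j] = 0.0f;

        for (k = 0; k < N; k++) {
            /* Hint that row k+1 of b will be needed shortly, before the
               inner loop actually reads it. Implementation defined: the
               core may prefetch it into the data cache or ignore the hint. */
            if (k + 1 < N)
                __pld(&b[k + 1][0]);

            for (j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
        }
    }
}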
Another thing is that a prefetch hint could - for some processors - also inform other processor execution units that dirty data in their caches has to be prioritized for flushing to make sure that this execution unit doesn't have to stall waiting for cache synchronization.
"Normal" code should normally not have to bother with such things, but the locking primitives for thread-safe code must normally use this kind of primitives as memory access barriers. This is also why normal volatile declarations are not always working when having DMA or interrupts and main loops accessing the same data structures. Volatile cares about forced memory accesses, but not about cache coherence.
Mike and Per, Thanks for your thoughtful insights.
So, the bottom line from your responses tells me that while I can influence the potential operation of the compiler regarding the data cache, I really have no mechanism (instruction intrinsic capability) to directly control the processing of data unless I want to get down into a kernel level type control.
If I wanted this control, it would be at the application level. That does not seem to be an option with this intrinsic. Are there any intrinsics you are aware of that can provide a more direct control of the data cache if required?
I'm not sure I agree with your summary.
I can influence the potential operation of the compiler regarding the data cache
If there is a data cache and if the instruction PLD is implemented in your processor, then yes, you can.
I really have no mechanism (instruction intrinsic capability) to directly control the processing of data
Wrong. The C programming language gives you plenty of ways to control processing of data: variables, pointers, operators and so on.
unless I want to get down into a kernel level type control
What kernel? An OS kernel? Which OS? How is this related to the PLD instruction?
Are there any intrinsics you are aware of that can provide a more direct control of the data cache if required?
You probably mean 'instructions' rather than intrinsics. For those you'll have to go to the arm.com website and download the reference manual for your processor.
OK Mike, thanks. I clearly have a lot more to research on this subject.
In case anyone else is interested, here is a pretty good chapter on the topic. While not specifically about ARM, it still provides an in-depth explanation of prefetching:
download.intel.com/.../excerpt_swcb1.pdf