I am using an OMAP3515 (ARM Cortex-A8) with the I-cache, D-cache, branch prediction and MMU enabled.
I get a data abort when I try to copy a 600 KB frame buffer from one external memory region to another. After the data abort, I noticed that the SDR (i.e. SDRAM) is no longer accessible.
I have set up the MMU so that PA = VA.
There is no issue if I copy a smaller amount of data.
Also, if I disable the D-cache there is no abort and it works fine, but I would like to keep the D-cache enabled for faster access.
Thanks and regards,
Gopu
vskgopu wrote:
Data Fault Status Register is 0
Instruction Fault Status Register is 0x1008
Instruction Fault Address Register is 0x80437314 (this is my code region)
I'd like to point out two things here:
This means that ultimately the problem lies somewhere with the SDRC, and you should focus your attention there. Most likely the only reason why cache settings have an effect on whether or not the error occurs is because they affect the type (e.g. burst or not) and rate of requests to the SDRC.
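For reference, the IFSR value quoted above can be decoded by hand. The following is only a minimal sketch, assuming the ARMv7 short-descriptor fault status layout (FS[3:0] in bits 3:0, FS[4] in bit 10, external abort qualifier in bit 12); the names decode_fsr and fs_name are made up for illustration, and only a handful of FS encodings are listed.

```c
#include <stdint.h>

/* Fields of an ARMv7 short-descriptor IFSR/DFSR value. */
typedef struct {
    unsigned fs;   /* 5-bit fault status: bit 10 concatenated with bits 3:0 */
    unsigned ext;  /* bit 12: external abort qualifier (implementation defined) */
} fsr_fields;

/* Hypothetical helper: split a raw fault status register value into fields. */
static fsr_fields decode_fsr(uint32_t fsr)
{
    fsr_fields f;
    f.fs  = (unsigned)((((fsr >> 10) & 1u) << 4) | (fsr & 0xFu));
    f.ext = (unsigned)((fsr >> 12) & 1u);
    return f;
}

/* Hypothetical helper: name a few common FS encodings (not exhaustive). */
static const char *fs_name(unsigned fs)
{
    switch (fs) {
    case 0x01: return "alignment fault";
    case 0x05: return "translation fault (section)";
    case 0x07: return "translation fault (page)";
    case 0x08: return "synchronous external abort";
    case 0x0D: return "permission fault (section)";
    case 0x0F: return "permission fault (page)";
    case 0x16: return "asynchronous external abort";
    default:   return "other (see the architecture manual)";
    }
}
```

For 0x1008, decode_fsr gives fs = 0x08 (a synchronous external abort, not an MMU translation or permission fault) and ext = 1. If I read the Cortex-A8 TRM correctly, bit 12 distinguishes an AXI decode error (0) from an AXI slave error (1), but please verify that against the TRM before relying on it.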
Dear All,
Thanks a lot for your valuable responses.
I will check this, once I get the DDR.
Regards,
Note that in ARMv7 the S, TEX, C, and B bits of a translation table entry together determine the cacheability and external behaviour of a memory region. When TEX=0, the C and B bits provide ARMv4/v5-compatible behaviour:
But for normal memory regions, new encodings exist which allow you to separately specify the L1 and L2 cache policy. Note that device and strongly-ordered regions (TEX=0,C=0) should only be used for peripherals, and incur a serious performance penalty. They should never be used for code execution.
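To make the TEX=0 case concrete, here is a rough sketch of how the C and B bits sit in a first-level section entry. It assumes the ARMv7 short-descriptor format with TEX remap disabled, domain 0, and full read/write access; the macro and function names are my own for illustration and are not taken from any TI or ARM header.

```c
#include <stdint.h>

/* Bit positions in an ARMv7 short-descriptor first-level section entry. */
#define SECTION_TYPE   0x2u                          /* bits [1:0] = 0b10 */
#define SECTION_B      (1u << 2)
#define SECTION_C      (1u << 3)
#define SECTION_AP_RW  (3u << 10)                    /* AP[1:0] = 0b11, AP[2] = 0 */
#define SECTION_TEX(x) (((uint32_t)(x) & 7u) << 12)  /* TEX in bits [14:12] */

/* Memory types selected by C and B when TEX = 0b000. */
enum legacy_type {
    STRONGLY_ORDERED = 0,                     /* C=0 B=0 */
    SHAREABLE_DEVICE = SECTION_B,             /* C=0 B=1 */
    NORMAL_WT        = SECTION_C,             /* C=1 B=0: write-through, no write-allocate */
    NORMAL_WB        = SECTION_C | SECTION_B  /* C=1 B=1: write-back, no write-allocate */
};

/* Hypothetical helper: entry mapping the 1 MiB section containing phys_addr.
   Assumes domain 0 (bits [8:5] = 0) and a flat PA = VA mapping. */
static uint32_t section_entry(uint32_t phys_addr, enum legacy_type type)
{
    return (phys_addr & 0xFFF00000u) | SECTION_TEX(0) | SECTION_AP_RW
         | (uint32_t)type | SECTION_TYPE;
}
```

With these assumptions, section_entry(0x80000000, NORMAL_WB) evaluates to 0x80000C0E and section_entry(0x48000000, STRONGLY_ORDERED) to 0x48000C02. Domain 0 still has to be enabled in the domain access control register for either mapping to be usable.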
If the MMU is disabled, then instruction fetches behave as if the target memory region is configured as normal memory, L1/L2 cacheable (write policy irrelevant), while data accesses behave as if the target memory region is configured as strongly-ordered. This incoherency means you have to be a bit careful when the MMU is disabled, and I recommend configuring and enabling the MMU at the earliest convenient moment.
The timing measurements mostly show that terrible performance results if instruction caching is disabled (by either disabling it explicitly in the system control register, or by enabling the MMU but running code out of memory marked device or strongly-ordered). This is not very surprising.
Addendum: I've just uploaded a small example that may be helpful in constructing section translation tables. It is for the am335x but should be easily adaptable to omap3.
Hi matthijs,
I cannot agree with you.
Note that device and strongly-ordered regions (TEX=0,C=0) should only be used for peripherals, and incur a serious performance penalty. They should never be used for code execution.
How can you be so sure?
Haven't you written any application programs?
How should we configure normal memory as uncached?
Best regards,
Yasuhiko Koumoto.
Strongly-ordered and device memory impose additional constraints compared to normal uncacheable memory, which adversely impact performance without any benefit when targeting memory. In essence, accesses to device memory behave more like remote procedure calls. For example, performing 8 sequential byte-stores to normal uncacheable memory can be (and often is) merged to become a single dword-store. To device memory, they will always remain 8 individual byte-stores on the AXI bus. If strongly-ordered, then moreover the CPU will wait for each store to complete.
Normal uncacheable memory is configured by setting TEX=100 (binary), C=0, B=0. Or, in my example C code, type_normal( nc, nc ). It should obviously also not be used for code execution, as the performance impact of running without instruction caching is really severe.
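As a rough illustration of the TEX=1xx encodings, the sketch below computes the TEX, C and B bits for normal memory with separately chosen inner and outer policies (which this thread refers to as the L1 and L2 policies). It assumes the ARMv7 short-descriptor layout with TEX remap disabled; normal_attr_bits is a hypothetical name and is not the type_normal helper from my example code.

```c
#include <stdint.h>

/* Two-bit cache policy encoding used by the TEX = 0b1xx normal-memory types. */
enum cache_policy {
    POLICY_NC   = 0,  /* non-cacheable */
    POLICY_WBWA = 1,  /* write-back, write-allocate */
    POLICY_WT   = 2,  /* write-through, no write-allocate */
    POLICY_WB   = 3   /* write-back, no write-allocate */
};

/* Hypothetical helper: attribute bits for a section entry.
   TEX[2] = 1, TEX[1:0] = outer policy, C:B = inner policy. */
static uint32_t normal_attr_bits(enum cache_policy inner, enum cache_policy outer)
{
    uint32_t tex = 0x4u | (uint32_t)outer;  /* goes to bits [14:12] */
    uint32_t cb  = (uint32_t)inner;         /* C goes to bit 3, B to bit 2 */
    return (tex << 12) | (cb << 2);
}
```

Under these assumptions, normal_attr_bits(POLICY_NC, POLICY_NC) is 0x4000, i.e. TEX=100, C=0, B=0 as described above, and write-back write-allocate at both levels comes out as 0x5004.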
If at all practical, the use of explicit cache maintenance should be considered preferable for performance reasons. On the Cortex-A8, you can moreover use the Preload Engine to fetch data to and/or evict data from L2 cache in the background while software is performing other tasks (this is especially useful when processing data in a streaming fashion).
Are there any problems other than the performance issue?
If there are no other problems, the choice of memory attribute should be left to the developer.
The SDRC should work correctly under any memory attributes.
I think this thread is not about how we should configure memory attributes but about how we can solve Gopu's problem.
I wonder why you mentioned another MMU setting example.
Most likely the only reason why cache settings have an effect on whether or not the error occurs is because they affect the type (e.g. burst or not) and rate of requests to the SDRC.
Your comment above could be right, because the SDRC worked well with the uncached attribute.
However, I think the problem has nothing to do with the memory type (or attribute).
Anyway, the SDRAM mode register setting might need to be revised.
Best regards,
Hello Gopu,
What was your mode register setting for the SDRAM?
Also I would like to know the SDRAM part number.
yasuhikokoumoto wrote: Are there any problems other than the performance issue? If there are no other problems, the choice of memory attribute should be left to the developer.
Performance timings were being discussed. I raised the issue of memory attributes in that context.
But yes, there are more restrictions w.r.t. device and strongly-ordered memory. For example, unaligned access is forbidden (regardless of the strict alignment checking flag). More restrictions can be found by browsing the ARM architecture reference manual, for example I just found "in a VMSA implementation when any associated MMU is enabled, any multi-access instruction that loads or stores the PC must access only Normal memory. If the instruction accesses Device or Strongly-ordered memory the result is UNPREDICTABLE." Such a load is common for returning from a function, which means that putting your stack in device or strongly-ordered memory is a bad idea.
I agree, which is why I said the main issue lies there, but since I have no experience with the omap3 sdrc (only with the substantially different emif4d found in later devices) I don't immediately have any suggestions what might be the cause. Gopu said he will investigate it once his DDR memory arrives, so I will await that.
Misconfiguration of the memory typically causes data corruption rather than bus errors, since the memory controller has no way to verify the integrity of the response.
I'm sorry, I had misunderstood.
You had made all areas of the memory space cacheable, which was a mistake.
You should make only the ROM and RAM areas cacheable.
Below is your code, revised.
Some may hold that peripheral areas should be of the device type, but I don't recommend it because such areas then become bufferable.
write_pte:
    MOV   r0, r2, LSR #4
    CMP   r0, #0x40          ;@ Internal ROM/RAM areas
    MOV   r0, r2, LSR #8
    CMPNE r0, #0x8           ;@ SDRAM area
    CMPNE r0, #0x9           ;@ SDRAM area
    CMPNE r0, #0xA           ;@ SDRAM area
    CMPNE r0, #0xB           ;@ SDRAM area
    MOVEQ r0, #0x0E          ;@ if ROM or RAM, cacheable
    MOVNE r0, #0x02          ;@ if others, uncacheable (strongly ordered)
    ORR   r0, r0, r4, LSL #0xA
    ORR   r0, r0, r4, LSL #0xB
    ORR   r0, r0, r2, LSL #20
    STR   r0, [r1]
    ADD   r1, r1, #4
    ADD   r2, r2, #1
    SUBS  r3, r2, #4096
    BNE   write_pte
With this modification I anticipate your code will work successfully.
yasuhikokoumoto wrote: You had made all areas of the memory space cacheable, which was a mistake.
Good catch! I hadn't brought myself to read through that long assembly listing yet. (The lack of comments in the original translation table setup code also doesn't help readability) Making peripherals cacheable is definitely a recipe for disaster.
To answer your earlier question of why I posted my mmu setup example: surely you must agree that doing this in C, and using reasonably descriptive constants, is much more readable and maintainable than doing the same in assembly?
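To illustrate the point, the section classification done by the write_pte assembly loop posted earlier in this thread could be written in C roughly as follows. This is only a sketch and not code from either poster: it assumes r4 held the value 1 in the assembly version (so AP[1:0] = 0b11), a flat PA = VA mapping, and domain 0. The function names are hypothetical.

```c
#include <stdint.h>

#define NUM_SECTIONS 4096u  /* 4 GiB address space / 1 MiB per section */

/* True for the sections the assembly version treats as memory:
   0x40000000-0x40FFFFFF (internal ROM/RAM) and 0x80000000-0xBFFFFFFF (SDRAM). */
static int is_memory_section(uint32_t index)
{
    uint32_t top = index >> 8;  /* address bits [31:28] */
    return (index >> 4) == 0x40u || (top >= 0x8u && top <= 0xBu);
}

/* Fill a first-level table with one flat-mapped section entry per MiB. */
static void fill_flat_table(uint32_t table[NUM_SECTIONS])
{
    for (uint32_t i = 0; i < NUM_SECTIONS; i++) {
        uint32_t attr = is_memory_section(i) ? 0x0Eu   /* section, C=1 B=1 */
                                             : 0x02u;  /* section, C=0 B=0 */
        table[i] = (i << 20) | (3u << 10) | attr;      /* AP[1:0] = 0b11 */
    }
}
```

On the target, the table itself would additionally have to be 16 KiB aligned and its address written to the translation table base register, which this sketch leaves out.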
Personally, the only initialization I perform in my assembly entrypoint is the processor mode, vector base, system control register, and stack pointer. At that point I jump to C code and perform the remaining initialization there (with the occasional inline assembly where needed e.g. for accessing coprocessor registers; regrettably GCC doesn't yet have intrinsics for that like Clang does).
Device and strongly-ordered are both valid choices in my opinion, depending on the relative value placed on performance versus less worry about memory ordering issues.
(A point maybe worth mentioning is that many TI SoCs (including the omap3) also have one or more DSP cores, which do not have the concept of strongly-ordered memory at all, so any code that relies on it will be more problematic to port to those, if that desire ever arises.)
Hi,
Thanks a lot for your updated code. I have tested with this code, but the problem still exists.
As you mentioned in your earlier post, the issue should be in the SDRC; I will check it once I get the DDR.
Thanks again,
Has there been any progress?
From the symptoms, it seems clear that the SDRAM could not accept burst accesses from the SDRC.
You should check the SDRC setup parameters again.
That is, what are the contents of the following registers?
SDRC_MCFG_p 0x6D00 0080 + (0x0000 0030 * p) ( p=0 or 1)
SDRC_MR_p 0x6D00 0084 + (0x0000 0030 * p)
SDRC_EMR2_p 0x6D00 008C + (0x0000 0030 * p)
SDRC_DLLA_CTRL 0x6D00 0060
SDRC_ACTIM_CTRLA_p 0x6D00 009C + (0x0000 0028 * p)
SDRC_ACTIM_CTRLB_p 0x6D00 00A0 + (0x0000 0028 * p)
SDRC_RFR_CTRL_p 0x6D00 00A4 + (0x0000 0030 * p)
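The address formulas above can be turned into small helpers for dumping the registers of either chip select. This is a minimal sketch using only the base offsets and strides listed here (note that the two ACTIM registers use a stride of 0x28, the others 0x30); the function names are made up, and actually reading the registers is left out because it only makes sense on the target.

```c
#include <stdint.h>

#define SDRC_BASE      0x6D000000u
#define SDRC_DLLA_CTRL (SDRC_BASE + 0x60u)

/* Hypothetical helpers: register address for chip select p (0 or 1). */
static uint32_t sdrc_mcfg(unsigned p)        { return SDRC_BASE + 0x80u + 0x30u * p; }
static uint32_t sdrc_mr(unsigned p)          { return SDRC_BASE + 0x84u + 0x30u * p; }
static uint32_t sdrc_emr2(unsigned p)        { return SDRC_BASE + 0x8Cu + 0x30u * p; }
static uint32_t sdrc_actim_ctrla(unsigned p) { return SDRC_BASE + 0x9Cu + 0x28u * p; }
static uint32_t sdrc_actim_ctrlb(unsigned p) { return SDRC_BASE + 0xA0u + 0x28u * p; }
static uint32_t sdrc_rfr_ctrl(unsigned p)    { return SDRC_BASE + 0xA4u + 0x30u * p; }
```

For example, sdrc_mcfg(1) gives 0x6D0000B0 and sdrc_actim_ctrla(1) gives 0x6D0000C4. On the target, each address would be read through a volatile 32-bit pointer.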
I hope this will help you.
hi,
I will check this and let you know.
Thanks a lot for your valuable inputs. I have got the DDR board and am now using the OMAP3530 EVM; on this board I do not see the abort. I get the abort only with SDR, so I guess this means the SDR cannot keep up with the access speed when the cache is enabled.
I have another problem. I am using part of the external RAM as a frame buffer, but the data I write is not always updated in that memory when the MMU is enabled. Can't we use a frame buffer with the MMU enabled?
Although I don't understand your situation well, generally speaking the frame buffer area should not be cacheable.
Otherwise, you should clean the L1 data cache and the L2 cache after writing to the frame buffer area.
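If the frame buffer stays cacheable, the clean can be done per cache line over just the buffer's address range rather than over the whole cache. Below is a rough, host-compilable sketch of the range walk only. It assumes the 64-byte line size of the Cortex-A8 caches; for_each_cache_line and noop_line are hypothetical names, and the real per-line operation is passed in as a callback because it needs a CP15 instruction that only exists on the target.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed Cortex-A8 cache line size in bytes */

typedef void (*line_op)(uintptr_t line_addr);

/* Host stand-in for the per-line operation. On the target this would be
   "clean data cache line by MVA": mcr p15, 0, <addr>, c7, c10, 1 */
static void noop_line(uintptr_t line_addr) { (void)line_addr; }

/* Apply op once to every cache line overlapping [start, start + len).
   Returns the number of lines visited. On the target, follow the loop
   with a DSB so the writes have completed before the display reads them. */
static size_t for_each_cache_line(uintptr_t start, size_t len, line_op op)
{
    size_t count = 0;
    if (len == 0)
        return 0;
    uintptr_t addr = start & ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t last = (start + len - 1u) & ~(uintptr_t)(CACHE_LINE - 1u);
    for (;;) {
        op(addr);
        count++;
        if (addr == last)
            break;
        addr += CACHE_LINE;
    }
    return count;
}
```

For a line-aligned 600 KB buffer this visits 9600 lines. As far as I understand, a clean by MVA to the point of coherency also pushes the data out of the integrated L2 on the Cortex-A8, but that is worth confirming in the TRM.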