Hello,
We are encountering a core hang-up of unknown origin on our mass-produced board, which uses TI's AM3352 and Linux kernel 3.13.4. Regarding reproducibility, some units hung about 2000 hours after system start, while others hung as early as about 24 hours after system start. So far the core hang-up has occurred on 21 of 232 units.
Here is a trace log of ETB (Embedded Trace Buffer) acquired via JTAG (CoreSight).
0284.trace_log_20180104.zip
Trace log result summary:
The trace stops just before the core hang-up, after the following processing sequence.
1. Undefined instruction exception (VFP)
2. Processing of userland process
3. Data abort exception
The trace log was acquired for a total of 5 core hang-ups. It stops at the same processing in every trace log.
Checking the last processing in every trace log (log_file) acquired at the hang-up, a value is written to the CP15 system control register, and this seems to put the MPU into a hung state. Is it possible for this processing to put the MPU into a hung state?
ldr r0,0xC05E3420
ldr r0,[r0]
mcr p15,0x0,r0,c1,c0,0x0 ; p15,0,r0,c1,c0,0 (system control)
Please advise us on an effective way to investigate this hang-up.
Best regards,
Takashi
Dear Vincent,
Thank you for your reply.
I see the code sequence of your trace in the Linux kernel code. As far as I can tell this is the `alignment_trap' macro in arch/arm/kernel/entry-header.S:
It is exactly as you say.
I don't see why this would be a problem. There are indeed 7 occurrences of this sequence in your trace and only the last one had an issue.
We are also concerned about the VFP execution that precedes "mcr p15". So it looks like a problem with coprocessors.
Did you try to enable all Cortex-A8 errata workarounds in your kernel? For example: ARM_ERRATA_430973, ARM_ERRATA_458693 and ARM_ERRATA_460075?
We are using the following chip revision.
CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7)
In other words, it is r3p2. So we think that these errata do not apply. Even if we enable them, the version checks work as follows.
#if defined(CONFIG_ARM_ERRATA_430973) && !defined(CONFIG_ARCH_MULTIPLATFORM)
	teq	r5, #0x00100000			@ only present in r1p*
	mrceq	p15, 0, r10, c1, c0, 1		@ read aux control register
	orreq	r10, r10, #(1 << 6)		@ set IBE to 1
	mcreq	p15, 0, r10, c1, c0, 1		@ write aux control register
#endif
#ifdef CONFIG_ARM_ERRATA_458693
	teq	r6, #0x20			@ only present in r2p0
	mrceq	p15, 0, r10, c1, c0, 1		@ read aux control register
	orreq	r10, r10, #(1 << 5)		@ set L1NEON to 1
	orreq	r10, r10, #(1 << 9)		@ set PLDNOP to 1
	mcreq	p15, 0, r10, c1, c0, 1		@ write aux control register
#endif
#ifdef CONFIG_ARM_ERRATA_460075
	teq	r6, #0x20			@ only present in r2p0
	mrceq	p15, 1, r10, c9, c0, 2		@ read L2 cache aux ctrl register
	tsteq	r10, #1 << 22
	orreq	r10, r10, #(1 << 22)		@ set the Write Allocate disable bit
	mcreq	p15, 1, r10, c9, c0, 2		@ write the L2 cache aux ctrl register
#endif
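As a cross-check of the r3p2 reading, here is a minimal Python sketch that decodes the Main ID Register value 413fc082 from the boot log and evaluates the same three comparisons. The function names are hypothetical. It assumes that r5 holds the variant field in place (MIDR & 0x00f00000) and that r6 holds the variant and revision combined as (variant << 4) | revision, which is what the "r1p*" and "r2p0" comments suggest; please confirm this against the setup code in your own kernel tree.

```python
def decode_midr(midr):
    """Split an ARM Main ID Register value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # 0x41 = ARM
        "variant": (midr >> 20) & 0xF,        # the "rN" part
        "architecture": (midr >> 16) & 0xF,
        "part": (midr >> 4) & 0xFFF,          # 0xC08 = Cortex-A8
        "revision": midr & 0xF,               # the "pN" part
    }


def errata_checks(midr):
    """Mirror the teq comparisons quoted above.

    Assumption: r5 = MIDR & 0x00f00000 and r6 = (variant << 4) | revision,
    so that r1p* gives r5 == 0x00100000 and r2p0 gives r6 == 0x20.
    """
    r5 = midr & 0x00F00000
    r6 = ((midr >> 16) & 0xF0) | (midr & 0xF)
    return {
        "ARM_ERRATA_430973": r5 == 0x00100000,  # only present in r1p*
        "ARM_ERRATA_458693": r6 == 0x20,        # only present in r2p0
        "ARM_ERRATA_460075": r6 == 0x20,        # only present in r2p0
    }


fields = decode_midr(0x413FC082)
print("r%dp%d" % (fields["variant"], fields["revision"]))
print(errata_checks(0x413FC082))
```

Under these assumptions none of the three comparisons match for 413fc082, so the conditional instructions that follow each teq would be skipped even with the options enabled.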
Did you try to reproduce your issue with a more recent kernel?
We are testing with the "ti-linux-4.9.y" branch too.
Did you try to reproduce your issue on a different board with the same processor? I think the beaglebone black has a TI AM3358 with the same Cortex-A8 as AM3352.
This is a good idea. However, there is no board with the same hardware configuration as ours, so this is difficult in the current situation.
Additional information about the CP15 system control register:
In addition to the following commit, we tried a patch that does not read CP15 in 'alignment_trap'.
commit 195b58add463f697fb802ed55e26759094d40a54
Author: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Thu Aug 28 13:08:14 2014 +0100

    ARM: Avoid writing to control register on every exception

    If we are not changing the control register value, avoid writing to it.
    Writes to the control register can be very expensive, taking around a
    hundred cycles or so.
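To make the effect of that commit concrete, here is a toy Python model of the "write only if the value changes" idea stated in the commit message. It is only an illustration of the logic, not the kernel code: the class, the function names, and the register value 0x00C5387D are all hypothetical, and nothing here touches real hardware.

```python
class ControlRegisterModel:
    """Toy stand-in for the CP15 system control register (no hardware access)."""

    def __init__(self, value):
        self.value = value
        self.writes = 0  # counts how many writes were issued

    def read(self):
        return self.value

    def write(self, value):
        self.value = value
        self.writes += 1


def trap_always_write(reg, wanted):
    # Behaviour before the commit: write on every exception entry.
    reg.write(wanted)


def trap_write_if_changed(reg, wanted):
    # Behaviour described by the commit message: read, compare,
    # and write only when the value would actually change.
    if reg.read() != wanted:
        reg.write(wanted)


WANTED = 0x00C5387D  # hypothetical control register value

before = ControlRegisterModel(WANTED)
after = ControlRegisterModel(WANTED)
for _ in range(7):  # seven exception entries, as counted in the first trace
    trap_always_write(before, WANTED)
    trap_write_if_changed(after, WANTED)

print(before.writes, after.writes)
```

In this model the second variant replaces each write with a read when the value is already correct, which would be consistent with the later trace ending at an mrc rather than an mcr.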
Here is a trace log when the core hang-up occurs.
0763.arm_corelock_00014_b35_20180125.zip
Summary of the trace log just before the core hang-up:
1. Undefined instruction exception (VFP)
2. Processing of userland process
3. Data abort exception (mrc p15 ...)
Best regards,
Takashi