Memory corruption in MOV R4, #0; ADD R1, SP, #16; STRB R4, [R1, #-1]! on a Cortex-A9 with caches enabled and many interrupts (EMAC, timers), appearing after hours of running. I am not seeing it in the errata. R1 is not decremented, and the resulting stack corruption causes a data abort. The debugger is not a factor.
I do not see anything like this in the errata for the Cortex-A9. This is a Zynq Z7020 running at 666 MHz with both cores and both levels of cache enabled. The compiler is GCC, but the fact that this code runs for many hours and is hit millions of times before failing makes me worried. The value at R0 is the top of the local stack frame, and the final byte is a byte placed on the stack whose address is taken, at say 0x12345F. Since the register is not decremented, it instead points to a register value pushed onto the stack at 0x123460. The method call then writes its byte data into the stored register value at 0x123460 and corrupts it, causing a data abort. Nothing else is corrupted, just that one byte.
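To make the failure easier to picture, here is a rough C model of the frame described above. It is only a sketch under assumptions: the 16-byte locals area, the example saved value 0x12345678, the callee's byte 0x5A, and all function names are made up for illustration, and the faulty path simply assumes the -1 offset is never applied, which is what the symptoms suggest but has not been confirmed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame: 16 bytes of locals with a saved register word
   directly above them, mirroring "ADD R1, SP, #16". */
enum { LOCALS_SIZE = 16, FRAME_SIZE = 20 };

/* Intended behaviour of STRB Rt, [Rn, #-1]!: apply the offset first,
   store the byte at the new address, write the new address back to Rn. */
static uint8_t *strb_preindex_writeback(uint8_t *rn, int offset, uint8_t rt)
{
    uint8_t *addr = rn + offset;
    *addr = rt;
    return addr;                 /* value written back to Rn */
}

/* Assumed failure mode: the offset is never applied, so both the store
   and the written-back pointer use the undecremented address. */
static uint8_t *strb_missing_decrement(uint8_t *rn, int offset, uint8_t rt)
{
    (void)offset;
    *rn = rt;
    return rn;
}

/* Hypothetical callee that returns one byte through the pointer. */
static void callee_write_byte(uint8_t *out)
{
    *out = 0x5A;
}

/* Builds the frame, runs the sequence, and returns the saved register
   word as it would be popped afterwards. */
static uint32_t saved_register_after_call(int faulty)
{
    uint8_t frame[FRAME_SIZE];
    const uint32_t saved = 0x12345678u;      /* arbitrary example value */
    uint32_t popped;

    memset(frame, 0xEE, LOCALS_SIZE);
    memcpy(frame + LOCALS_SIZE, &saved, sizeof saved);

    uint8_t *r1 = frame + LOCALS_SIZE;       /* ADD R1, SP, #16 */
    r1 = faulty ? strb_missing_decrement(r1, -1, 0)
                : strb_preindex_writeback(r1, -1, 0);
    callee_write_byte(r1);

    memcpy(&popped, frame + LOCALS_SIZE, sizeof popped);
    return popped;
}
```

Calling saved_register_after_call(0) leaves the saved word intact, while saved_register_after_call(1) returns a word with exactly one byte overwritten, which matches the single-byte corruption of the pushed register.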
Did you try setting a watchpoint on this address? You could enable it when entering the routine and disable it before leaving.
Hi timholt,
I see the Z7020 has support for ETB and PTM.
If you are able to capture a trace of the last instructions just before the error, this might give some hints.
So it is not a known issue that is simply missing from the errata, then. We are now routing the trace pins out through the FPGA so we can look at it more deeply. The instructions, the location of the corruption, the single-byte corruption, and the surrounding code all point strongly to a glitch in the system such that the byte store does not land at the updated register location. That holds as long as this is not known to be a vulnerable instruction on the Cortex-A9 architecture. I will leave this open until we have tracing going and then close it. Thanks so much for your input.
I will let you know when we run trace on this. We have a setup for it but have only recently solved a hardware problem with it. For now I simply supply a larger buffer to the routine, and we no longer get the corruption.