Hi
I've run into an issue that has left me seriously scratching my head. Essentially, after moving to a new processor (because of the chip shortage) I'm consistently getting a BusFault related to USB communications.
1. The first image is of the fault panel in my IDE, showing that the address causing the fault is 0x0040 0008.
2. Next is the main debugger window. The call stack suggests that the instruction that caused the fault is at 0x9340 (below the red line in the disassembly window). This looks plausible, as the current value in R2 (left side pane) is indeed 0x0040 0000 which, with an offset of 8, gives exactly the address in BFAR.
3. But how can R2 have 0x0040 0000 in it when, in the instruction just before, at PC 0x933E, its value is loaded from the address stored in R7 with a 0-byte offset?
4. Looking at the value in R7, R2 should actually have 0x2000 FF48, which is what the memory view shows us. When I manually stepped through the function in a working instance, that 0x0040 0000 was actually the result of the operation above (ANDS R3, R2). It's almost like the underlined instruction, loading from the address stored in R7, into R2, never got executed.
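For anyone wanting to cross-check a fault panel like this by hand, here is a minimal sketch that decodes the bus-fault bits of the Cortex-M4 CFSR and checks whether a base register plus offset really matches BFAR. The struct and function names are made up for illustration; on the target the two inputs would come from SCB->CFSR and SCB->BFAR (CMSIS names), read inside the fault handler.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bus-fault bits of the Cortex-M4 CFSR (the BFSR byte sits in bits 8..15). */
#define CFSR_PRECISERR    (1u << 9)   /* precise data bus error            */
#define CFSR_IMPRECISERR  (1u << 10)  /* imprecise data bus error          */
#define CFSR_BFARVALID    (1u << 15)  /* BFAR holds the faulting address   */

/* Hypothetical helper type, not part of any vendor SDK. */
typedef struct {
    bool precise;
    bool imprecise;
    bool bfar_valid;
} busfault_info_t;

/* Split a raw CFSR value into the flags that matter for this kind of fault. */
static busfault_info_t decode_busfault(uint32_t cfsr)
{
    busfault_info_t info;
    info.precise    = (cfsr & CFSR_PRECISERR) != 0u;
    info.imprecise  = (cfsr & CFSR_IMPRECISERR) != 0u;
    info.bfar_valid = (cfsr & CFSR_BFARVALID) != 0u;
    return info;
}

/* True when BFAR is valid and equals base + offset, e.g. R2 + 8. */
static bool bfar_matches(uint32_t cfsr, uint32_t bfar,
                         uint32_t base, uint32_t offset)
{
    return decode_busfault(cfsr).bfar_valid && (bfar == base + offset);
}
```

If the fault is flagged as imprecise, the stacked PC may point past the instruction that actually caused it, so the check above is only meaningful for a precise fault with BFARVALID set.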
The function in which this happens is called by the USB interrupt. Processor is a MK24FN1M0VDC12.
I've been battling with this for a few days now (over 40 hours I expect) and only last night I was able to properly observe this behavior. Looking for absolutely any idea, no matter how crazy it might sound. The chips were purchased from a broker, but other than this, all other peripherals work perfectly well. Given that in the process of switching I also had to move from using HS USB to FS USB on the current MCU, I'm hoping it's actually an issue in my code and not something else.
Thanks!
If this were a problem with the CPU, then the system would experience inexplicable crashes all around and at different locations.
As a debugging step, I would first ensure that the code being run on the machine is in fact the same as that shown in the GUI. For example, enable the disassembler-window setting that shows the raw bytes alongside each instruction, and then view the raw memory contents at that instruction's address in the memory window. I suspect that the GUI code and disassembly windows might be pulling their information not over the debugging channel but from the local disk/ELF on the machine where the debugger is running. In contrast, the memory window is likely to pull the contents over the debug channel, and so it should accurately reflect the code as it actually sits in the target's memory.
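The comparison itself is trivial once both byte sequences are available. As a rough sketch, assuming you have exported the bytes from the ELF and the bytes read back over the debug channel into two buffers (the function name is just illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare the bytes the ELF says should be at an address with the bytes
 * actually read back from the target. Returns the index of the first
 * differing byte, or len when the two buffers are identical.
 */
static size_t first_mismatch(const uint8_t *expected,
                             const uint8_t *actual,
                             size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (expected[i] != actual[i]) {
            return i;  /* offset of the first byte that differs */
        }
    }
    return len;        /* no difference found */
}
```

A return value equal to len means the disassembly on screen matches what is in memory, which rules out a stale or wrong image and points back at caching or timing.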
Another step has to do with instruction cache. Is there a chance that there was an improper handling of the instruction cache? What the CPU sees in its pipeline might not be the same as what the disasm GUI, or even the RAM, show. Is it possible to disable the instruction caching and see if the issue still occurs?
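To make the "disable caching" experiment concrete, here is a hedged sketch of clearing the cache and prefetch enable bits in a flash-controller control register and reading the result back. The bit positions below are an assumption based on the usual Kinetis FMC layout and must be checked against the reference manual for the exact part; the helper name is hypothetical, and the register address is deliberately left to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * ASSUMED bit positions for a Kinetis flash-bank control register.
 * Verify every one of these against the device reference manual.
 */
#define PFB_INSTR_PREFETCH_EN  (1u << 1)
#define PFB_DATA_PREFETCH_EN   (1u << 2)
#define PFB_INSTR_CACHE_EN     (1u << 3)
#define PFB_DATA_CACHE_EN      (1u << 4)

#define PFB_ALL_SPECULATION (PFB_INSTR_PREFETCH_EN | PFB_DATA_PREFETCH_EN | \
                             PFB_INSTR_CACHE_EN    | PFB_DATA_CACHE_EN)

/*
 * Clear the bits in mask and confirm by read-back that they are really off.
 * reg is the control register, passed in by the caller (hypothetical helper).
 */
static bool clear_and_verify(volatile uint32_t *reg, uint32_t mask)
{
    *reg &= ~mask;
    return (*reg & mask) == 0u;
}
```

Doing this as early as possible after reset, and ideally from code running out of RAM, keeps the experiment clean: if the fault at the original location disappears, speculation or caching on the flash path is implicated rather than the program logic.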
When the application is compiled, were there any warnings which were left unresolved?
The code window for USB_drv.c seems to be modified. Also, there is an expression "if (ptrUSB_HW > 0x200)" which might not compile without warning.
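On that comparison: ordering a pointer against a plain integer constant normally draws a diagnostic in C. If the intent is a sanity check on the address, one way to write it without a warning is to go through uintptr_t. This is only a sketch; I don't know the real type of ptrUSB_HW, so the helper below takes a generic pointer and its name is made up.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Compare the numeric address held in a pointer against a limit.
 * Converting via uintptr_t makes the intent explicit and avoids the
 * pointer-versus-integer comparison warning.
 */
static bool address_above(const void *ptr, uintptr_t limit)
{
    return (uintptr_t)ptr > limit;
}
```

The original test would then read something like `if (address_above(ptrUSB_HW, 0x200u))`, assuming ptrUSB_HW is an object pointer.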
Another aspect is the code generated - can one compile the code (or just a single file USB_drv.c) at O3/Os or similar and see if the issue changes?
Hi! Thanks for the reply.
I did indeed have a similar issue about 5 years ago with another Kinetis K24 chip. That time it was a Bus Fault on flash writes, and I had to disable data caching. In any case, to answer your questions:
1. The data in the disassembly panel seems to be buffered to some extent (i.e. if I disconnect the debugger and start moving through the disassembly view, I do get an error). Unfortunately I can't find any way to make it load in real time, but I will check it against the memory window.
2. How would I detect improper handling of the instruction cache? I did disable it (along with data caching) and it seems to improve the situation quite a bit, but I'm still getting hard faults, and fairly often. They also no longer happen in the area I outlined above, but here, which is almost impossible to debug, given that there's no address information or anything similar:
3. Yes, there are quite a few warnings, I will work through them. One thing to mention is that the exact same code was running fine on the MK26FN2M0 we were previously using. It was just running HS USB, not FS USB, as I said.
4. With any sort of optimizations enabled (O1->O3), the faults become more random and occur more often.
I have done some prolonged testing and it seems I have managed to get rid of the hard faults by doing two things:
1. Disabling caching and prefetching essentially eliminated the initial hard fault that I mentioned in the first message.
2. The other hard faults seem to have been sorted simply by reducing the MCU's clock from 120 MHz to 96 MHz (making flash run at 96/5 = 19.2 MHz).
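For reference, the clock arithmetic behind point 2 can be written down as a small check. This is a sketch only: the function names are hypothetical, and the maximum flash clock has to come from the datasheet for the exact part and speed grade, so it is passed in rather than hard-coded.

```c
#include <stdint.h>
#include <stdbool.h>

/* Clock after an integer divider; returns 0 for an invalid divider of 0. */
static uint32_t divided_clock_hz(uint32_t core_hz, uint32_t divider)
{
    return (divider != 0u) ? (core_hz / divider) : 0u;
}

/*
 * True when the divided clock is non-zero and does not exceed max_hz.
 * max_hz is a caller-supplied limit taken from the datasheet.
 */
static bool clock_within_limit(uint32_t core_hz, uint32_t divider,
                               uint32_t max_hz)
{
    uint32_t f = divided_clock_hz(core_hz, divider);
    return (f != 0u) && (f <= max_hz);
}
```

With a divider of 5, a 120 MHz core gives a 24 MHz flash clock and a 96 MHz core gives 19.2 MHz, so the slower setting leaves noticeably more margin against whatever limit the datasheet specifies.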