We've been working on a project that uses RTX and the RTL CAN drivers for a while. Recently we made some changes to the code, and since then we've been seeing a Data Abort occur in the CAN_ISR interrupt handler. What is odd is that none of the CAN-related code was touched during these changes.
The CAN drivers allocate a memory pool, CAN_mpool, which is initialized as part of CAN_init(). This appears to work with no problems. We begin to receive CAN packets, and I can watch the ISR return a pointer from _alloc_box() each time.
Now here's the strange part: if CAN_mpool fills up, instead of returning NULL as it should, _alloc_box() is returning 0x000003FF. The next call CAN_ISR() makes is CAN_hw_rd(1, ptrmsg), and the first attempt to write to the pointer location understandably causes a Data Abort. (I should note that a NULL would normally cause a Data Abort anyway, since CAN_ISR() never checks to make sure the _alloc_box() call succeeds.)
I don't understand how this could be happening, and am wondering if anybody out there has any suggestions. Below is the portion of CAN_ISR we have modified to catch the unexplained 0x3FF (as well as fix a few other problems that caused the system to lock up in the case of a buffer overflow). Note that this code was working just fine until recently, but a code diff between a working version and a crashing version does not show any changes that should affect CAN communications in any way.
    /* If message is received and if mailbox isn't full read message from
       hardware and send it to message queue */
    #if USE_CAN_CTRL1 == 1
      if (CAN1GSR & 0x01) {
        if (isr_mbx_check (MBX_rx_ctrl[0]) > 0) {
          ptrmsg = _alloc_box (CAN_mpool);
          if (ptrmsg != NULL) {
            /* GSL DEBUG: Check to see if msg is truly within the memory
               pool limits. */
            if (((U32*)ptrmsg < CAN_mpool) ||
                ((U32*)ptrmsg > (CAN_mpool + CAN_CTRL_MAX_NUM*(CAN_No_SendObjects+CAN_No_ReceiveObjects)*(sizeof(CAN_msg)/4) + 3))) {
              /* Breakpoint to catch odd 0x3FF return value goes here! */
              __asm("nop");
            }
            CAN_hw_rd (1, ptrmsg);
            CAN1CMR = 0x04;                 /* Release receive buffer */
            isr_mbx_send (MBX_rx_ctrl[0], ptrmsg);
          } else {
            /* The CAN message pool is full. Release the receive buffer. */
            CAN1CMR = 0x04;
          }
        } else {
          /* The required mailbox is full. Release the receive buffer,
             because if we do not this interrupt will just fire
             indefinitely and no tasks will ever execute! */
          CAN1CMR = 0x04;
        }
      }
    #endif
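For anyone wanting to reuse the range check above outside the ISR, here is a minimal sketch of the same idea as a standalone helper. The name ptr_in_pool and all of its parameters are hypothetical, not part of the RTL CAN driver. It assumes the pool is an array of 32-bit words that starts with a few header words (the "+ 3" in the check above suggests three) followed by fixed-size blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr is the start of a block inside the
   pool, 0 otherwise. Assumed layout: header_words 32-bit words of
   bookkeeping, then fixed-size blocks of block_bytes each, up to
   pool_words words in total. */
int ptr_in_pool(const void *ptr, const uint32_t *pool, size_t pool_words,
                size_t header_words, size_t block_bytes)
{
    uintptr_t p     = (uintptr_t)ptr;
    uintptr_t first = (uintptr_t)(pool + header_words); /* first block    */
    uintptr_t end   = (uintptr_t)(pool + pool_words);   /* one past pool  */

    if (block_bytes == 0 || header_words > pool_words) {
        return 0;                    /* nonsensical layout description */
    }
    if (p < first || p >= end) {
        return 0;                    /* outside the block area (covers NULL
                                        and stray values such as 0x3FF) */
    }
    /* Inside the pool: accept only addresses on a block boundary. */
    return ((p - first) % block_bytes) == 0;
}
```

Comparing the addresses as uintptr_t values avoids relational comparison between pointers into unrelated objects, and the block-boundary test also rejects a pointer that lands in the middle of a block.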
Never mind -- I have found the problem. Turns out there's an array in memory immediately above CAN_mpool, and an off-by-one error was overwriting the first four bytes of CAN_mpool with -- you guessed it -- 0x000003FF.
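In case it helps anyone who hits the same symptom, here is a toy model of what was going on. It is only a sketch and not the RTX implementation: it assumes the pool keeps the head of its free list in its first word, which is consistent with what I saw, and it uses word offsets instead of real pointers so it can run anywhere. All of the names (toy_pool, neighbor, toy_alloc and so on) are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NEIGHBOR_LEN 4u                     /* array sitting next to the pool */
#define BLOCK_WORDS  2u
#define BLOCK_COUNT  3u
#define POOL_WORDS   (1u + BLOCK_WORDS * BLOCK_COUNT)  /* 1 header word */

/* One backing array so that the neighbor and the pool are truly adjacent. */
static uint32_t memory[NEIGHBOR_LEN + POOL_WORDS];
static uint32_t *const neighbor = &memory[0];
static uint32_t *const toy_pool = &memory[NEIGHBOR_LEN];

/* toy_pool[0] is the free-list head: offset of the first free block,
   or 0 when the pool is empty. Each free block stores the next offset. */
void toy_init(void)
{
    uint32_t i;
    for (i = 0; i < BLOCK_COUNT; i++) {
        uint32_t off = 1u + i * BLOCK_WORDS;
        toy_pool[off] = (i + 1u < BLOCK_COUNT) ? off + BLOCK_WORDS : 0u;
    }
    toy_pool[0] = 1u;
}

/* Returns whatever the head holds, 0 meaning "pool exhausted". The range
   test only keeps this demo well defined; an allocator that trusts its
   header would follow the bogus link as well. */
uint32_t toy_alloc(void)
{
    uint32_t off = toy_pool[0];
    if (off != 0u && off < POOL_WORDS) {
        toy_pool[0] = toy_pool[off];
    }
    return off;
}

/* The bug: "<=" instead of "<" writes one element past the neighbor
   array, which lands on the pool's free-list head. */
void fill_neighbor_off_by_one(uint32_t value)
{
    uint32_t i;
    for (i = 0; i <= NEIGHBOR_LEN; i++) {
        neighbor[i] = value;
    }
}

/* Exhaust the pool, optionally trigger the bug, then allocate once more. */
uint32_t demo_exhausted_pool(int corrupt)
{
    uint32_t i;
    toy_init();
    for (i = 0; i < BLOCK_COUNT; i++) {
        uint32_t off = toy_alloc();
        assert(off != 0u);           /* the first BLOCK_COUNT calls succeed */
    }
    if (corrupt) {
        fill_neighbor_off_by_one(0x3FFu);
    }
    return toy_alloc();
}
```

With this model, demo_exhausted_pool(0) returns 0 (the "NULL" case), while demo_exhausted_pool(1) returns 0x3FF, because the allocator simply hands back whatever is sitting in the head word.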
Looks like I'd better scan the rest of the code for stupid off-by-one errors like that one...