Hi guys!
We're moving from NXP's LPC175x MCU to the LPC18xx MCU. On the LPC175x we use both CAN controllers to make a kind of gateway between two CAN networks. The logic is quite simple: any frame coming from the CAN0 network is transferred to the CAN1 network, and any frame coming from the CAN1 network goes to the CAN0 network. The setup is quite simple too: all IDs are accepted on both controllers (acceptance filters are not used). LPC175x devices have a triple CAN Tx buffer on each controller, so everything works just fine even at high network loads.
Now it appears that on the LPC18xx series the CAN controllers are quite different: they have gone for a "message-object" setup with a message memory map. So we're trying to learn how to use them in our application, and we're kinda stuck now. We got the CMSIS-based CAN example that comes with the Keil MCB1800 board, using the Keil v5 environment. We got rid of the real-time OS and moved a few things around the CMSIS driver.
The idea is to create an Rx message object using ObjectSetFilter(1, ARM_CAN_FILTER_ID_MASKABLE_ADD, ARM_CAN_STANDARD_ID(0x1FFU), 0U); We're supposing this will cover all 11-bit frame IDs. And it looks like we're able to receive all IDs from 1 to 1FF.
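For reference, maskable acceptance filtering of this kind usually compares only the ID bits whose mask bit is 1. The sketch below models that rule. It assumes the CMSIS driver passes its last argument through as such a mask, which is not verified against the driver source, and the helper name `can_id_accepted` is hypothetical. Under that assumption a mask of 0U compares no bits at all, so the filter ID itself would not matter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS or the NXP driver.
 * Models mask-based acceptance: a mask bit of 1 means "this ID bit must
 * match the filter ID", a mask bit of 0 means "don't care". */
static bool can_id_accepted(uint32_t rx_id, uint32_t filter_id, uint32_t mask)
{
    return ((rx_id ^ filter_id) & mask) == 0U;
}
```

With `mask = 0U` every received ID passes, while `mask = 0x7FFU` accepts only an exact 11-bit match.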
Then we're trying to recreate our triple Tx buffer by allocating 3 Tx message objects for this purpose. The logic is simple: on an Rx interrupt from the CAN0 Rx message object we pick up the frame data and set an Rx flag to indicate that new data is available, then in the main loop we transmit the data through the CAN1 controller. Same for the CAN1 controller: get the data and send it all through CAN0. The main loop is just a while(1){ read the flag / send the message } for test purposes.
All this seems to work, but we have significant frame loss somewhere in between. This never happened with the LPC175x series MCUs. And honestly, we cannot believe that the LPC18xx, with double the operating clock frequency, cannot move frames between both controllers fast enough! Something is definitely wrong with our approach and all this message-object setup. We're thinking the single Rx message object could be the reason, but we have no idea how to create multiple ones with the same mask. Would you please suggest a direction to dig?
Thanks all!
Hi Dimitri,
I think I see the reason for your problems.
You see, these CAN controllers have 2 interfaces for accessing the message objects, they are IF1 and IF2.
They are the same, and there are 2 for a reason.
Assuming CCAN_MSG_IF1 in your function calls means it uses IF1, you are using the same IF from the main code and from the interrupt routine.
This means that when you start a Send from the main loop, it might get interrupted by a receive interrupt. Depending on the point at which Send was interrupted, the interrupt would change the content of IF1, so it would no longer address the same message object, and the transmit would not work correctly.
You also seem to mix IFs in the IRQ: you call Chip_CCAN_SetValidMsg(..., IF1, ..) and then below it you use LPC_C_CAN0->IF[1], which seems to relate to IF2.
The solution is to use one IF from main and the other from the interrupt routine.
Best regards, Milorad
Hi Milorad!
Thanks for pointing out the IF1/2 issue! Damn, I was mixing CMSIS's and NXP's drivers and absolutely forgot to check if IF[1] == CCAN_MSG_IF1. OK, so following your advice I'm allocating the transmit functions from main() to IF2, and both Rx objects use IF1.
The interrupt part now looks like this
Do I need to clear the pending Rx interrupt? I don't see any difference with or without but don't really like those while() inside the interrupt routines.
I ran multiple tests and we're still losing frames, but now we're approaching an acceptable loss rate. But another issue came up. It's quite rare, which is why I didn't catch it when running loads of some 10k frames. In fact, sometimes the transmitted IDs don't match the received ones. I'm still trying to find a way to reproduce the issue on demand, but no luck so far.
It looks like this
As you can see, both Rx IDs (E8 and F8) are in reality A8; somehow the upper bits were affected, so we got A8->E8 (1 wrong bit) and A8->F8 (2 wrong bits). It seems to be an ID-dependent issue: I see this happening only with A8 frames. There are variations, like A8->C0. The frame data is still correct.
Another variation with different frame data (just to be sure that there is no relationship between this and frame content). On this one I ran more than 200k frames, as you can see, there are minutes between these events. Also the load is slightly reduced (5 frames per network instead of 6)
The next thing I'll try to find out is whether this comes from a wrongly received ID (which I don't believe) or whether something happens to it while transmitting. But if you see something wrong there, please let me know! What would be the best way to inhibit the Rx interrupts while transmitting? They cannot be masked on this MCU, right? I could eventually shut down C_CAN0_IRQn/C_CAN1_IRQn completely, but this seems to be an extreme measure. Is there any better way to deal with it?
As usual, your input is very appreciated!
Dimitri
Hi Dimitri.
Regarding the IFs, you did not completely understand.
You should use different IFs in the interrupt routine and in main, but not different IFs within the interrupt routine itself.
For example, use only IF2 in main, and in the interrupt routine use only IF1 for handling both Rx and Tx objects.
The point is that if Send from main gets interrupted, its IF2 content is preserved and it continues after the interrupt has finished, while in the interrupt you only work with IF1.
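The hazard and the fix can be illustrated with a toy model. This is only a sketch: the structures below are not the real LPC18xx registers, and all names (`can_if_t`, `can_model_t`, `send_interrupted`, and so on) are hypothetical. Each interface is modeled as a staging area that addresses one message object at a time, and the "interrupt" is simulated by a plain function call in the middle of Send.

```c
#include <stdint.h>

/* Toy model, not real hardware registers. */
typedef struct {
    uint32_t id;       /* staged arbitration ID */
    uint8_t  obj_num;  /* message object this interface currently addresses */
} can_if_t;

typedef struct {
    can_if_t iface[2];    /* iface[0] models IF1, iface[1] models IF2 */
    uint32_t obj_id[33];  /* IDs held by message objects 1..32 */
} can_model_t;

/* Copy the staged ID into whichever object the interface addresses. */
static void can_commit(can_model_t *c, int if_idx)
{
    c->obj_id[c->iface[if_idx].obj_num] = c->iface[if_idx].id;
}

/* Models the Rx interrupt: reads Rx object 1 through interface rx_if. */
static void rx_isr(can_model_t *c, int rx_if)
{
    c->iface[rx_if].obj_num = 1;
    c->iface[rx_if].id = c->obj_id[1];
}

/* Models Send from main being interrupted between staging and commit.
 * Returns the ID that ended up in Tx object 2. */
static uint32_t send_interrupted(can_model_t *c, int tx_if, int rx_if,
                                 uint32_t id)
{
    c->iface[tx_if].obj_num = 2;  /* address the Tx object */
    c->iface[tx_if].id = id;      /* stage the ID to send */
    rx_isr(c, rx_if);             /* the receive interrupt fires here */
    can_commit(c, tx_if);         /* main resumes and commits */
    return c->obj_id[2];
}
```

When `tx_if` and `rx_if` are the same, the commit lands on the Rx object and the Tx object never receives the staged ID. When they differ, the staged content survives the interrupt.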
Unfortunately, you do have to wait for busy in the interrupt as well, but you can perhaps wait at the start of the interrupt routine instead of after you finish your work, so that other work can be done during that time.
However, don't use an endless loop; limit it to a certain number of iterations, because if it ever got stuck there, the whole system would hang in the interrupt routine.
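A minimal sketch of such a bounded wait is shown below. The helper name `wait_not_busy` is hypothetical, and the register address, the busy mask and a sensible iteration limit are assumptions to be taken from the device header and from measurement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll a status register until the bits in busy_mask
 * clear, giving up after max_polls reads so an ISR can never hang here. */
static bool wait_not_busy(const volatile uint32_t *status_reg,
                          uint32_t busy_mask, uint32_t max_polls)
{
    while (max_polls-- > 0U) {
        if ((*status_reg & busy_mask) == 0U) {
            return true;   /* interface is free */
        }
    }
    return false;          /* gave up; the caller decides how to recover */
}
```

The caller would pass the address of the interface's command request register and its busy mask, and treat a `false` return as an error to count or report rather than retrying forever.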
Rx interrupt is cleared by specifying CLRINTPND in CMDMSK register, so no additional action is required.
Something else is causing the problem with the IDs.
First check that you are using separate IFs, so that this is not what is causing your problems.
You can try reducing the bus speed and see what happens: are there still the same issues with IDs, or different ones?
Also, I suppose you have lines terminated correctly.
Hi Milorad,
That's awesome, separating the IFs fixed the frame loss issue! Thank you so much!
Now we have zero loss, but there is still this weird behavior with extra IDs. Now we can count the frames, and it appears there is also a similar issue with the 0xC2 ID.
Here is how it looks. I ran it twice to see if this is reproducible, and it is (but it takes about 70K frames to happen in both cases).
First run
And this is the 2nd one
What you can see from both (thanks again, now we can count them!) is that 0xC2 has 1 extra frame and 0xD2 is missing one. So somehow, while receiving 0xD2, the frame was translated into the 0xC2 ID but with the 0xD2 data (there is a ChangeCnt column which indicates the frame data changes). Applying the same logic, 0xA8 misses 3 frames on the first run and we have 3 extra frames (1 0xE8 and 2 0xF8, all with 0xA8 data). On the second run we're missing 2 and we have 2 extra 0xF8. And the issue with 0xC2 is kind of a major issue: mixing the CAN frame payloads can have catastrophic consequences.
What's strange is that this happens to these particular IDs only; all others seem to be fine. Oh, yes, both buses are terminated with 120 Ohm. I don't think this is setup related; if it were, we should see all other IDs showing the same behavior.
But now I'm thinking, our hardware is still in assembly so we're running all these tests on Keil MCB1800 evaluation board. Unfortunately, the LPC1857 installed on this board has the initial HW revision which, according to errata, has some issues with APB bus bridge peripherals. It's not specified that there could be some interactions while using both controllers simultaneously. And, by precaution, we don't use any other peripherals while doing CAN tests. Do you think this issue can be related to this HW bug?
Update. The extra frames issue seems to come from the Rx side. Checking the IDs before transmitting gives me the wrong ones (0xE8, 0xF8 etc). Some kind of partial overwrite in the Rx message object, is this possible?
Another thing, probably not related to this. This is from NXP's SetMsgObject function, but the CMSIS driver uses a similar mechanism.
My problem: while running tests this if() is never false; every time it is treated as a standard ID, and bit 30 is never set.
For test purposes I had to use if (pMsgObj->id <= 0x7FF), but it shouldn't be like this. In my case pMsgObj->id contains the real ID, like the extended ID 0x100A00, so 0x100A00 & 0x40000000 will always be 0. Is there any additional step to mark the incoming frames as extended ID?
Thanks again!
I think with repeated IDs you are actually having a problem with reception happening while you are starting a transmission.
I think you will have to either use FIFO or use a message queue (or some other array) for buffering received messages.
If you can receive all messages when you are only testing the reception then you have to ensure they are buffered before you try to send them otherwise some overwrite will happen as it seems is happening in your case.
I don't think this is a hardware issue.
It looks like at some point there is a very short time between two messages, so a new message is received before the previous one has been loaded for transmission. New data is then read from the newly received message, and loading for transmission continues with the now changed data.
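One way to buffer received frames between the interrupt and the main loop is a small single-producer, single-consumer ring buffer. The sketch below is an assumption about how that could look, not the CMSIS or NXP driver API: `can_frame_t`, `can_rxq_t`, `rxq_push` and `rxq_pop` are hypothetical names, and the queue depth is arbitrary. It assumes exactly one writer (the Rx ISR) and one reader (main) on a single core; depending on compiler and optimization level, a memory barrier before publishing `head` may also be needed.

```c
#include <stdbool.h>
#include <stdint.h>

#define RXQ_SIZE 16U  /* hypothetical depth; must be a power of two */

typedef struct {
    uint32_t id;
    uint8_t  dlc;
    uint8_t  data[8];
} can_frame_t;

typedef struct {
    can_frame_t       slot[RXQ_SIZE];
    volatile uint32_t head;  /* written only by the producer (Rx ISR) */
    volatile uint32_t tail;  /* written only by the consumer (main loop) */
} can_rxq_t;

/* Called from the Rx interrupt: copy the whole frame (ID and data together)
 * into the queue. Returns false if the queue is full. */
static bool rxq_push(can_rxq_t *q, const can_frame_t *f)
{
    uint32_t head = q->head;
    if ((head - q->tail) >= RXQ_SIZE) {
        return false;                    /* full: frame would be lost */
    }
    q->slot[head & (RXQ_SIZE - 1U)] = *f;
    q->head = head + 1U;                 /* publish after the slot is written */
    return true;
}

/* Called from the main loop: take the oldest frame, if any. */
static bool rxq_pop(can_rxq_t *q, can_frame_t *out)
{
    uint32_t tail = q->tail;
    if (tail == q->head) {
        return false;                    /* empty */
    }
    *out = q->slot[tail & (RXQ_SIZE - 1U)];
    q->tail = tail + 1U;                 /* release the slot */
    return true;
}
```

Because each frame is copied as a unit before it is published, a later reception cannot change the ID or data of a frame that main is already loading for transmission.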
About the extended frame, as you can see in the piece of code you posted it seems that bit 30 is a flag specifying the extended frame, because you need a way to differentiate standard IDs and extended IDs.
To specify extended frame your ID should have bit 30 set to 1.
Your ID 0x100A00 would be specified as 0x40100A00.
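As a small sketch of that convention, the helpers below set, test and strip the flag. The macro and function names are hypothetical, and bit 30 as the extended-frame flag is taken from the posted SetMsgObject snippet rather than from a header checked here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, following the bit-30 convention described above. */
#define CAN_ID_EXT_FLAG  ((uint32_t)1U << 30)     /* marks an extended frame */
#define CAN_ID_EXT_MASK  ((uint32_t)0x1FFFFFFFU)  /* 29 identifier bits */

/* Tag a raw 29-bit identifier as extended before handing it to the driver. */
static uint32_t can_make_ext_id(uint32_t raw_id)
{
    return (raw_id & CAN_ID_EXT_MASK) | CAN_ID_EXT_FLAG;
}

/* True if the value carries the extended-frame flag. */
static bool can_is_ext_id(uint32_t id)
{
    return (id & CAN_ID_EXT_FLAG) != 0U;
}

/* Recover the bare identifier without the flag. */
static uint32_t can_raw_id(uint32_t id)
{
    return id & CAN_ID_EXT_MASK;
}
```

In a gateway, the receive path would have to apply the tagging whenever the controller reports an extended frame, since the raw identifier alone cannot distinguish 0x100 standard from 0x100 extended.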
I couldn't stop thinking about how both specific IDs were affected. I can understand an eventual timing issue between Tx and Rx, but why only these 2, and only on CAN2? So today I ran the same load test on our previous LPC1756 hardware. Guess what? Got the same result after 100K+ frames. Different hardware, different CAN driver architecture, but the same canalyzer. So I put the logic probes on both CAN channels to see what exactly is going on.
Here it is, the bottom is CAN2 the top is CAN1 (btw, do you see how ridiculously small is the timing between frames?)
As you can see, at the moment when the canalyzer sends 0x0C2 from the CAN2 channel, it also sends 0x0E1 from the CAN1 channel. And somehow the data is mixed: the CAN1 frame data goes into the CAN2 frame. We use a quite expensive multi-protocol tool and this is clearly a bug on their side, nothing to do with us. The same is happening with those extra IDs; this time the tool messes with the ID only, and I can see them physically present on the CAN bus. Once more, nothing related to the CAN driver.
So my guess this is it, all works as expected.
Look, without breaking the privacy policies, I'd like to reach you to express our gratitude. Without your help and your knowledge, this project would have taken a while to complete; the non-working CAN was the blocking issue. Thank you very much!
The space between frames is the so-called 'interframe space' and it needs to be a minimum of 3 bits long, so, yes, it is a very short gap.
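To put a number on that gap, the 3 bit times can be converted to a duration for a given bit rate. This is a simple arithmetic sketch; the helper name `can_min_gap_ns` is hypothetical and the bit rates mentioned afterwards are examples only, since the bus speed used in the tests is not stated here.

```c
#include <stdint.h>

#define CAN_INTERFRAME_BITS 3U  /* minimum gap between frames, in bit times */

/* Duration of the minimum interframe gap in nanoseconds for a bit rate
 * given in bit/s. Returns 0 for an invalid bit rate of 0. */
static uint32_t can_min_gap_ns(uint32_t bitrate_bps)
{
    if (bitrate_bps == 0U) {
        return 0U;
    }
    return (uint32_t)((CAN_INTERFRAME_BITS * 1000000000ULL) / bitrate_bps);
}
```

At 500 kbit/s this gives 6000 ns and at 1 Mbit/s 3000 ns, which is all the time a gateway has between the end of one frame and the possible start of the next.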
I'm very happy you managed to solve the problem, and that I was able to help.
Don't worry about expressing your gratitude, a simple thank you is just fine :-)