Hi guys!
We're moving from NXP's LPC175x MCU to the LPC18xx MCU. On the LPC175x we use both CAN controllers to make a kind of gateway between 2 CAN networks. The logic is quite simple: any frame coming from the CAN0 network is transferred to the CAN1 network, and any frame coming from the CAN1 network goes to the CAN0 network. The setup is quite simple too: all IDs are accepted on both controllers (acceptance filters are not used). LPC175x devices have a triple CAN Tx buffer on each controller, so everything works just fine even at high network loads.
Now it appears that on the LPC18xx series the CAN controllers are quite different: they have gone for a "message-objects" setup with a message memory map. So we're trying to learn how to use them in our application, and we're kind of stuck now. We took the CMSIS-based CAN example that comes with the MCB1800 Keil board, using the Keil v5 environment, got rid of the real-time OS, and moved a few things around the CMSIS driver.
The idea is to create an Rx message object using ObjectSetFilter(1, ARM_CAN_FILTER_ID_MASKABLE_ADD, ARM_CAN_STANDARD_ID(0x1FFU), 0U); We're assuming this will cover all 11-bit frame IDs. And it looks like we're able to receive all IDs from 1 to 1FF.
Then we're trying to recreate our triple Tx buffer by allocating 3 Tx message objects for this purpose. The logic is simple: on an Rx interrupt from the CAN0 Rx message object we pick up the frame data and set an Rx flag to indicate that new data is available, then in the main loop we transmit the data through the CAN1 controller. Same for the CAN1 controller: get the data and send it all through CAN0. The main loop is just a while(1){ read the flag / send the message } for test purposes.
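For reference, mask-based acceptance as described for the Bosch C_CAN core can be modeled in a few lines. This is only a plain software sketch, not the CMSIS driver's code: can_filter_accepts is a hypothetical helper, and it assumes that a mask bit of 1 means "this ID bit must match" and a mask bit of 0 means "don't care". Under that assumption a mask of 0 lets every ID through regardless of the filter ID value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: software model of mask-based acceptance filtering.
 * Assumption: mask bit = 1 -> ID bit must match, mask bit = 0 -> don't care. */
bool can_filter_accepts(uint32_t filter_id, uint32_t mask, uint32_t rx_id)
{
    /* XOR marks the differing bits; only the masked ones count. */
    return ((filter_id ^ rx_id) & mask) == 0U;
}
```

With this model, can_filter_accepts(0x1FF, 0, any_id) is always true, while a mask of 0x7FF would accept only 0x1FF itself.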
All this seems to work but we have some important frame loss somewhere in between. This never happened with LPC175x series MCUs. And honestly, we cannot believe that the LPC18xx with double operating clock frequency cannot move frames between both controllers fast enough! Something is definitely wrong with our approach and all this message-object setup. We're thinking the single Rx message object could be the reason but no idea how to create multiple ones with same mask. Would you, please, suggest the direction to dig?
Thanks all!
Milorad,
Thanks for this suggestion! I was working in this direction but no results so far. I've tried to split the incoming traffic into 3 different Rx buffers, just by creating some masks. Even with this and 15 Tx buffers we are still losing some frames. Way less than before but still not good. Also the mask splitting will not work in real application with tens of unknown frame IDs.
Milorad Cvjetkovic said: For example try using 3 Rx message objects with same filter setting for reception, expecting that when 1 message object is filled, reception would continue into a second one.
I'm very curious, how can you possibly do this? So far, when setting up filter #1 to accept all IDs, the controller will definitely not use filters #2, #3 etc. (I tried :) ). And I don't think the built-in FIFO can be used, since it's supposed to hold frames with the same ID but different frame data. In my understanding, the concatenating FIFO for message-object is used when you don't have enough time to get all data from Rx buffer but this shouldn't work with different IDs. But perhaps I'm completely wrong; the user's manual has some unclear points. About the frame loss, this is not an arbitration issue. All injected frames have different IDs, so we exclude the possibility of having 2 CAN sources transmitting the same ID within the same network.
Best regards,
Dimitri
Hi Dimitri.
From the C_CAN documentation it seems like the only way you would be able to do that is to use the FIFO buffer concept.
Dim724 said: In my understanding, the concatenating FIFO for message-object is used when you don't have enough time to get all data from Rx buffer but this shouldn't work with different IDs.
I don't think so. I think it is exactly for a situation like yours, when you do not have enough time to read data from a message object but still want to be able to receive a message with an ID passing the same filter into another message object.
the documentation on FIFO buffer says:
the identifiers and masks (if used) of these Message Objects have to be programmed to matching values.
That means that you set multiple message objects to use the same filter settings so that they act as a FIFO.
Meaning that when the first message is received into message object 0, the second message will be received into message object 1 until message object 0 is free again.
I thought that with 3 separate message objects it would behave this way, but it will not unless FIFO mode is used, meaning all message objects that are part of the FIFO except the last must have the EoB bit cleared and the last one must have EoB set.
Did you check whether MsgLst gets set in the message objects, which would mean your message was overwritten before it was read?
Perhaps you can also do software buffering.
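To make the FIFO behaviour concrete, here is a minimal software model of it. It is a sketch of my reading of the C_CAN FIFO description, not driver code: the names msg_obj_t, fifo_init, fifo_store and fifo_read are hypothetical, and the real hardware does all of this inside the message handler.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_LEN 3U

/* Simplified message object: only the fields relevant to FIFO behaviour. */
typedef struct {
    uint32_t id;
    bool new_dat;   /* NewDat: object holds data the CPU has not read yet */
    bool msg_lst;   /* MsgLst: unread data was overwritten */
    bool eob;       /* EoB: set only on the last object of the FIFO */
} msg_obj_t;

void fifo_init(msg_obj_t fifo[FIFO_LEN])
{
    for (uint32_t i = 0U; i < FIFO_LEN; i++) {
        fifo[i].id = 0U;
        fifo[i].new_dat = false;
        fifo[i].msg_lst = false;
        fifo[i].eob = (i == FIFO_LEN - 1U);
    }
}

/* Model of the message handler storing a frame that passed the filter.
 * An object with NewDat set and EoB clear is locked, so the frame moves on
 * to the next object. The last object (EoB set) is never locked and gets
 * overwritten, which sets MsgLst. Returns the index of the object used. */
uint32_t fifo_store(msg_obj_t fifo[FIFO_LEN], uint32_t rx_id)
{
    uint32_t i = 0U;
    while (fifo[i].new_dat && !fifo[i].eob) {
        i++;
    }
    if (fifo[i].new_dat) {
        fifo[i].msg_lst = true;
    }
    fifo[i].id = rx_id;
    fifo[i].new_dat = true;
    return i;
}

/* Model of the CPU reading an object and releasing it for reuse. */
uint32_t fifo_read(msg_obj_t *obj)
{
    obj->new_dat = false;
    obj->msg_lst = false;
    return obj->id;
}
```

In this model three back-to-back frames land in objects 0, 1 and 2; a fourth one arriving before anything is read overwrites object 2 and sets its MsgLst flag.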
For example when message is received put it into message queue and then have a thread which would just write it to the other CAN controller message object for transmission.
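A minimal sketch of such a queue, assuming one producer (the Rx interrupt) and one consumer (the main loop or a thread). The names can_frame_t, can_queue_t, can_queue_push and can_queue_pop are made up for this illustration and do not come from the CMSIS or LPCOpen drivers.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 16U   /* must be a power of two for the index wrap below */

typedef struct {
    uint32_t id;
    uint8_t  dlc;
    uint8_t  data[8];
} can_frame_t;

typedef struct {
    can_frame_t slots[QUEUE_SIZE];
    volatile uint32_t head;   /* only written by the producer (Rx interrupt) */
    volatile uint32_t tail;   /* only written by the consumer (main loop)    */
} can_queue_t;

/* Called from the Rx interrupt. Returns false if the queue is full. */
bool can_queue_push(can_queue_t *q, const can_frame_t *frame)
{
    uint32_t head = q->head;
    if ((head - q->tail) == QUEUE_SIZE) {
        return false;                      /* full: caller decides what to drop */
    }
    q->slots[head % QUEUE_SIZE] = *frame;  /* copy first ...                    */
    q->head = head + 1U;                   /* ... then publish the new slot     */
    return true;
}

/* Called from the main loop. Returns false if the queue is empty. */
bool can_queue_pop(can_queue_t *q, can_frame_t *out)
{
    uint32_t tail = q->tail;
    if (tail == q->head) {
        return false;
    }
    *out = q->slots[tail % QUEUE_SIZE];
    q->tail = tail + 1U;
    return true;
}
```

Because each index has exactly one writer, the interrupt and the main loop never modify the same variable, so on a single-core MCU this works without disabling interrupts. One queue per direction (CAN0 to CAN1 and CAN1 to CAN0) would be needed for a gateway.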
Depending on the speed of your CAN communication, after a message is received there is some time before the next identifier arrives in which to read out the received data (if a single message object is used).
What is the CAN speed in your situation?
First, try to test whether you are able to receive all the data without transmitting, just to check that reception is working as expected. Then, if you manage to receive all messages, you can see how to transmit them while still keeping the reception responsive.
I think you did not get the part about arbitration. The arbitration I was talking about is the one that happens on the bus. It is not that the same IDs must not clash; it means that if two nodes try to send data at the same time, the lower ID wins the arbitration, and the node that tried to send the message with the higher ID has to back off and re-transmit the message at a later time.
Best regards, Milorad
Hello Milorad,
Sorry for the late response! So, as you suggested, I put in an Rx counter to see if there are any reception issues. And it appears the problem is somewhere else: I'm counting the reception of every frame ID transmitted by the canalyzer tool and the counters match. I ran this test for about 10K frames of 12 different IDs transmitted at 5 ms intervals on both channels and I was able to count them all. So the issue is probably on the transmission side. We're using a 500k CAN network with high-speed transceivers and the latest CMSIS driver, v1.6. To be sure that the problem isn't with our setup, we ran the same load test on our previous LPC1756 hardware (please see the attached pictures).
This is LPC1756 device
This is LPC1857 device
As you can see, it looks like some of the highest IDs (D1, E1) have more difficulties than the lower ones. I'm not sure if this is the arbitration issue that you were talking about, since B1, which is close to the lowest A1, also misses some frames. I'm probably missing something: how are the message objects distributed between both CAN controllers? According to this
there should be 32 message objects per CAN controller. Could you confirm? In the current setup (and we're still losing frames) we are using 1 Rx message object and 30 Tx message objects per controller. At each transmission I'm incrementing the index of the Tx message object, so it takes 30 cycles before one gets rewritten. This should be more than enough to not lose any frame, but... In your opinion, what would be the best way to deal with this?
Thanks again!
Hi Dimitri,
yes, there are 32 message objects per CAN controller.
From your investigation I suspect your problems are related to arbitration issues.
Arbitration comes into play in this way: if at the exact time you are trying to inject a message onto the network another node starts transmitting a message with a lower ID (that is why you are seeing more problems with higher IDs), then your message will not get sent, as it will lose the arbitration.
However, if automatic retransmission is enabled on the CAN controller it will try to resend the message (check DAR bit setting on your CAN controller).
You can also check that the transmit part is working as expected, meaning that you are pushing each new transmission into a different message object. If you are writing to the same one, then you actually overwrite the message, and if it wasn't sent yet it will just be lost. Perhaps you have problems with this.
You can also try, as with reception, just sending dummy messages to see if it works reliably.
BTW, the importance of the messages is ensured by IDs, meaning lower IDs are of higher importance and have higher priority on the bus due to arbitration.
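As an illustration of why the lower ID wins, here is a small simulation of bitwise arbitration on 11-bit IDs. It is a simplified model (hypothetical function can_arbitration_winner, IDs only, no frame fields after the identifier): every contender sends its ID MSB first, the bus behaves as a wired-AND where 0 is dominant, and a node that sends a recessive 1 but reads back a dominant 0 drops out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES   8U
#define STD_ID_BITS 11

/* Returns the index of the node that wins arbitration among n contenders,
 * or n if the arguments are out of range. Assumes all IDs are different. */
size_t can_arbitration_winner(const uint16_t ids[], size_t n)
{
    bool active[MAX_NODES];

    if (n == 0U || n > MAX_NODES) {
        return n;
    }
    for (size_t i = 0U; i < n; i++) {
        active[i] = true;
    }
    for (int bit = STD_ID_BITS - 1; bit >= 0; bit--) {
        unsigned bus = 1U;                        /* recessive unless driven */
        for (size_t i = 0U; i < n; i++) {
            if (active[i]) {
                bus &= ((unsigned)ids[i] >> bit) & 1U;   /* wired-AND */
            }
        }
        for (size_t i = 0U; i < n; i++) {
            /* Sent recessive, read dominant: this node lost and backs off. */
            if (active[i] && ((((unsigned)ids[i] >> bit) & 1U) != bus)) {
                active[i] = false;
            }
        }
    }
    for (size_t i = 0U; i < n; i++) {
        if (active[i]) {
            return i;
        }
    }
    return n;
}
```

For contenders 0x0D1, 0x0A1 and 0x0E1 the node sending 0x0A1 wins, and the other two have to retry once the bus is idle again.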
Hi Milorad,
I'm still trying to figure out how to improve the transmission. DAR is disabled; this is the first thing I checked. I put the CMSIS driver aside and tried the CAN driver from the LPCOpen software. While this one seems to be less sophisticated (and btw doesn't work right out of the box), it dynamically allocates the message objects for outgoing transmissions. I wanted to see how many objects would be needed for this kind of setup, so I put the number of the Tx message object being used into the 1st byte of the CAN frame. I had to remove a weird blocking loop from the transmit function (no idea why they went this way: after each Tx they sit inside a while loop until the Tx success interrupt occurs, which reduces the number of usable Tx objects to 1..lol). So now, when calling the transmit function, the first available message object will be used. It will be freed later inside the corresponding interrupt. So I was expecting to see some changing data in the first byte of each Tx frame. In fact, the biggest number I saw was 4, so 3 message-objects were constantly in use and it looks like we don't need more than that. But I'm still losing frames. So this is not related to the number of Tx message objects, and apparently there is no overwriting. But one curious thing: when moving the transmit function inside the Rx interrupt, things become way better.
Ok, here is how the main function looks
And this is the interrupt section (same for CAN1)
This is the CAN transmission record; you can see some missed frames there
Nothing special there, we get the Rx flag set after receiving a new frame then we pick the data in main loop and push it back to another controller.
Now I'm moving the transmission part INSIDE the interrupt function (same for CAN1 interrupt), like this
And eventually cutting down all Tx from the main loop, don't need it since all Tx is done inside the interrupt. The main is now while(1){}
Logically the result should be the same, at this speed the main loop should not possibly slow down the transmission. But in fact, it's quite different.
No apparent frame loss there. We cannot use this method in our product since there will be some data manipulation and I wouldn't do this inside the interrupt. But I really would like to understand how this is possible. It's clearly not an arbitration issue. Any idea what it could be? Thanks again for helping us!
I think I see the reason for your problems.
You see, these CAN controllers have 2 interfaces for accessing the message objects, they are IF1 and IF2.
They are the same, and there are 2 for a reason.
Assuming CCAN_MSG_IF1 in your code function calls means it uses IF1, it means that you are using same IF from main code and interrupt routine.
This again means that when you start a Send from the main loop, it might get interrupted by a receive interrupt which, depending on the point at which the Send was interrupted, would change the content of IF1. IF1 would then no longer address the same message object, and the transmit would not work correctly.
You also seem to mix IFs in the IRQ: you call Chip_CCAN_SetValidMsg(..., IF1, ..) and then below it you use LPC_C_CAN0->IF[1], which seems to relate to IF2.
The solution is that you use one IF from main and another from interrupt routine.
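The effect can be shown with a small host-side simulation. This is a deliberately simplified model and not real register access: if_regs_t, ifx, msg_ram, rx_isr and send_with_interrupt are hypothetical names, an interface holds just an ID and one data word, and the "interrupt" is a plain function call placed between loading the interface registers and the command request.

```c
#include <stdint.h>

/* Simplified IFx register set: just enough to show the effect. */
typedef struct {
    uint32_t id;
    uint32_t data;
} if_regs_t;

static if_regs_t ifx[2];       /* ifx[0] models IF1, ifx[1] models IF2 */
static if_regs_t msg_ram[32];  /* simplified message RAM */

/* Model of the Rx interrupt: reads an Rx object through one interface,
 * which overwrites that interface's registers. */
void rx_isr(unsigned isr_if, unsigned rx_obj)
{
    ifx[isr_if] = msg_ram[rx_obj];
}

/* Model of a Send from main that gets interrupted after the interface
 * registers are loaded but before the command request is issued. */
void send_with_interrupt(unsigned main_if, unsigned isr_if,
                         unsigned tx_obj, unsigned rx_obj,
                         uint32_t id, uint32_t data)
{
    ifx[main_if].id = id;
    ifx[main_if].data = data;

    rx_isr(isr_if, rx_obj);          /* Rx interrupt fires right here */

    msg_ram[tx_obj] = ifx[main_if];  /* command request: IF -> message RAM */
}
```

If main and the interrupt both use IF1 (index 0), the Tx object ends up holding the ID and data of the received frame instead of the frame that main wanted to send. With main on IF2 (index 1) and the interrupt on IF1, the Tx object gets the intended content.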
Hi Milorad!
Thanks for pointing out the IF1/2 issue! Damn, I was mixing CMSIS's and NXP's drivers and absolutely forgot to check if IF[1] == CCAN_MSG_IF1. Ok, so following your advice I'm allocating the transmit functions from main() to IF2 and both Rx objects are on IF1.
The interrupt part now looks like this
Do I need to clear the pending Rx interrupt? I don't see any difference with or without but don't really like those while() inside the interrupt routines.
Ran multiple tests and we're still losing frames, but now we're approaching an acceptable loss rate. But another issue came up. It's quite rare, which is why I didn't catch it when running some 10k frame loads. In fact, sometimes the transmitted IDs don't match the received ones. I'm still trying to find a way to reproduce the issue but no luck so far.
It looks like this
As you can see, both Rx IDs (E8 and F8) are in reality the A8, somehow the upper bits were affected so we got A8->E8 (1 wrong bit) and A8->F8 (2 wrong bits). It seems to be ID dependent issue, I see this happening only with A8 frames. There are variations, like A8->C0. The frame data is still correct.
Another variation with different frame data (just to be sure that there is no relationship between this and frame content). On this one I ran more than 200k frames, as you can see, there are minutes between these events. Also the load is slightly reduced (5 frames per network instead of 6)
The next thing I'll try is to find out whether this comes from a wrongly received ID (I don't believe so) or whether something happens to it while transmitting. But if you see something wrong there, please let me know! What would be the best way to inhibit the Rx interrupts while transmitting? They cannot be masked on this MCU, right? I could eventually shut down C_CAN0_IRQn/C_CAN1_IRQn completely, but this seems to be an extreme measure. Is there any better way to deal with this?
As usual, your input is very appreciated!
Regarding the IFs, you did not completely understand.
You should use a different IF inside the interrupt routine than in main, but not different IFs within the interrupt routine itself.
For example, you should only use IF2 in main, and in the interrupt routine you only use IF1 for handling both Rx and Tx objects.
The point is that if a Send from main gets interrupted, its IF2 content is preserved and it continues after the interrupt has finished, while in the interrupt you only work with IF1.
Unfortunately, you do have to wait for busy in the interrupt as well, but you can perhaps wait at the start of the interrupt routine, so that after you finish your work you don't wait for busy and some other work can be done during that time.
However, you should not use an endless loop but a limited number of iterations, because if it ever got stuck there the whole system would hang in the interrupt routine.
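A bounded wait could look something like the sketch below. The function name ccan_wait_not_busy and the loop limit are made up for this illustration, and the BUSY flag position (bit 15 of the IFx command request register) is my assumption from the C_CAN register description, so please check it against the user manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: BUSY flag is bit 15 of the IFx command request register. */
#define IF_CMDREQ_BUSY (1UL << 15)

/* Polls the given command request register until BUSY clears or the loop
 * budget is used up. Returns true if the interface became free, false on
 * timeout, so the caller can bail out instead of hanging in the interrupt. */
bool ccan_wait_not_busy(const volatile uint32_t *cmdreq, uint32_t max_loops)
{
    while (max_loops-- > 0U) {
        if ((*cmdreq & IF_CMDREQ_BUSY) == 0U) {
            return true;
        }
    }
    return false;
}
```

The caller passes the address of the command request register of the IF it owns. On a false return it can skip the access and count an error rather than blocking.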
Rx interrupt is cleared by specifying CLRINTPND in CMDMSK register, so no additional action is required.
Something else is causing the problem with the IDs.
First check that you are using separate IFs, so that this is not what is causing your problems.
You can try reducing the bus speed and see what happens: are there still the same issues with IDs, or different ones?
Also, I suppose you have lines terminated correctly.
That's awesome, separating the IFs fixed the frame loss issue! Thank you so much!
Now we have zero loss, but there is still this weird behavior with extra IDs. Now we can count the frames, and it appears there is also a similar issue with the 0xC2 ID.
Here is how it looks. I ran it twice to see if this is reproducible, and it is (but it takes about 70K frames to happen in both cases)
First run
And this is the 2nd one
What you can see from both (thanks again, now we can count them!), the 0xC2 has 1 extra frame and 0xD2 is missing one. So somehow while receiving 0xD2, the frame was translated into 0xC2 ID but with 0xD2 data (there is a ChangeCnt column which indicates the frame data changes). Applying the same logic, the 0xA8 misses 3 frames on the first run and we have 3 extra frames (1 0xE8 and 2 0xF8 both with 0xA8 data). On the second run we're missing 2 and we have 2 extra 0xF8. And the issue with 0xC2 is kind of major issue, mixing the CAN frame payloads can have some catastrophic consequences.
What's strange is that this thing happens to these particular IDs only, all other seem to be fine. Oh, yes, both busses are 120 Ohm terminated. I don't think this is setup related, if so, we should be able to see all other IDs going through the same behavior.
But now I'm thinking, our hardware is still in assembly so we're running all these tests on Keil MCB1800 evaluation board. Unfortunately, the LPC1857 installed on this board has the initial HW revision which, according to errata, has some issues with APB bus bridge peripherals. It's not specified that there could be some interactions while using both controllers simultaneously. And, by precaution, we don't use any other peripherals while doing CAN tests. Do you think this issue can be related to this HW bug?
Update. The extra frames issue seems to come from Rx side. Checking the IDs before transmitting give me the wrong ones (0xE8, 0xF8 etc). Some kind of partial overwrite in Rx message-object, is this possible?
Another thing, probably not related to this. This is from NXPs SetMsgObject function but CMSIS driver uses a similar mechanism.
My problem, while running tests this if() is never false, every time it is a standard ID and the bit 30 is never set.
For test purposes I had to use if (pMsgObj->id <= 0x7FF) but it shouldn't be like this. In my case the pMsgObj->id contains the real ID, like extended ID 0x100A00 so 0x100A00 & 0x40000000 will be always 0. Is there any additional step to mark the incoming frames as extended ID?
I think with repeated IDs you are actually having a problem with reception happening while you are starting a transmission.
I think you will have to either use FIFO or use a message queue (or some other array) for buffering received messages.
If you can receive all messages when you are only testing the reception then you have to ensure they are buffered before you try to send them otherwise some overwrite will happen as it seems is happening in your case.
I don't think this is a hardware issue.
It looks like at some point there are very short times between two messages, so a new message is received before the previous one has been loaded for transmission. New data is then read from the newly received message, and loading for transmission continues with the now changed data.
About the extended frame, as you can see in the piece of code you posted it seems that bit 30 is a flag specifying the extended frame, because you need a way to differentiate standard IDs and extended IDs.
To specify extended frame your ID should have bit 30 set to 1.
Your ID 0x100A00 would be specified as 0x40100A00.
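As a sketch, small helpers like the following keep the flag handling in one place. The names are hypothetical and the flag value is taken from this discussion (bit 30), not from a driver header, so check it against the driver you actually use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption taken from this thread: bit 30 marks an extended (29-bit) ID. */
#define CAN_ID_EXT_FLAG 0x40000000UL
#define CAN_ID_EXT_MASK 0x1FFFFFFFUL   /* 29 identifier bits */
#define CAN_ID_STD_MASK 0x000007FFUL   /* 11 identifier bits */

/* Tag a raw 29-bit identifier as extended. */
uint32_t can_make_ext_id(uint32_t raw_id)
{
    return (uint32_t)((raw_id & CAN_ID_EXT_MASK) | CAN_ID_EXT_FLAG);
}

/* True if the ID value carries the extended-frame flag. */
bool can_is_ext_id(uint32_t id)
{
    return (id & CAN_ID_EXT_FLAG) != 0U;
}

/* Strip the flag and return only the identifier bits. */
uint32_t can_raw_id(uint32_t id)
{
    return (uint32_t)(can_is_ext_id(id) ? (id & CAN_ID_EXT_MASK)
                                        : (id & CAN_ID_STD_MASK));
}
```

With these, can_make_ext_id(0x100A00) gives 0x40100A00, and a received ID can be tested with can_is_ext_id() instead of comparing against 0x7FF.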
I couldn't stop thinking about how both specific IDs were affected. I can understand a possible timing issue between Tx and Rx, but why only these 2 and only on CAN2? So today I ran the same load test on our previous LPC1756 hardware. Guess what? Got the same result after 100K+ frames. Different hardware, different CAN driver architecture but same canalyzer. So I put the logic probes on both CAN channels to see what exactly is going on.
Here it is, the bottom is CAN2 the top is CAN1 (btw, do you see how ridiculously small is the timing between frames?)
As you can see, at the moment when canalyzer sends 0x0C2 from the CAN2 channel, it also sends 0x0E1 from the CAN1 channel. And somehow the data is mixed: the CAN1 frame data goes into the CAN2 frame data. We use a quite expensive multi-protocol tool and this is clearly a bug on their side, nothing to do with us. The same is happening to those extra IDs; this time the tool messes with the ID only, and I can see them physically present on the CAN bus. Once more, nothing related to the CAN driver.
So my guess this is it, all works as expected.
Look, without breaking the privacy policies I'd like to reach you to express our gratitude. Without your help and your knowledge, this project would have taken a while to get done; the non-working CAN was the blocking issue. Thank you very much!
The space between frames is so called 'interframe space' and it needs to be minimum 3 bits long, so, yes, very short space between.
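To put a number on it, the duration of a given count of bit times follows directly from the bit rate. The helper below is just arithmetic with a made-up name (can_bits_to_ns):

```c
#include <stdint.h>

/* Duration of a number of bit times at a given bit rate, in nanoseconds. */
uint32_t can_bits_to_ns(uint32_t bits, uint32_t bitrate_bps)
{
    return (uint32_t)(((uint64_t)bits * 1000000000ULL) / bitrate_bps);
}
```

At the 500 kbit/s used here one bit takes 2 us, so can_bits_to_ns(3, 500000) gives 6000 ns: with back-to-back frames there are only 6 us between the end of one frame and the start of the next.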
I'm very happy you managed to solve the problem, and that I was able to help.
Don't worry about expressing your gratitude, a simple thank you is just fine :-)