Optimizing FIQ

I have the SPI interrupt configured as FIQ, and its execution time must stay below 12us, but after adding a few checks it exceeds that limit. I've seen that it takes 2-3us for the ARM7TDMI just to enter the FIQ.
Can you give me some ideas on how to enter the handler faster or make it execute faster?
What should I use or avoid?
I've already realized I should avoid using %, because it takes about 3us to compute a%100!
Any advice is welcome.
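
For reference, a minimal sketch of two division-free ways to get a%100. It assumes `a` is an unsigned 32-bit value; the function names `mod100_u32` and `next_mod100` are made up for illustration. The ARM7TDMI has no divide instruction, so `%` normally ends up in a library routine, while a multiply by a precomputed reciprocal can use the core's long multiply.

```c
#include <stdint.h>

/* Remainder of a / 100 without a divide instruction or library call.
 * 1374389535 is ceil(2^37 / 100); the 32x32->64 multiply can map to the
 * ARM7TDMI long multiply (UMULL). Exact for every 32-bit unsigned a. */
uint32_t mod100_u32(uint32_t a)
{
    uint32_t q = (uint32_t)(((uint64_t)a * 1374389535u) >> 37); /* q = a / 100 */
    return a - q * 100u;
}

/* If the value only ever counts up by one, a wrapping counter avoids
 * the division entirely: returns (counter + 1) % 100 for counter < 100. */
uint32_t next_mod100(uint32_t counter)
{
    return (counter + 1u >= 100u) ? 0u : counter + 1u;
}
```

Many compilers apply the reciprocal trick on their own for constant divisors at higher optimization levels, so it is worth checking the generated assembly before hand-coding it.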

Parents
  • My uC is running at full speed - 20MHz, and I must be able to handle SPI baud rates up to 1MHz in full duplex. That is, while receiving bytes from the Master (I'm the Slave), I must make some measurements and computations (they have to be done during the SPI transfer) and respond in time. Therefore all of this has to be done inside the SPI ISR.
    I avoid function calls and anything that calls library functions, and the OS doesn't affect the execution of my ISR in any way. I'll try using only 32-bit integers, but I cannot avoid multiplication.
    Unfortunately I cannot switch to another uC, because the one I use has some unique features that my application needs.

Children
  • If you post the ISR code, maybe somebody will be able to suggest improvements.

  • My uC is running at full speed - 20MHz

    Hm ... what's the exact type of uC you're using? 20 MHz sounds a bit low for an ARM7TDMI.

    Or are you using a custom piece of hardware with the processor core on it, e.g. an FPGA, ASIC or similar?

    Twelve microseconds translates into 240 processor clock cycles, some of which are overhead for entering the ISR.

    You'll probably have to analyze the compiler's assembly output (set the compiler to generate an assembly file in addition to the object file in order to see it) and then either modify that assembly file to suit your needs, or write the whole ISR in assembly yourself (which might be easier, as you'll know exactly what it is doing and won't have to mess with what the compiler generated).

    Are you using any loops inside the ISR? It might be worth trying to manually "unroll" them, since actual loops carry a fairly hefty overhead on the ARM architecture.

    Example:

    for (i = 8; i > 0; --i) {
        *dest++ = *coef++ * *src++;
    }
    

    unrolled would be

    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    

    This uses more code memory, but can be significantly faster.

  • At 20MHz this is irrelevant, but if you run at a clock speed that requires wait states on flash accesses, it can be a good idea to move this time-critical interrupt handler into RAM.

  • ...it can be a good idea to move this time critical interrupt in the RAM.

    Can you explain to me how to do it? I searched the help for it, but with no success.

  • Can you explain to me how to do it? I searched the help for it, but with no success.

    Does your flash/rom/... actually require wait states if the processor clock is 20 MHz? Because if it does not, there is no point in moving the function to RAM.

    For the exact procedure, you will need to refer to the linker manual. You basically need to place the function in a section that is located in RAM, and which is initialized at system startup.
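
As a rough illustration of the C side of this, here is a minimal sketch that assumes a toolchain accepting the GCC-style `section` attribute. The section name `.ramfunc`, the macro `RAMFUNC` and the function `spi_scale_sample` are hypothetical, not taken from any particular project. The attribute only tags the function; the linker script or scatter file still has to map that section to RAM, and the startup code has to copy it there from flash before the first interrupt.

```c
#include <stdint.h>

/* Assumption: GCC-style attribute syntax on an ELF target. Elsewhere the
 * macro expands to nothing and the function stays in its default section. */
#if defined(__ELF__)
#define RAMFUNC __attribute__((section(".ramfunc")))
#else
#define RAMFUNC
#endif

/* Hypothetical time-critical helper used by the SPI FIQ handler.
 * Scales a raw sample by an 8.8 fixed-point gain (256 means 1.0). */
RAMFUNC uint32_t spi_scale_sample(uint32_t raw, uint32_t gain)
{
    return (raw * gain) >> 8;
}
```

The map file produced by the linker is the quickest way to confirm that the function really ended up at a RAM address.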

  • Thanks to all of you.
    I managed to lower the execution time to less than 12us. I moved the ISR into RAM and it helped a lot. I've also made some changes to the state machine I'm using (broke it into smaller states), and now it is good.

  • It seems the RealView compiler doesn't distinguish between IRQ and FIQ when deciding which registers to push on the stack at the beginning of the interrupt service routine. You would think it would use the FIQ shadow registers instead of pushing them on the stack, but it doesn't. It's a real pain, but if you can write your FIQ service routine in assembler, you can use the shadow registers and/or save only the registers you actually use.

    It astonishes me that, at least to my knowledge, ARM hasn't addressed this. I would think they would do everything possible to make FIQ truly a FAST interrupt.

  • I guess you can write some assembly in the startup code inside the FIQ vector itself if you really need a fast response.

  • "I guess you can write some assembly in the startup code inside the FIQ vector itself if you really need a fast response."

    No need to guess. The processor supports it and the development tools support it.

  • Having code in RAM was not the safest solution, so my big switch was replaced by small functions, one for each state. I now have a table of pointers to these functions and one variable holding the next state (as an index into the table). This way, the compiler no longer saves the registers needed by all the states on every entry; each function saves only what it needs.
    With this change the execution time is between 5us and 10us, which is much better.
    Writing the ISR in assembler is hard for me, and I suppose it will be hard to maintain, so it is not the preferred solution.
    And the compiler won't help me any time soon, because Keil has no plans yet to fully support FIQ...
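
A minimal sketch of what such a function-table state machine might look like. The states, the `0xA5` command byte, the context struct and all names here are invented for illustration; the real ISR would read and write the SPI data register instead of passing bytes as arguments.

```c
#include <stdint.h>

/* Hypothetical per-transfer context shared by the state functions. */
typedef struct {
    uint8_t  rx;   /* byte just received from the master */
    uint8_t  tx;   /* byte to send back on the next transfer */
    uint32_t acc;  /* running result of the computation */
} spi_ctx_t;

/* Each state handles one byte and returns the index of the next state. */
typedef uint8_t (*state_fn)(spi_ctx_t *ctx);

enum { ST_CMD, ST_DATA, ST_REPLY, ST_COUNT };

static uint8_t st_cmd(spi_ctx_t *c)   { c->acc = 0; c->tx = 0; return (c->rx == 0xA5) ? ST_DATA : ST_CMD; }
static uint8_t st_data(spi_ctx_t *c)  { c->acc += c->rx; c->tx = 0; return ST_REPLY; }
static uint8_t st_reply(spi_ctx_t *c) { c->tx = (uint8_t)c->acc; return ST_CMD; }

/* const table, so it can live in flash; only next_state sits in RAM. */
static const state_fn state_table[ST_COUNT] = { st_cmd, st_data, st_reply };
static uint8_t next_state = ST_CMD;

/* Body of the ISR: one indirect call per received byte. */
uint8_t spi_isr_step(spi_ctx_t *c, uint8_t rx_byte)
{
    c->rx = rx_byte;
    next_state = state_table[next_state](c);
    return c->tx;
}
```

Because each state is a separate small function, the compiler only has to preserve the registers that particular function touches, which is the saving described above.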

  • Only if you make it so!

    It's entirely up to you whether you implement it in such a way, and provide adequate documentation, to make it maintainable!

    But, of course, getting absolute maximum speed is almost always going to require "clever tricks" in the code - so the burden is on you to document them very clearly and completely.

  • "Having code in RAM was not the safest solution"

    You mean that buggy software can overwrite this code? The same argument also applies to your function pointers, I think.

  • "Having code in RAM was not the safest solution"

    Sometimes there is no other way. Internal flash is full, code stored in NOR flash, but device access times are unacceptable. Scatter load to RAM, all is well.

  • It is not safe because of EMI. RAM is more vulnerable to EMI than flash.

  • If you really insist, code in RAM can be guarded by checksums just like data in RAM. Don't you think corrupt data in RAM can cause as much damage as corrupt code in RAM?
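
One way such a guard could look, as a sketch only: a bitwise CRC-32 (the common reflected polynomial 0xEDB88320) over the RAM region, compared against a reference value computed right after the code was copied from flash. The function names and the idea of checking from a background task are assumptions, not something described in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a memory region. Slow but table-free, so it is
 * meant for a background check, not for use inside the FIQ itself. */
uint32_t crc32_region(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Returns nonzero while the region still matches its reference CRC. */
int ram_code_intact(const uint8_t *start, size_t len, uint32_t expected)
{
    return crc32_region(start, len) == expected;
}
```

The same routine works unchanged for data tables in RAM, which fits the point made above that corrupt data can be just as harmful as corrupt code.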