
Optimizing FIQ

I have the SPI interrupt configured as an FIQ, and its execution time must stay below 12us, but after adding a few checks the execution time goes over the limit. I've seen that it takes 2-3us for the ARM7TDMI to enter the FIQ.
Can you give me some ideas on how to make it enter or execute faster?
What should I use/avoid?
I realized I should avoid using % because it takes about 3us to compute a%100!
Any advice is welcome.

  • ARM's documentation lists the number of clock cycles required by every assembler instruction.

    Either write your code in assembler, or write the code in C and take a peek at the generated assembler output to make sure that you are familiar with the cost of different C constructs.

    Avoid thinking of instructions as taking 3us. Think of them as taking a specific number of clock cycles. Then decide what clock frequency your processor core will run at. Knowing the required response time and the selected clock frequency will tell you the maximum number of clock cycles your FIQ may consume.

  • 12us is an extremely tight time budget - about the execution duration of a single instruction at 72[MHz]!

  • I have SPI interrupt defined as FIQ and the execution time for it should be less than 12us, but adding a few checks and the execution time is going out of the limit. I've seen that it takes 2-3us for the ARM7TDMI to enter the FIQ.

    It would be interesting to know what frequency your uC is running at.

    What should I use/avoid?

    Avoid:

    * Anything that calls library functions.
    * Any function calls.
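    * Multiplication, division and modulo operators (unless they are by powers of two, in which case the compiler should replace them with shifts or ANDs respectively; this is simplest with unsigned operands). The ARM7TDMI has no divide instruction, so division and modulo by other values end up in a library routine.
    * Byte and halfword memory accesses. Use signed/unsigned ints (32-bit integers) instead.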
    * Doing too much stuff inside the ISR. Consider very carefully what must be done inside the ISR, and what can be done outside the ISR.
    * Having the operating system interfere in any way with the ISR, e.g. by adding an OS-specific preamble.
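    To illustrate the modulo point (and the a%100 cost mentioned in the question), here is a minimal sketch of two common workarounds. The buffer size `BUF_SIZE` and the function names are hypothetical. The first assumes you are free to choose a power-of-two size; the second assumes the value only ever advances by one step at a time, as an index or counter usually does, so a compare can replace the division.

```c
#include <stdint.h>

/* Hypothetical buffer size, chosen as a power of two so that
   "index % BUF_SIZE" can be written as "index & (BUF_SIZE - 1)". */
#define BUF_SIZE 128u

static inline uint32_t wrap_pow2(uint32_t index)
{
    /* Equivalent to index % BUF_SIZE for unsigned values: a single AND. */
    return index & (BUF_SIZE - 1u);
}

/* For a modulus that is not a power of two (e.g. 100), a counter that
   is incremented by one can be wrapped with a compare instead of %.
   Precondition: counter is already in the range 0..99. */
static inline uint32_t next_mod100(uint32_t counter)
{
    counter++;
    if (counter >= 100u)
    {
        counter = 0u;
    }
    return counter;
}
```

    Both variants compile to a handful of single-cycle instructions instead of a call into the division routine; check the generated assembly to confirm this with your compiler.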

    If, after all optimizations, you still cannot reach the target of 12 us, consider switching to a Cortex-M3 part. The interrupt system and controller of the ARM7TDMI are mediocre at best (the architecture wasn't designed for microcontroller applications in the first place and hence isn't designed for ultra-low-latency interrupts).

  • 12us is an extremely tight time budget - about the execution duration of a single instruction at 72[MHz]!

    I think you're off by a few orders of magnitude, Tamir. ;) (or did you read ns instead of us?)

    I have a Cortex-M3 part sitting here, happily handling quite a bit of processing (8 multiply-adds) in about 2us at 64 MHz.

  • Holy smoke! It has been a VERY long day indeed! Sorry, OP!

  • My uC is running at full speed, 20MHz, and I must be able to handle SPI baud rates up to 1MHz in full duplex. That is, while receiving bytes from the master (I'm the slave), I must make some measurements and computations (they have to be done during the SPI transfer) and respond in time. Therefore all of this has to be done inside the SPI ISR.
    I avoid function calls and anything that calls library functions, and the OS doesn't affect the execution of my ISR in any way. I'll try using only 32-bit integers, but I cannot avoid multiplication.
    Unfortunately I cannot switch to another uC, because the one I use has some unique features which are needed for my application.

  • If you post the ISR code, maybe somebody will be able to suggest improvements.

  • My uC is running at full speed - 20MHz

    Hm ... what's the exact type of uC you're using? 20 MHz sounds a bit low for an ARM7TDMI.

    Or are you using a custom piece of hardware with the processor core on it, e.g. an FPGA, ASIC or similar?

    At 20 MHz, twelve microseconds translates into 240 processor clock cycles, some of which are overhead for entering the ISR.
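    The conversion from a time budget to a cycle budget is simple arithmetic; this sketch just makes it explicit. The function name is illustrative, and the 12 us and 20 MHz figures are the ones from this thread.

```c
#include <stdint.h>

/* Converts a time budget in microseconds into core clock cycles:
   cycles = budget_us * core_hz / 1000000.
   The product is formed in 64 bits so it cannot overflow. */
static uint32_t cycle_budget(uint32_t budget_us, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)budget_us * core_hz) / 1000000u);
}
```

    With these figures, cycle_budget(12u, 20000000u) gives 240. If entering the FIQ really costs 2-3 us as observed above, that is 40 to 60 of those cycles, leaving roughly 180 to 200 for the handler body.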

    You'll probably have to analyze the compiler's assembly output (set the compiler to generate an assembly file in addition to the object file in order to see it) and then either modify that assembly file to suit your needs or write the whole ISR in assembly yourself (which might be easier, as you'll know exactly what it is doing and won't have to mess with what the compiler generated).

    Are you using any loops inside the ISR? It might be worth trying to manually "unroll" them, since actual loops carry a fairly hefty overhead on the ARM architecture.

    Example:

    for (i = 8; i > 0; --i)
    {
        *dest++ = *coef++ * *src++;
    }
    

    unrolled would be

    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    *dest++ = *coef++ * *src++;
    

    This uses more code memory, but can be significantly faster.
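    Below is a self-contained version of the example above, with types added (32-bit elements, in line with the earlier advice) so both variants can be compiled and checked against each other. The function names are hypothetical. Whether unrolling pays off should be confirmed in the generated assembly, since an optimizing compiler may already unroll a loop with a constant trip count on its own.

```c
#include <stdint.h>

/* Rolled version: each element costs a counter update, compare and branch. */
static void mul8_loop(int32_t *dest, const int32_t *coef, const int32_t *src)
{
    int i;
    for (i = 8; i > 0; --i)
    {
        *dest++ = *coef++ * *src++;
    }
}

/* Fully unrolled version: same results, no loop overhead.
   Constant offsets map directly onto ARM's immediate-offset addressing. */
static void mul8_unrolled(int32_t *dest, const int32_t *coef, const int32_t *src)
{
    dest[0] = coef[0] * src[0];
    dest[1] = coef[1] * src[1];
    dest[2] = coef[2] * src[2];
    dest[3] = coef[3] * src[3];
    dest[4] = coef[4] * src[4];
    dest[5] = coef[5] * src[5];
    dest[6] = coef[6] * src[6];
    dest[7] = coef[7] * src[7];
}
```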

  • At 20MHz this is probably irrelevant, but if you run at a clock speed that requires wait states for flash accesses, it can be a good idea to move this time-critical interrupt handler into RAM.

  • ...it can be a good idea to move this time-critical interrupt into RAM.

    Can you explain how to do it? I searched the help for it, but without success.

  • Can you explain how to do it? I searched the help for it, but without success.

    Does your flash/rom/... actually require wait states if the processor clock is 20 MHz? Because if it does not, there is no point in moving the function to RAM.

    For the exact procedure, you will need to refer to the linker manual. You basically need to place the function in a section that is located in RAM, and which is initialized at system startup.
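    Here is a sketch of the "initialized at system startup" part, assuming a GCC-style toolchain where your own startup code copies such a section. The section name `.ramfunc`, the handler name and the linker symbols are hypothetical and must match your linker script. With the Keil/ARM toolchain the placement is usually done with `#pragma arm section code = "..."` plus an execution region in RAM in the scatter file, and the library's scatter-loading code then performs the copy for you, so do check the linker manual.

```c
#include <stdint.h>

/* Hypothetical GCC-style placement of the handler in a RAM-resident section;
   the linker script must locate .ramfunc in RAM with its load image in flash:
   void spi_fiq_handler(void) __attribute__((section(".ramfunc")));  */

/* Copies a section's load image (in flash) to its run address (in RAM).
   Call it from the startup code before the FIQ is enabled. */
static void copy_ram_section(uint32_t *run_start, uint32_t *run_end,
                             const uint32_t *load_start)
{
    while (run_start < run_end)
    {
        *run_start++ = *load_start++;
    }
}

/* On target, with hypothetical linker-script symbols:
   extern uint32_t __ramfunc_start[], __ramfunc_end[], __ramfunc_load[];
   copy_ram_section(__ramfunc_start, __ramfunc_end, __ramfunc_load); */
```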

  • Thanks to all of you.
    I managed to bring the execution time below 12us. I moved the ISR into RAM, which helped a lot. I also made some changes to the state machine I'm using (I broke it into smaller states), and now it finally works.

  • It seems the RealView compiler doesn't distinguish between IRQ and FIQ when deciding which registers to push on the stack at the beginning of the interrupt service routine. You would think it would use the FIQ's banked (shadow) registers instead of pushing registers on the stack, but it doesn't. It's a real pain, but if you write your FIQ handler in assembler you can use the banked registers and/or save only the registers you actually use.

    It astonishes me that, at least to my knowledge, ARM hasn't addressed this. I would think they would do everything possible to make FIQ truly a FAST interrupt.

  • I guess you can write some assembly in the startup code inside the FIQ vector itself if you really need a fast response.

  • "I guess you can write some assembly in the startup code inside the FIQ vector itself if you really need a fast response."

    No need to guess. The processor supports it and the development tools support it.