I have the SPI interrupt defined as FIQ, and its execution time should be less than 12 us, but after adding a few checks the execution time exceeds that limit. I've seen that it takes 2-3 us for the ARM7TDMI to enter the FIQ. Can you give me some ideas on how to make it enter or execute more quickly? What should I use or avoid? I realized I should avoid using %, because it takes about 3 us to compute a%100! Any advice is welcome.
Only if you make it so!
It's entirely up to you whether you implement it, and document it adequately, in a way that keeps it maintainable!
But, of course, getting absolute maximum speed is almost always going to require "clever tricks" in the code - so the burden is on you to document them very clearly and completely.
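As an illustration of a documented "clever trick", here is a minimal sketch of two ways to avoid the a%100 mentioned in the question. The function names mod100_fast and next_index_mod100 are made up for this example, and it assumes a is an unsigned 32-bit value and that the handler is built for ARM state, where the 64-bit multiply should compile to a single long multiply. Whether it actually beats the library division on your toolchain is something to measure, not assume.

```c
#include <stdint.h>

/*
 * mod100_fast: computes a % 100 without a division.
 *
 * Why: the ARM7TDMI has no hardware divide, so '%' ends up in a
 * library routine. The quotient is obtained by multiplying with a
 * scaled reciprocal instead:
 *
 *     a / 100 == (a * 0x51EB851F) >> 37   for every 32-bit unsigned a
 *
 * where 0x51EB851F == ceil(2^37 / 100). The remainder is then
 * a - quotient * 100.
 */
static inline uint32_t mod100_fast(uint32_t a)
{
    uint32_t q = (uint32_t)(((uint64_t)a * 0x51EB851Fu) >> 37);
    return a - q * 100u;
}

/*
 * next_index_mod100: if the value is only ever a counter that is
 * incremented by one, no division is needed at all. Compare and wrap.
 * Precondition: i is already in the range 0..99.
 */
static inline uint32_t next_index_mod100(uint32_t i)
{
    i++;
    return (i >= 100u) ? 0u : i;
}
```

The comments carry the burden described above: they state what the magic constant is, where it comes from, and the range of inputs for which the shortcut is valid.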