Hi guys,
I use a __global_reg variable in my program to avoid save/load:
_global_var uint32_t finished_pixels;
This variable is updated in the ADC interrupt.
In main, I have the following code:
register uint32_t processed_pixels = 0;
finished_pixels = 0; // Initial global reg
while (processed_pixels >= finished_pixels)
{
// Process new pixel
...
}
But armcc optimizes away all the code in the while loop, i.e. the "// Process new pixel" section.
I tried "volatile _global_var uint32_t finished_pixels;", but the compiler says the qualifier has no effect and the same thing happens.
For now I use "volatile register uint32_t processed_pixels = 0;" to avoid the trap, but it looks ugly.
Is that a compiler bug?
Any better solutions?
Thanks a lot!
> I use a __global_reg variable in my program to avoid save/load
> I use "--global_reg=1,2" in the C/C++ options to make the compiler reserve r4 and r5 for global variables.
Do you have any evidence that you actually need to do this? You don't say what processor you are using, or what frequency it is running at, but this is one of those optimizations which is often more hassle than it is worth.
Stacking variables on ARM is relatively painless so you are not saving much, and loss of two registers for "normal code" is likely to degrade performance significantly for algorithms with any kind of complexity. You are avoiding save and load on interrupt, but will force functions with a lot of live variables to stack a lot more aggressively during normal execution as they have fewer registers available. In particular this optimization can _force_ more save and load simply because you are giving the compiler less flexibility to schedule registers intelligently so you won't stack your two counters, but you'll end up stacking other variables instead - the end result is much the same.
If you are processing bulk pixel data I would expect the cost of saving and restoring a couple of registers' worth of data to be in the noise.
Pete
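For comparison, here is a minimal sketch of the conventional alternative Pete is describing: keep the counter in memory and mark it volatile, so the compiler must re-read it on every loop test. The handler name adc_sample_ready and the helper drain_finished_pixels are hypothetical, and the loop is written as "process while there is unprocessed work", which is an assumption about the intent and not the original poster's exact loop.

```c
#include <stdint.h>

/* Shared between the interrupt handler and main. volatile forces the
   compiler to re-read the value from memory on every loop test. */
static volatile uint32_t finished_pixels = 0;

/* Hypothetical handler name; on real hardware this would be the ADC ISR. */
void adc_sample_ready(void)
{
    finished_pixels++;
}

/* Processes every pixel reported finished so far and returns how many
   were handled in this call. */
uint32_t drain_finished_pixels(uint32_t *processed_pixels)
{
    uint32_t handled = 0;
    while (*processed_pixels < finished_pixels) {
        /* Process new pixel (placeholder for the real work). */
        (*processed_pixels)++;
        handled++;
    }
    return handled;
}
```

Because finished_pixels is an ordinary volatile object, the loop body cannot be removed on the grounds that the condition never changes. The price is one load per loop test and a load/store pair in the handler.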
Hi Peter,
Thanks for your help.
I am using an STM32F302CBT6 at 72 MHz in my research. As I mentioned in my reply to daith, I have to save instructions so that one pixel is processed within 23 instructions.
Maybe I should use an STM32F2 chip, but for now the F2 family doesn't have an ADC with more than 4 Msps at 12-bit resolution.
Anyway, the experiments are a lot of fun.
>If you are processing bulk pixel data I would expect the cost of saving and restoring a couple of registers' worth of data to be in the noise.
That's true.
I am now thinking of using DMA for one frame & SIMD to process batch data.
But I will have a "long" DMA interrupt handler. I don't know if it's a problem that an interrupt takes almost all the CPU time to process one frame.
Thanks.
Well yes, SIMD is designed for practically this kind of work.
Blocking up data and doing a batch is normally a very good thing too, though one would normally try and make them as small as reasonable to avoid latency delays and to take up less space - one doesn't want chunks of work to get bigger than the cache.
I don't understand why you say so much time would be taken up in the interrupt handler if you do what you describe. Wouldn't you use, say, three blocks in a loop: fill one while processing another, with a third free to be filled? The processing of the blocks could be interruptible.
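The three-block scheme could be sketched roughly as below. This is only an illustration under assumptions: the block size, the function names, and the idea that a DMA-complete interrupt calls dma_block_complete are all hypothetical, and no real DMA setup is shown. The point is that the interrupt only bumps a counter while the heavy processing stays in main, where it can be interrupted.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_COUNT  3
#define BLOCK_PIXELS 64   /* assumed block size, pick to suit the frame */

static uint16_t blocks[BLOCK_COUNT][BLOCK_PIXELS];
static volatile uint32_t filled_count = 0;  /* blocks completed by the DMA */
static uint32_t processed_count = 0;        /* blocks released by main */

/* Block the (hypothetical) DMA transfer should write into next. */
uint16_t *next_fill_block(void)
{
    return blocks[filled_count % BLOCK_COUNT];
}

/* Nonzero while starting another fill cannot overwrite unreleased data. */
int can_start_fill(void)
{
    return (filled_count - processed_count) < BLOCK_COUNT;
}

/* Called from the hypothetical DMA-complete interrupt; it stays short. */
void dma_block_complete(void)
{
    filled_count++;
}

/* Oldest filled block that has not been released, or NULL if none. */
const uint16_t *take_ready_block(void)
{
    if (processed_count == filled_count) {
        return NULL;
    }
    return blocks[processed_count % BLOCK_COUNT];
}

/* Main calls this once it has finished with the block it took. */
void release_block(void)
{
    processed_count++;
}
```

In main, the loop would call take_ready_block, process the block if the result is not NULL, and then call release_block. If can_start_fill ever returns zero, processing has fallen behind the ADC and data would be lost.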
Yes, double buffer + DMA will be a good solution.
But I wonder whether an STM32F302 at 72 MHz can process 1024 pixels within 256 us.
It seems hard, but you two have given me useful suggestions.
Anyway, is "a __global_reg(x) variable in a loop condition is optimized as if its value never changes" an armcc compiler bug?
Thanks again.
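To put a number on the 1024 pixels in 256 us question: at 72 MHz a 256 us frame is 18432 CPU cycles, which leaves 18 cycles per pixel. The small helper below just does that arithmetic; the function name is made up for illustration, and it ignores interrupt entry/exit, flash wait states, and DMA bus contention, so the real budget is lower.

```c
#include <stdint.h>

/* Whole CPU cycles available per pixel for a given core clock,
   frame time in microseconds, and number of pixels per frame. */
uint32_t cycles_per_pixel(uint32_t cpu_hz, uint32_t frame_us, uint32_t pixels)
{
    /* 64-bit intermediate so cpu_hz * frame_us cannot overflow. */
    uint64_t frame_cycles = (uint64_t)cpu_hz * frame_us / 1000000u;
    return (uint32_t)(frame_cycles / pixels);
}
```

For example, cycles_per_pixel(72000000u, 256u, 1024u) gives 18, which is why handling two pixels per instruction with packed halfwords looks attractive.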
I'm not sure what you are saying about global_reg.
The timing looks very tight, but I think it should be possible. It may be possible to have the DMA pace the collection of the ADC values, which would be nice. If you do two values at a time and pack them in halfwords, I think you can probably just about do the job using the SIMD instructions.
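As a rough illustration of "two values at a time packed in halfwords", here is a portable C sketch of a lane-wise 16-bit add. The function names are hypothetical. On a Cortex-M4 the same operation is, as far as I know, a single UADD16 instruction (exposed by CMSIS as the __UADD16 intrinsic), so the portable version below is only a stand-in that shows the data layout and the wrap-around behaviour.

```c
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit word:
   a goes in the low halfword, b in the high halfword. */
uint32_t pack_halfwords(uint16_t a, uint16_t b)
{
    return (uint32_t)a | ((uint32_t)b << 16);
}

/* Lane-wise add of two packed words. Each halfword wraps on its own,
   so a carry out of the low lane never leaks into the high lane. */
uint32_t add_halfwords(uint32_t x, uint32_t y)
{
    uint32_t lo = (x + y) & 0x0000FFFFu;          /* low lane, carry dropped */
    uint32_t hi = ((x >> 16) + (y >> 16)) << 16;  /* high lane, shift drops its carry */
    return hi | lo;
}
```

With 12-bit ADC samples there are four spare bits per lane, so a few additions can be accumulated before overflow becomes a concern.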
The problem is that a __global_reg variable can't be specified as volatile.
Thus, the armcc compiler assumes it never changes.
You could talk to them about allowing it to be marked volatile, which I think is the cleanest way of saying it could change at any time; their documentation says they don't currently support that. I would be rather loath to depend on such a facility unless the ABI and RTOS supported having a register reserved in this way, so I guess they haven't had any requests for it. If you are careful about placing your loads and stores so that they don't cause holdups, they can be quite cheap. For example, if in C you have some calculations followed by a test of a volatile variable, the load of the volatile can be moved back over the calculation.