Fast dark picture subtraction

Hi,
I'm searching for a fast way to subtract a dark image from another image.
If the dark image pixel value is greater than the corresponding image pixel, the resulting image pixel should be zero. Otherwise, it should simply be subtracted.

Are there special functions for doing this job?

I have an AT91SAM9260 processor, and the image data is 10 bits deep, stored in a 16-bit array.

Many thanks,
Stefan
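
As a baseline for the operation described in the question, here is a minimal sketch in plain C of a subtraction that clamps at zero. The function name `dark_subtract` and the in-place update are assumptions made for illustration; nothing here is specific to the AT91SAM9260, and it is only meant as a reference against which faster variants can be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: subtract dark[i] from image[i] in place.
 * Where the dark pixel is greater than or equal to the image pixel,
 * the result is clamped to zero instead of wrapping around. */
void dark_subtract(uint16_t *image, const uint16_t *dark, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        uint16_t s = image[i];
        uint16_t d = dark[i];
        image[i] = (s > d) ? (uint16_t)(s - d) : 0;
    }
}
```

Whether the compiler turns the conditional into a branch or a conditionally executed instruction depends on the toolchain and optimization level, so it is worth inspecting the generated code.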

  • > Depending on the application, it may be possible that the dark image
    > only has a small dynamic range, in which case the LUT solution would
    > be feasible, depending on the amount of available RAM.

    True, although the drawback of a LUT in this case is that this type of data structure has poor cache performance. In fact, it might evict cache lines occupied by samples.

    > I believe the LDM/STM instructions aren't limited to ARMv6, and

    Never said so. But USUB16/SEL are. With a cached core, LDM/STM themselves don't give much of a performance benefit unless you can process many samples per iteration.

    > processing multiple samples per iteration would be a good way to speed
    > up the first example even more even if there are no SIMD instructions,
    > since it cuts down the number of cycles spent on accessing memory and
    > the loop overhead

    The v5 example already processes two samples per iteration. I don't believe the loop would benefit much from increasing that number, again assuming that the cache is enabled. You'd have to benchmark this in the function's actual context.

    Regards
    Marcus
    http://www.doulos.com/arm/
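
The v5 example mentioned in this reply is not shown in this part of the thread. As a separate illustration of handling two samples per iteration without USUB16/SEL, here is a rough sketch in portable C that packs two 16-bit samples into one 32-bit word. It relies on every sample being at most 0x7FFF (true for 10-bit data), so bit 15 of each half-word is free to act as a borrow guard. The names `dark_subtract_pair` and `dark_subtract_words`, and the assumption that the buffers are available as 32-bit words with an even sample count, are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define LANE_MSB 0x80008000u  /* bit 15 of each 16-bit lane */

/* Clamped subtraction of two packed samples at once.
 * Precondition (assumption): every sample in both words is <= 0x7FFF. */
uint32_t dark_subtract_pair(uint32_t image, uint32_t dark)
{
    /* Setting the guard bit keeps each lane's difference positive,
     * so no borrow crosses from the low lane into the high lane. */
    uint32_t t = (image | LANE_MSB) - dark;
    /* The guard bit survives only in lanes where image >= dark. */
    uint32_t m = t & LANE_MSB;
    /* Expand each surviving guard bit into a 0x7FFF lane mask. */
    uint32_t keep = m - (m >> 15);
    /* Keep the difference where image >= dark, zero elsewhere. */
    return t & keep;
}

/* Apply the packed subtraction to a buffer of 32-bit words in place. */
void dark_subtract_words(uint32_t *image, const uint32_t *dark, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        image[i] = dark_subtract_pair(image[i], dark[i]);
    }
}
```

Whether this is actually faster than a per-sample loop on a given core is something to benchmark in the function's actual context, as noted above.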
