Best practices for managing firmware OTA updates across a fleet of Arm-based edge gateways at scale?

I've been working on an edge computing project using Arm Cortex-A based gateways, currently managing a modest fleet but planning to scale to hundreds of devices across multiple sites.

OTA updates have been straightforward at small scale, but I'm starting to think harder about failure scenarios: partial updates, rollback strategies, staged rollouts, and how to handle devices with poor connectivity mid-update.

A few things I'm genuinely uncertain about:

  • Is MCUboot the right choice for Cortex-A gateways, or is it more suited to Cortex-M endpoints?
  • How are people handling delta/incremental updates to keep bandwidth low on constrained links?
  • Any experience with canary deployments, e.g. pushing to 5% of the fleet before a full rollout?
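
To make the canary question concrete, here is roughly the selection logic I have in mind. It is only a minimal Python sketch under my own assumptions: `in_canary`, the `gw-NNNN` device IDs, and the `fw-1.2.0` release string are hypothetical names, not taken from any particular OTA tool. The idea is to hash the device ID together with the release name so that cohort membership is stable across retries (useful when a device drops off mid-update) but differs from one release to the next.

```python
import hashlib


def in_canary(device_id: str, release: str, percent: float = 5.0) -> bool:
    """Return True if this device falls in the canary cohort for a release.

    Hypothetical helper: the hash input format and bucket count are
    illustrative choices, not part of any existing OTA framework.
    """
    # Hash release + device ID so the same device is not always the canary.
    digest = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    # Map the first 8 bytes of the digest to a bucket in 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent=5.0 selects buckets 0..499, i.e. roughly 5% of devices.
    return bucket < percent * 100


# Example fleet of 400 gateways with made-up IDs.
fleet = [f"gw-{n:04d}" for n in range(400)]
canary = [d for d in fleet if in_canary(d, "fw-1.2.0")]
print(f"{len(canary)} of {len(fleet)} devices selected for the canary wave")
```

With a hash-based split the cohort size is only approximately 5%, so on a small fleet the actual count will vary. I'm not sure whether people prefer this kind of stateless selection or an explicit server-side device list per wave.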

Curious what approaches others have landed on, especially at production scale. Are there Arm-specific tools or PSA-aligned patterns that make this more robust?