The two options are available.
What is the difference between the two methods in terms of cost, speed, and complexity?
If we need more than a few such mutexes, say 100, does the answer change?
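Note: If you don't have bit-banding, you can still use the 4 bytes of a 32-bit word as a 'mini bit-band': give each flag its own byte, so that setting or clearing it is a single byte store rather than a read-modify-write of the whole word.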
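As a rough sketch of that 'mini bit-band' idea, assuming a target where an aligned byte store is a single access: the type and function names below (`mini_bitband_t`, `flag_set`, `flag_clear`, `any_flag_set`) are made up for illustration and are not from any vendor header. Note that this only gives you independent set/clear of each flag; it is not by itself an atomic test-and-set.

```c
#include <stdint.h>

/* Four independent flags packed into one 32-bit word, one byte per flag.
 * Hypothetical type for illustration only. */
typedef union {
    uint32_t word;     /* all four flags, readable in one access */
    uint8_t  flag[4];  /* each flag individually writable */
} mini_bitband_t;

/* Set one flag with a single byte store; the other three bytes are untouched. */
static inline void flag_set(volatile mini_bitband_t *m, unsigned i)
{
    m->flag[i] = 1u;
}

/* Clear one flag with a single byte store. */
static inline void flag_clear(volatile mini_bitband_t *m, unsigned i)
{
    m->flag[i] = 0u;
}

/* Check all four flags at once by reading the whole word. */
static inline int any_flag_set(const volatile mini_bitband_t *m)
{
    return m->word != 0u;
}
```

The trade-off is memory: you spend a byte per flag instead of a bit, so 100 flags cost 100 bytes instead of 13.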
There is also another option which was not mentioned: you could disable interrupts, modify the value, and then re-enable interrupts.
Disabling interrupts will of course disturb the program flow, but it will not stop timers, etc.; it just postpones the execution of the interrupt handler.
The pending bits will still be set, so you will not lose any interrupt servicing.
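A minimal sketch of the disable/modify/enable approach, written so that it also builds on a PC: `irq_save_and_disable()` and `irq_restore()` are hypothetical hooks, simulated here with a plain variable. On a Cortex-M with CMSIS you would typically implement them with `__get_PRIMASK()`, `__disable_irq()` and `__set_PRIMASK()`, but that mapping is an assumption about your target, not something stated above. `try_lock`, `unlock` and `lock_bits` are likewise just illustrative names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Host-side stand-in for the CPU interrupt mask (hypothetical).
 * 0 = interrupts enabled, 1 = interrupts masked. */
static uint32_t sim_primask = 0u;

/* Remember the current mask state, then mask interrupts. */
static uint32_t irq_save_and_disable(void)
{
    uint32_t prev = sim_primask;
    sim_primask = 1u;
    return prev;
}

/* Restore the mask state saved earlier (so nested use stays correct). */
static void irq_restore(uint32_t prev)
{
    sim_primask = prev;
}

/* Up to 32 lock flags in one word; use an array of words for more. */
static volatile uint32_t lock_bits = 0u;

/* Try to take lock `bit`; returns true if it was free and is now ours. */
static bool try_lock(unsigned bit)
{
    uint32_t mask  = 1u << bit;
    uint32_t state = irq_save_and_disable();
    bool acquired  = (lock_bits & mask) == 0u;
    if (acquired) {
        lock_bits |= mask;   /* read-modify-write is safe while masked */
    }
    irq_restore(state);
    return acquired;
}

/* Release lock `bit`. */
static void unlock(unsigned bit)
{
    uint32_t state = irq_save_and_disable();
    lock_bits &= ~(1u << bit);
    irq_restore(state);
}
```

Saving and restoring the previous state, rather than unconditionally enabling, means the code still behaves if it is called from a place where interrupts were already disabled. The masked window is only a handful of instructions, so the added interrupt latency is small.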