M7 atomic operation faults on non cacheable memory

I'm trying to make one region of SRAM non cacheable for DMA buffers.  But what I have found is that when I do that, the first atomic operation bus faults (eg RTOS mutex).

Here's an example where I made all of SRAM normal memory non cacheable (TEX=1 B=0 C=0 S=1) for testing:

// all SRAM non cacheable
MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
MPU_InitStruct.Number           = MPU_REGION_NUMBER0;
MPU_InitStruct.BaseAddress      = 0x20000000;
MPU_InitStruct.Size             = MPU_REGION_SIZE_1MB;
MPU_InitStruct.SubRegionDisable = 0x00;
MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;	
HAL_MPU_ConfigRegion(&MPU_InitStruct);

HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);

It does not matter what the atomic operation is; it faults on the `ldrex` instruction.  I've checked alignment and the access is on a 32 bit boundary.  Actual memory location can be in DTCM or SRAM1 and it will fault in both.

If I change the MPU region to 'normal memory non cacheable non sharable (TEX=1 B=0 C=0 S=0)' then it will not fault but it appears that caching is still enabled for that memory region.