STM32F411 code only running when stepped through a debugger

Asking this here in case it is an ARM issue and not specifically an STM32 issue...

We have a project running on the STM32F411, using the standard Discovery board:

www.st.com/.../32f411ediscovery.html

We have verified that, with the ST-Link port plugged in, the BOOT0 pin is at GND and the NRST pin is at 3 V.  We are running into the apparently very common issue that code which had been running perfectly well suddenly only works in debug mode.  To narrow it down, we added code to light LD3 (the orange LED), which is connected to pin PD13 on the processor.  By placing this code as early in the execution path as possible, we see that the device hangs almost immediately, while the clock configuration code is executing.  The code now reads as follows:

// from stm32.h
/* modify bitfield */
#define _BMD(reg, msk, val)     (reg) = (((reg) & ~(msk)) | (val))
/* set bitfield */
#define _BST(reg, bits)         (reg) = ((reg) | (bits))
/* clear bitfield */
#define _BCL(reg, bits)         (reg) = ((reg) & ~(bits))
/* wait until bitfield set */
#define _WBS(reg, bits)         while(((reg) & (bits)) == 0)
/* wait until bitfield clear */
#define _WBC(reg, bits)         while(((reg) & (bits)) != 0)
/* wait for bitfield value */
#define _WVL(reg, msk, val)     while(((reg) & (msk)) != (val))
/* bit value */
#define _BV(bit)                (0x01 << (bit))
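
To make the behavior of the modify-bitfield macro concrete, here is a small host-side sketch that applies _BMD to an ordinary variable standing in for a GPIO MODER register, where each pin owns a two-bit field and 0x01 selects general-purpose output. The helper moder_set_output is hypothetical and exists only for this illustration; the macro is repeated so that the snippet builds on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Same modify-bitfield macro as in stm32.h, repeated so the sketch is self-contained. */
#define _BMD(reg, msk, val)     (reg) = (((reg) & ~(msk)) | (val))

/* Hypothetical helper: returns the MODER value after switching `pin` to
 * general-purpose output (field value 0x01), leaving all other pins untouched. */
static uint32_t moder_set_output(uint32_t moder, unsigned pin)
{
    assert(pin < 16u);                      /* MODER holds 16 two-bit fields */
    _BMD(moder, 0x03u << (pin * 2u), 0x01u << (pin * 2u));
    return moder;
}
```

For pin 13 this clears bits 27:26 and then sets bit 26, which is the same mask and value pair as (0x03 << 26) and (0x01 << 26).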

Our code:

void LD3_Init(void) {
    _BST(RCC->AHB1ENR, RCC_AHB1ENR_GPIODEN);
    _BMD(GPIOD->MODER, (0x03 << 26), (0x01 << 26));
    _BCL(GPIOD->OTYPER, (0x01 << 13));

} // end LD3_Init

void LD3_ON(void) {
    _BST(GPIOD->ODR, (0x01 << 13));

} // end LD3_ON

int main (void) {
    volatile uint32_t reg_value;

    // Original source assumes the RCC->CR reset value of 0x0000_XX81, which
    //  enables the HSI clock, but it is perhaps better to set it explicitly
    _BCL(RCC->CR, RCC_CR_PLLON);
    _BCL(RCC->CR, RCC_CR_HSEON);
    _BST(RCC->CR, RCC_CR_HSION);
    _WBS(RCC->CR, RCC_CR_HSIRDY);

    do {
        _BST(RCC->APB1ENR, RCC_APB1ENR_PWREN);
        reg_value = RCC->APB1ENR;
        (void)reg_value;
    } while (0x00);

    /* set flash latency 2WS */
    _BMD(FLASH->ACR, FLASH_ACR_LATENCY, FLASH_ACR_LATENCY_2WS);
    /* setting up PLL 16MHz HSI, VCO=144MHz, PLLP = 72MHz PLLQ = 48MHz  */
    _BMD(RCC->PLLCFGR,
        RCC_PLLCFGR_PLLM | RCC_PLLCFGR_PLLN | RCC_PLLCFGR_PLLSRC | RCC_PLLCFGR_PLLQ | RCC_PLLCFGR_PLLP,
        _VAL2FLD(RCC_PLLCFGR_PLLM, 8) | _VAL2FLD(RCC_PLLCFGR_PLLN, 72) | _VAL2FLD(RCC_PLLCFGR_PLLQ, 3));

    // The original driver also fails to set the APB1 prescaler. The APB1 clock
    //  must run at or below 50MHz, so divide HCLK by 2 (PPRE1 = 0b100)
    RCC->CFGR &= ~((uint32_t)(0x07 << 0x0A));
    RCC->CFGR |=  ((uint32_t)(0x04 << 0x0A));
    /* enabling PLL */
    _BST(RCC->CR, RCC_CR_PLLON);
    _WBS(RCC->CR, RCC_CR_PLLRDY);
    /* switching to PLL */
    _BMD(RCC->CFGR, RCC_CFGR_SW, RCC_CFGR_SW_PLL);
    _WVL(RCC->CFGR, RCC_CFGR_SWS, RCC_CFGR_SWS_PLL);

    // wait for the clock to stabilize
    reg_value = 0x0000FFFF;
    while (reg_value) { --reg_value; }
    LD3_Init();
    LD3_ON();
    // lots of other stuff...
} // end main
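
As a sanity check on the frequencies quoted in the PLL comment, the arithmetic can be reproduced on a host PC. This is only a sketch: pll_clocks_t and pll_compute are hypothetical names introduced here, and it assumes the AHB prescaler is left at its reset value of /1, so that HCLK equals SYSCLK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: clock frequencies in Hz derived from the PLL settings. */
typedef struct {
    uint32_t vco_in;   /* PLL input after the /PLLM divider */
    uint32_t vco_out;  /* VCO output, vco_in * PLLN */
    uint32_t sysclk;   /* main PLL output, vco_out / P */
    uint32_t usb_clk;  /* 48 MHz domain, vco_out / PLLQ */
    uint32_t apb1;     /* APB1 clock, assuming HCLK == SYSCLK */
} pll_clocks_t;

/* pllp_field is the raw 2-bit PLLP register field, not the divider itself. */
static pll_clocks_t pll_compute(uint32_t src_hz, uint32_t pllm, uint32_t plln,
                                uint32_t pllp_field, uint32_t pllq,
                                uint32_t apb1_div)
{
    pll_clocks_t c;
    assert(pllm != 0u && pllq != 0u && apb1_div != 0u && pllp_field < 4u);
    c.vco_in  = src_hz / pllm;
    c.vco_out = c.vco_in * plln;
    c.sysclk  = c.vco_out / (2u * (pllp_field + 1u)); /* field 0..3 -> /2, /4, /6, /8 */
    c.usb_clk = c.vco_out / pllq;
    c.apb1    = c.sysclk / apb1_div;
    return c;
}
```

Calling pll_compute(16000000, 8, 72, 0, 3, 2) mirrors the values written in main: PLLM = 8, PLLN = 72, the PLLP field left at 0, PLLQ = 3, and APB1 divided by 2.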

The rest of the code is too much to post, but this suffices to show the problem.  When stepping through in the debugger, the LED turns on, but it does not turn on in a Release build, nor in a Debug build when the code runs without the debugger stepping it (and of course the LED is just an obvious test output; none of the rest of our code runs either).  This kind of problem usually comes down to incorrect hardware settings on the external pins, but that is not the case here, and we have numerous other projects in which this same clock configuration code runs without issue.  Any suggestions as to what to look at would be appreciated.
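
To narrow down which of the wait loops stalls when the code runs freely, the unbounded _WBS waits could be swapped for a bounded poll that reports failure instead of spinning forever. The following is a minimal sketch under that assumption; wait_bits_set is a hypothetical helper, not part of stm32.h, and the poll limit is an arbitrary number to be tuned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of stm32.h: polls *reg until any bit in
 * `bits` is set, giving up after `max_polls` reads.
 * Returns true if a bit was seen set, false on timeout. */
static bool wait_bits_set(const volatile uint32_t *reg, uint32_t bits,
                          uint32_t max_polls)
{
    assert(reg != NULL);
    while (max_polls-- != 0u) {
        if ((*reg & bits) != 0u) {
            return true;
        }
    }
    return false;
}
```

On the target, a call such as wait_bits_set(&RCC->CR, RCC_CR_PLLRDY, 100000) returning false could be signalled on a GPIO that is set up beforehand while still running from HSI, which would show which ready flag never comes up outside the debugger.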