FVP + boot-wrapper-aarch64 Multi-Core Boot Failure

Description:

I'm using FVP + boot-wrapper-aarch64 to boot a kernel. When running with a single core, the boot process is successful. However, when enabling multiple cores, the boot fails.

Observations:

  • In boot-wrapper-aarch64/common/init.c, it appears that only the primary CPU (CPU 0) executes the following function:

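void cpu_init_bootwrapper(void)
{
    static volatile unsigned int cpu_next = 0;
    unsigned int cpu = this_cpu_logical_id();

    /* Only the primary CPU performs the global initialization. */
    if (cpu == 0)
        init_bootwrapper();

    /* CPUs initialize one at a time, in logical-ID order. */
    while (cpu_next != cpu)
        wfe();

    cpu_init_self(cpu);

    /* Hand the turn to the next CPU and wake any CPU waiting in wfe(). */
    cpu_next = cpu + 1;
    dsb(sy);
    sev();

    if (cpu != 0)
        return;

    /* The primary CPU waits until every CPU has checked in. If a secondary
       never starts running, cpu_next never reaches NR_CPUS and this loop
       never exits. */
    while (cpu_next != NR_CPUS)
        wfe();

    print_string("All CPUs initialized. Entering kernel...\r\n\r\n");
}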

  • Possible causes:

    1. sev() issue: Could sev() fail to wake the secondary CPUs from wfe()?
    2. CPU power-on issue: Are the secondary CPUs actually powered on and out of reset?

Additional Information:

  • I only use boot-wrapper-aarch64, without TF-A.
  • Could this be an FVP issue? Does the FVP handle PSCI requests (made via SMC) properly for boot-wrapper-aarch64?
  • I suspect that boot-wrapper-aarch64 itself plays the role of TF-A, so it should handle PSCI requests.

Further questions:

  1. How can I trace the CPU states in the FVP Base model?
  2. How can I determine whether each CPU is in EL1 or EL3?
  3. How can I monitor whether each CPU successfully powers on/off via PSCI requests?

Any insights or debugging methods would be appreciated.

  • Disclaimer: I'm not a Linux expert, and I haven't used boot-wrapper-aarch64.

    Could this be an FVP issue? Does FVP handle SMC PSCI requests properly for boot-wrapper-aarch64?

    Not sure what you mean. The FVP is a model of a hardware platform; PSCI is a software interface. You'll need some software running at EL3 on the model, such as TF-A, to implement the PSCI interface.

    How can I trace the CPU states in FVP Base Model?

    Is this the Base Rev C FVP? If so, it supports execution tracing. The plugin I'm most familiar with is TARMAC (./Base_RevC_AEMvA_pkg/plugins/<platform>/TarmacTrace.so). When you launch the model, add "--plugin ./Base_RevC_AEMvA_pkg/plugins/<platform>/TarmacTrace.so" (replacing <platform> as appropriate). You'll then get an instruction-level trace from each of the cores.

    Note: TARMAC generates a _lot_ of output.  

    How can I determine whether each CPU is in EL1 or EL3?

    Software can determine which EL it is running at by reading the CurrentEL system register: https://developer.arm.com/documentation/ddi0601/2024-12/AArch64-Registers/CurrentEL--Current-Exception-Level?lang=en

    If you mean externally while debugging, the TARMAC trace includes the Exception level/Execution state/Security state.

    Or, you could connect a compatible debugger.

    SEV() issue: Could sev() fail to activate secondary CPUs?

    I doubt the problem is SEV failing to send an event or to wake a core from WFE. But if you want to test the theory, replace the WFE with a NOP. The secondary cores will then spin continuously until released. It's wasteful and inefficient, but it removes SEV/WFE as a possible cause.

  • Thanks for your reply. You're right: after reviewing the boot-wrapper-aarch64 code, I found that it implements PSCI itself and responds to PSCI requests, so this is not related to the FVP.

    Additionally, TarmacTrace.so does output the CPU state, including the Exception level.

    Regarding the boot failure, I discovered that boot-wrapper-aarch64 only supports booting with all cores released from reset at system startup. To achieve this, I need to use the following FVP parameter:

    -C pctl.startup=0.0.0.*

    Previously, this parameter was set to 0.0.0.0, which releases only the primary CPU. At that time I was booting with ATF (TF-A), which supports two boot modes: one where all cores come out of reset at system startup, and another where only the primary CPU does.