The issue of different execution times for the same assembly code on Cortex-A53

Cortex-A53: the execution time of the same four assembly instructions varies depending on where they are located in memory (non-cacheable). Counting with the PMU shows that the slower cases have an additional instruction per cycle. If the instruction cache is enabled, the run time is consistent. Please analyze the reasons for this.

  • Is your sequence something like:

    start:
        MRS  x1, PMCCNTR_EL0   // Read PMU Cycle Count
        INSTR1                 // INSTR1..INSTR4: the four instructions under test
        INSTR2
        INSTR3
        INSTR4
        MRS  x2, PMCCNTR_EL0   // Read PMU Cycle Count
        B    start

    If so, I suspect part of the problem is that the sequence (four instructions) is too short to measure accurately this way. With such a short sequence, the noise introduced by reading the PMU, branching, prefetch and so on is large relative to what you're trying to measure, hence the noisy result.
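    A rough cost model makes the point concrete (my own simplification, not something from the Arm documentation). Let $C_{\text{seq}}$ be the cost of INSTR1..INSTR4, $C_{\text{read}}$ the fixed cost of the counter reads, $\varepsilon$ the one-off disturbance around the reads (branching, prefetch state, etc.), and $C_{\text{loop}}$ the per-pass cost of a loop counter and branch. Bracketing one pass versus $N$ passes with the same pair of reads gives roughly:

    $$\Delta_1 = C_{\text{seq}} + C_{\text{read}} + \varepsilon$$

    $$\frac{\Delta_N}{N} = C_{\text{seq}} + C_{\text{loop}} + \frac{C_{\text{read}} + \varepsilon}{N}$$

    For a four-instruction sequence, $C_{\text{read}} + \varepsilon$ can be the same order as $C_{\text{seq}}$, so $\Delta_1$ is dominated by noise, while in the looped form that term shrinks as $N$ grows.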

    More typically I'd expect something like this:

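        MOV  x0, #num_loops    // num_loops: placeholder for the iteration count
        MRS  x1, PMCCNTR_EL0   // Read PMU Cycle Count (start)
    start:
        INSTR1
        INSTR2
        INSTR3
        INSTR4
        SUB  x0, x0, #1        // one pass done
        CBNZ x0, start         // loop until x0 reaches zero
        MRS  x2, PMCCNTR_EL0   // Read PMU Cycle Count (end)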

    That is, run the sequence many times and average the result.
