Architectures and Processors forum Why might Loop Unrolling contribute to Lower Runtime When using two Cores (X1 or A76) but not with one Core?

FabianSchuetze 9 months ago

I am benchmarking an FMA application running on one or two Cortex-X1 cores or one or two Cortex-A76 cores (inside a Pixel 6 phone). Loop unrolling improves performance by ~10%, but only when I use two cores and not just one.

Consider the following code:
```
    for (size_t x = 0; x < 4; x++) {
        size_t row = start_row + x * 8;
        for (size_t y = 0; y < 4; y++) {
            size_t col = start_col + y * 8;
            fma_f32_8x8(bptr + col, aptr + K * row, M, N, K,
                        cptr + col + N * row);
        }
    }
```

When I unroll the inner loop, the entire application runs about 10% faster, but only when the application runs on two cores. When I test the application on a single core (either X1 or A76), the runtime barely changes. Why might that be the case? At first, I suspected frontend stalls, but I could not see a relationship between frontend stalls in simpleperf (Android's perf wrapper) and whether or not I unrolled the loop. Branch misses decrease with loop unrolling, but they already decrease when I use only one core. Do the two cores share a resource that might become saturated when I use two cores but don't unroll the loop?

Does anybody know why loop unrolling in the FMA application might be particularly beneficial when using two cores and how I could verify this?
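
For reference, manually unrolling the inner loop of the snippet above might look roughly like the following sketch. The body of `fma_f32_8x8` is not shown in the post, so the version here is a hypothetical scalar stand-in that assumes row-major operands (A is M x K, B is K x N, C is M x N). The names `tile_32x32_rolled`, `tile_32x32_unrolled` and `unroll_matches_rolled` are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel in the post (its real body is not
   shown): accumulates one 8x8 tile, C += A * B, with row-major operands. */
static void fma_f32_8x8(const float *b, const float *a, size_t M, size_t N,
                        size_t K, float *c)
{
    (void)M; /* unused in this scalar sketch */
    for (size_t i = 0; i < 8; i++)
        for (size_t k = 0; k < K; k++)
            for (size_t j = 0; j < 8; j++)
                c[i * N + j] += a[i * K + k] * b[k * N + j];
}

/* Rolled version: the same loop nest as in the post. */
static void tile_32x32_rolled(const float *aptr, const float *bptr,
                              float *cptr, size_t M, size_t N, size_t K,
                              size_t start_row, size_t start_col)
{
    assert(start_row + 32 <= M && start_col + 32 <= N);
    for (size_t x = 0; x < 4; x++) {
        size_t row = start_row + x * 8;
        for (size_t y = 0; y < 4; y++) {
            size_t col = start_col + y * 8;
            fma_f32_8x8(bptr + col, aptr + K * row, M, N, K,
                        cptr + col + N * row);
        }
    }
}

/* Inner loop over y unrolled by hand into four calls (y = 0, 1, 2, 3). */
static void tile_32x32_unrolled(const float *aptr, const float *bptr,
                                float *cptr, size_t M, size_t N, size_t K,
                                size_t start_row, size_t start_col)
{
    assert(start_row + 32 <= M && start_col + 32 <= N);
    for (size_t x = 0; x < 4; x++) {
        size_t row = start_row + x * 8;
        const float *a = aptr + K * row;
        const float *b = bptr + start_col;
        float *c = cptr + start_col + N * row;
        fma_f32_8x8(b,      a, M, N, K, c);
        fma_f32_8x8(b + 8,  a, M, N, K, c + 8);
        fma_f32_8x8(b + 16, a, M, N, K, c + 16);
        fma_f32_8x8(b + 24, a, M, N, K, c + 24);
    }
}

/* Returns 1 if both variants produce bit-identical output on a small
   made-up 32x32 problem with K = 4. */
static int unroll_matches_rolled(void)
{
    enum { M = 32, N = 32, K = 4 };
    static float a[M * K], b[K * N], c1[M * N], c2[M * N];
    for (size_t i = 0; i < M * K; i++) a[i] = (float)(i % 7) * 0.5f;
    for (size_t i = 0; i < K * N; i++) b[i] = (float)(i % 5) - 2.0f;
    memset(c1, 0, sizeof c1);
    memset(c2, 0, sizeof c2);
    tile_32x32_rolled(a, b, c1, M, N, K, 0, 0);
    tile_32x32_unrolled(a, b, c2, M, N, K, 0, 0);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

Both variants issue the same kernel calls in the same order, so only the loop overhead and the inner-loop branch differ. An optimizing compiler may also unroll the rolled version on its own, so it is worth checking the generated assembly before comparing timings.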

Top replies

FabianSchuetze 9 months ago +3 verified

I did more benchmarking and realized that the effect I saw was a fluke and not statistically significant. This question can be deleted.
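
As an illustration of how such a check could be done (not necessarily how it was done here), one option is to repeat each configuration several times and compare the two sets of runtimes with Welch's t statistic. The sketch below returns the squared statistic so that no `sqrt` is needed. The function name `welch_t_squared` is hypothetical, and the rule of thumb that a squared value above roughly 4 (|t| > 2) hints at a real difference is only an approximation that assumes a reasonable number of runs.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s / (double)n;
}

/* Unbiased sample variance (divides by n - 1). */
static double sample_variance(const double *x, size_t n, double m)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += (x[i] - m) * (x[i] - m);
    return s / (double)(n - 1);
}

/* Squared Welch t statistic for two independent samples of runtimes:
   t^2 = (mean_a - mean_b)^2 / (var_a / na + var_b / nb).
   Assumes at least two runs per sample and non-zero combined variance. */
static double welch_t_squared(const double *a, size_t na,
                              const double *b, size_t nb)
{
    assert(na >= 2 && nb >= 2);
    double ma = mean(a, na), mb = mean(b, nb);
    double se2 = sample_variance(a, na, ma) / (double)na +
                 sample_variance(b, nb, mb) / (double)nb;
    assert(se2 > 0.0);
    double diff = ma - mb;
    return diff * diff / se2;
}
```

With noisy runtimes, two configurations whose means differ by a few percent can easily produce a small statistic, which is consistent with an apparent speedup disappearing once more runs are collected.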