i got really weird test result of memory barrier on Cortex A9

  • Note: This was originally posted on 28th November 2011 at http://forums.arm.com

    @isogen74, thank you very much for your reply. I understand most of your points, but I am not clear on the last one. Can you please correct me where I am wrong?

    1. int1 and int2 are in different cache lines (suppose they are in cache lines index 0 and index 1):
    (1). int1 and int2 are both equal to 10.
    (2). On cpu1, int1 is in cache line index 0 and int2 is in cache line index 1.
    (3). On cpu0, int1 is in cache line index 0 and int2 is in cache line index 1.
    (4). Suppose at this point cpu1 adds 1 to int1. This updates cache line index 0 of cpu1 and asks cpu0 to invalidate its cache line index 0. After this step int1 is 11. I think this step is done by hardware automatically, right?
    (5). Suppose cpu0 starts to read int1. Because cache line index 0 was invalidated in step (4), cpu0 has to fetch the value from memory, and it gets the value 11.
    (6). Suppose cpu0 is busy with other bus traffic and is delayed before it reads int2.
    (7). cpu1 updates int2 in the same way as in step (4). After this step int2 is 11 and cache line index 1 of cpu0 is invalidated.
    (8). cpu1 updates int1 to 12 and then updates int2 to 12. After that, both cache line index 0 and cache line index 1 are invalidated on cpu0.
    (9). Now cpu0 reads int2. It fetches the value from memory and fills cache line index 1. It gets the value 12 (while it read 11 for int1), and this triggers the kernel panic.
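
    To make the interleaving in steps (1) to (9) concrete, here is a minimal single-threaded sketch that replays it in program order. It models memory as fully coherent and ignores the caches, so it only shows which values the reader ends up with, not how long anything takes. The names `replay_vars`, `replay_seen` and `replay_scenario` are made up for this illustration and are not from the real test.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not the real test code. */
struct replay_vars { int int1; int int2; };   /* the two shared counters   */
struct replay_seen { int int1; int int2; };   /* what cpu0 ("reader") sees */

/* Replays steps (1)-(9) in one thread, assuming coherent memory. */
static struct replay_seen replay_scenario(void)
{
    struct replay_vars mem = { 10, 10 };      /* step (1): both equal 10   */
    struct replay_seen seen;

    assert(mem.int1 == mem.int2);             /* starting state is in sync */

    mem.int1 += 1;                            /* step (4): cpu1, int1 = 11 */
    seen.int1 = mem.int1;                     /* step (5): cpu0 reads 11   */
                                              /* step (6): cpu0 is delayed */
    mem.int2 += 1;                            /* step (7): cpu1, int2 = 11 */
    mem.int1 += 1;                            /* step (8): cpu1, int1 = 12 */
    mem.int2 += 1;                            /* step (8): cpu1, int2 = 12 */
    seen.int2 = mem.int2;                     /* step (9): cpu0 reads 12   */

    return seen;
}
```

    Calling `replay_scenario()` gives the reader int1 = 11 and int2 = 12, the same pair as in step (9).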

    2. int1 and int2 are in the same cache line (suppose both of them are in cache line index 0):
    (1). int1 and int2 are both equal to 10.
    (2). On cpu1, int1 and int2 are both in cache line index 0.
    (3). On cpu0, int1 and int2 are both in cache line index 0.
    (4). Suppose at this point cpu1 adds 1 to int1. This updates cache line index 0 of cpu1 and asks cpu0 to invalidate its cache line index 0. After this step int1 is 11. I think this step is done by hardware automatically, right?
    (5). Suppose cpu0 starts to read int1. Because cache line index 0 was invalidated in step (4), cpu0 has to fetch the value from memory, and it gets the value 11.
    (6). Suppose cpu0 is busy with other bus traffic and is delayed before it reads int2.
    (7). cpu1 updates int2 in the same way as in step (4). After this step int2 is 11 and cache line index 0 of cpu0 is invalidated.
    (8). cpu1 updates int1 to 12 and then updates int2 to 12. These two writes update the same cache line of cpu1 and ask cpu0 to invalidate the same cache line. But what is the difference between this step and step 1.(8)?
    (9). Now cpu0 reads int2. It fetches the value from memory and fills cache line index 0. It gets the value 12 (while it read 11 for int1), and this triggers the kernel panic.
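
    For reference, the two layouts could be forced in C11 roughly like this. This is only a sketch: it assumes a 32-byte L1 cache line, which is what I understand the Cortex-A9 uses, and the names `CACHE_LINE`, `same_line`, `split_lines` and `same_cache_line` are invented here, not taken from the real test.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 32-byte L1 data cache line; adjust for other cores. */
#define CACHE_LINE 32

/* Scenario 2: both counters share one cache line. */
struct same_line {
    alignas(CACHE_LINE) int int1;
    int int2;
};

/* Scenario 1: each counter starts its own cache line. */
struct split_lines {
    alignas(CACHE_LINE) int int1;
    alignas(CACHE_LINE) int int2;
};

static_assert(sizeof(struct same_line) == CACHE_LINE,
              "same_line should occupy exactly one line");
static_assert(sizeof(struct split_lines) == 2 * CACHE_LINE,
              "split_lines should occupy exactly two lines");

/* Returns 1 if two byte offsets (measured from a line-aligned base)
 * fall into the same cache line, 0 otherwise. */
static int same_cache_line(size_t off_a, size_t off_b)
{
    return off_a / CACHE_LINE == off_b / CACHE_LINE;
}
```

    With these definitions, `offsetof(struct same_line, int2)` is in the same line as int1, while `offsetof(struct split_lines, int2)` is 32 and falls into the next line.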

    In both scenarios, the timing of step (6) is very important for the test result. My guess is that cpu1 is only writing and always hits in its cache, so its writes are much faster than cpu0's reads, and several writes can happen between the two reads.
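
    A small sketch of that guess, again single-threaded and ignoring the caches: if the writer completes some number of full updates between the reader's two reads, the reader sees int2 ahead of int1 by exactly that number. The helper `observed_gap` is a made-up name for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: returns (int2 - int1) as seen by the reader when
 * the writer completes `writes_between` full updates (int1 then int2)
 * between the reader's read of int1 and its read of int2. */
static int observed_gap(int writes_between)
{
    int int1 = 10, int2 = 10;       /* both counters start at 10         */
    int seen_int1, seen_int2;

    assert(writes_between >= 0);

    seen_int1 = int1;               /* reader: first read                */
    for (int i = 0; i < writes_between; i++) {
        int1 += 1;                  /* writer: update int1 ...           */
        int2 += 1;                  /* ... then int2                     */
    }
    seen_int2 = int2;               /* reader: delayed second read       */

    return seen_int2 - seen_int1;
}
```

    So `observed_gap(0)` is 0 and `observed_gap(3)` is 3: the longer the delay in step (6), the larger the difference the reader can observe.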

    In scenario 1, when writing int1, cpu1 asks cpu0 to invalidate cache line 0, and when writing int2, cpu1 asks cpu0 to invalidate cache line 1.

    In scenario 2, when writing int1, cpu1 asks cpu0 to invalidate cache line 0, and when writing int2, cpu1 asks cpu0 to invalidate cache line 0 again. Will these two invalidations be merged into one action?

    How do we get the result that in scenario 2 cpu0 (the "read thread") is blocked for less time than in scenario 1?

    And is this blocking caused by the cache line invalidations issued by cpu1?