i got really weird test result of memory barrier on Cortex A9
yitian bu
over 7 years ago
Note: This was originally posted on 28th November 2011 at http://forums.arm.com
@isogen74, thank you very much for your reply. I understand most of your points, but the last one is still not clear to me. Could you please correct me where I am wrong?
1. int1 and int2 are in different cache lines (suppose they are in cache lines index 0 and index 1):
(1). int1 and int2 are both equal to 10.
(2). On cpu1, int1 is in cache line index 0 and int2 is in cache line index 1.
(3). On cpu0, int1 is in cache line index 0 and int2 is in cache line index 1.
(4). Suppose at this point cpu1 adds 1 to int1. This updates cache line index 0 of cpu1 and asks cpu0 to invalidate its cache line index 0. After this step int1 is 11. I think this step is done by hardware automatically, right?
(5). Suppose cpu0 starts to read int1. Because cache line index 0 was invalidated in step (4), cpu0 has to get the value from memory. At last it gets the value 11.
(6). Suppose cpu0 is busy with other bus traffic and is delayed in reading the next value, int2.
(7). cpu1 starts to update int2, in the same way as in step (4). After this step int2 is 11 and cache line index 1 of cpu0 is invalidated.
(8). cpu1 updates int1 to 12 and then int2 to 12. After that, both cache line index 0 and cache line index 1 are invalidated in cpu0.
(9). Now cpu0 starts to read int2. It tries to get it from memory and fills cache line index 1. At last it gets the value 12 and triggers the kernel panic.
2. int1 and int2 are in the same cache line (suppose both of them are in cache line index 0):
(1). int1 and int2 are both equal to 10.
(2). On cpu1, int1 and int2 are in cache line index 0.
(3). On cpu0, int1 and int2 are in cache line index 0.
(4). Suppose at this point cpu1 adds 1 to int1. This updates cache line index 0 of cpu1 and asks cpu0 to invalidate its cache line index 0. After this step int1 is 11. I think this step is done by hardware automatically, right?
(5). Suppose cpu0 starts to read int1. Because cache line index 0 was invalidated in step (4), cpu0 has to get the value from memory. At last it gets the value 11.
(6). Suppose cpu0 is busy with other bus traffic and is delayed in reading the next value, int2.
(7). cpu1 starts to update int2, in the same way as in step (4). After this step int2 is 11 and cache line index 0 of cpu0 is invalidated.
(8). cpu1 updates int1 to 12 and then int2 to 12. These two writes update the same cache line of cpu1 and ask cpu0 to invalidate the same cache line. But what is the difference between this step and step 1.(8)?
(9). Now cpu0 starts to read int2. It tries to get it from memory and fills cache line index 0. At last it gets the value 12 and triggers the kernel panic.
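To make the two walk-throughs above concrete, here is a toy Python model that replays steps (1) to (9) for both layouts. It is only a sketch of my understanding, not a model of the real Cortex-A9 hardware: the `ToySystem` class, the `replay` function, the 32-byte line size, the addresses, and the write-through simplification (so that a refill "from memory" always sees the latest value) are all my own assumptions for illustration.

```python
LINE_SIZE = 32  # bytes per cache line in this toy model (assumption)


class ToySystem:
    """Two CPUs with one private cache each; write-invalidate, write-through."""

    def __init__(self, addresses, initial=10):
        self.addr = dict(addresses)                 # variable name -> byte address
        self.memory = {a: initial for a in self.addr.values()}
        self.cache = {0: {}, 1: {}}                 # cpu -> {line index: {address: value}}
        self.invalidations = 0                      # invalidate requests that hit a valid copy

    def _line(self, name):
        return self.addr[name] // LINE_SIZE

    def read(self, cpu, name):
        line = self._line(name)
        if line not in self.cache[cpu]:
            # Miss: fill the whole line from memory.
            self.cache[cpu][line] = {
                a: v for a, v in self.memory.items() if a // LINE_SIZE == line
            }
        return self.cache[cpu][line][self.addr[name]]

    def write(self, cpu, name, value):
        line = self._line(name)
        self.read(cpu, name)                        # the writer must hold the line first
        other = 1 - cpu
        if line in self.cache[other]:
            # The other CPU still holds a valid copy: invalidate it.
            del self.cache[other][line]
            self.invalidations += 1
        self.cache[cpu][line][self.addr[name]] = value
        self.memory[self.addr[name]] = value        # write-through (simplification)


def replay(addresses):
    """Replay steps (1)-(9); return (int1 seen, int2 seen, invalidations)."""
    s = ToySystem(addresses)                        # step (1): both variables are 10
    for cpu in (0, 1):                              # steps (2)-(3): both CPUs cache both
        s.read(cpu, "int1")
        s.read(cpu, "int2")
    s.write(1, "int1", 11)                          # step (4)
    seen_int1 = s.read(0, "int1")                   # step (5)
    # step (6): cpu0 is delayed here, so cpu1 runs ahead
    s.write(1, "int2", 11)                          # step (7)
    s.write(1, "int1", 12)                          # step (8)
    s.write(1, "int2", 12)
    seen_int2 = s.read(0, "int2")                   # step (9)
    return seen_int1, seen_int2, s.invalidations


print(replay({"int1": 0x00, "int2": 0x20}))  # scenario 1: (11, 12, 3)
print(replay({"int1": 0x00, "int2": 0x04}))  # scenario 2: (11, 12, 2)
```

In this toy model cpu0 sees int1 = 11 and int2 = 12 in both layouts, so the layout changes only how many invalidate requests find a valid copy in cpu0 (3 versus 2 here), not the values that are read. Whether the real hardware behaves like this counter is part of what I am asking.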
In both scenarios, the timing of step (6) is very important for the test result. My guess is that cpu1 is only writing, and because it always hits in its own cache, its writes are much faster than the reads on cpu0, so a couple of writes may happen between the two reads.
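This timing guess can be checked without any cache model. The sketch below just enumerates every way the two reads of cpu0 can interleave with the four writes of cpu1 from steps (4), (7) and (8), with each CPU keeping its own program order. The names `WRITER`, `READER` and `observed_pairs` are hypothetical, and the model assumes every write is visible to the other CPU immediately.

```python
from itertools import combinations

# cpu1's writes in program order: steps (4), (7), and the two writes of step (8).
WRITER = [("int1", 11), ("int2", 11), ("int1", 12), ("int2", 12)]
# cpu0's reads in program order: steps (5) and (9).
READER = ["int1", "int2"]


def observed_pairs():
    """Return (number of interleavings, set of (int1 seen, int2 seen) pairs)."""
    total = len(WRITER) + len(READER)
    pairs = set()
    count = 0
    # Choose which time slots belong to the reader; the rest go to the writer.
    for read_slots in combinations(range(total), len(READER)):
        state = {"int1": 10, "int2": 10}    # step (1): both start at 10
        writes = iter(WRITER)
        reads = iter(READER)
        seen = []
        for slot in range(total):
            if slot in read_slots:
                seen.append(state[next(reads)])
            else:
                name, value = next(writes)
                state[name] = value
        pairs.add(tuple(seen))
        count += 1
    return count, pairs


count, pairs = observed_pairs()
print(count)                # 15 interleavings
print(sorted(pairs))        # every (int1, int2) pair cpu0 can observe
print((11, 12) in pairs)    # True
```

The pair (11, 12) from step (9) is among the results, so in this model it can come from the interleaving alone, with no reordering of the reads or the writes.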
In scenario 1, when writing int1, cpu1 asks cpu0 to invalidate cache line 0; when writing int2, cpu1 asks cpu0 to invalidate cache line 1.
In scenario 2, when writing int1, cpu1 asks cpu0 to invalidate cache line 0; when writing int2, cpu1 again asks cpu0 to invalidate cache line 0. Will these two invalidations be merged into one action?
How does this lead to the result that in scenario 2 cpu0 (the "read thread") is blocked for less time than in scenario 1?
And is this blocking caused by the cache line invalidations issued by cpu1?