What works as a data memory barrier?

We have a section of code that increments a variable shared among several threads. The code section is protected by an ldxr/stxr/dmb spin lock, and there is another dmb after the shared variable is updated. This code sequence is the body of a function that returns the shared variable's value before the update. The expectation is that no two calls to the function will return the same value.

	.text	
 ## lap for synchtest
 ## 8 threads call this 100 times each, concurrently.
 ## x0 -> quadword used as a spin lock
 ## x1 -> quadword supposed to source a series of distinct
 ##       values.
 ## 
 .align 4
	.global	synchtest
	.type	synchtest, %function
synchtest:
	movz	x20,6
	mov	x12,sp
	str	x30,[sp,8]
	sub	sp,sp,144
	str	x12,[sp,0]
	str	x20,[sp,16]
 ##                            Enter critical section control
 ##                            (loop until [x0] goes 0->1)
L15182:
	movz	x27,1          
	add	x12,x0,0
L15185:
	ldxr	x11,[x12,0]
	cmp xzr,x11
	bne	L15186
	stxr	w10,x27,[x12,0]
	dmb 11	
	cbnz	w10,L15185
L15186:                        
	bne	L15182
 ##                            This is the critical section
 ##                            increment shared global [x1]
	ldr	x27,[x1,0]     
	sub	x23,x27,-8
	str	x23,[x1,0]
 ##	                       Now leave the critical section
	add	x12,x0,0
        str     xzr,[x12,0]
	dmb 11	
 ##			       and return the original value from
 ##			       shared global [x1]
	str	x27,[sp,136]
	ldr	x0,[sp,136]
	movz	x9,8
	ldr	x30,[sp,152]
	ldr	x20,[sp,160]
	str	x24,[sp,32]
	add	sp,sp,144
	ret	
 .size synchtest,.-synchtest

We start 8 threads running concurrently, each calling the function 100 times and recording its results separately. Then we compare the sequences of values the threads received and check for duplicates, i.e. two threads receiving the same value. [EDIT: added: ] Our expectation is that there will be no duplicates. We find this is not the case.

Btw, we're on ARMv8 with CentOS 7.x (x = latest).
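
For readers who want to reproduce the check, here is a rough sketch of such a harness in C, not the original poster's test program. The names `next_value`, `worker` and `count_duplicates` are hypothetical, and `next_value` is only a stand-in for the assembly routine: it uses a single atomic fetch-and-add, which cannot hand out the same value twice. Instead of comparing per-thread sequences pairwise, the sketch flattens all results, sorts them and counts equal neighbours.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define NTHREADS 8    /* 8 threads, as in the post */
#define NCALLS   100  /* 100 calls per thread, as in the post */

/* Hypothetical stand-in for synchtest: returns the old value, adds 8. */
static _Atomic uint64_t shared_value;

static uint64_t next_value(void) {
    return atomic_fetch_add(&shared_value, 8);
}

/* One row of results per thread. */
static uint64_t results[NTHREADS][NCALLS];

static void *worker(void *arg) {
    uint64_t *out = arg;
    for (int i = 0; i < NCALLS; i++)
        out[i] = next_value();
    return NULL;
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Runs one trial and returns how many returned values were duplicates. */
int count_duplicates(void) {
    pthread_t tid[NTHREADS];
    atomic_store(&shared_value, 0);
    for (int t = 0; t < NTHREADS; t++) {
        int rc = pthread_create(&tid[t], NULL, worker, results[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    /* Flatten, sort, then count equal neighbours. */
    uint64_t *flat = &results[0][0];
    size_t n = (size_t)NTHREADS * NCALLS;
    qsort(flat, n, sizeof *flat, cmp_u64);
    int dups = 0;
    for (size_t i = 1; i < n; i++)
        if (flat[i] == flat[i - 1])
            dups++;
    return dups;
}
```

Calling `count_duplicates()` from `main` and printing the result gives the duplicate count for one trial. To test a different implementation, replace the body of `next_value` with a call to it.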

  • I suspect the problem lies where you attempt to leave the critical section.

    You perform the "ldr x27 / str x23" to read/write the global, then perform "str xzr" to release the lock. However, there is no ordering requirement between these two operations, i.e. another thread could observe the lock as free before the update to the global is visible. You likely require a barrier between the "str x23" and the "str xzr".

    hth

    Simon.
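
The ordering described in this reply can be sketched in C11 atomics rather than in the original assembly. This is a minimal sketch under stated assumptions, not the poster's code: `lock_flag`, `take_value`, `locked_worker` and `run_locked_trial` are hypothetical names. The lock is taken with acquire ordering and released with release ordering, so the update to the shared value must be visible before another thread can see the lock as free. On AArch64 a compiler would typically implement the release store with `stlr`. In hand-written assembly, a `dmb ish` before the plain `str` that clears the lock gives a comparable guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NTHREADS 8
#define NCALLS   100

static atomic_flag lock_flag = ATOMIC_FLAG_INIT; /* plays the role of [x0] */
static uint64_t shared_value;                    /* plays the role of [x1] */

/* Returns the value before the update, like the routine in the question. */
uint64_t take_value(void) {
    /* Acquire: spin until the flag goes from clear to set. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        ;
    uint64_t old = shared_value;
    shared_value = old + 8;
    /* Release: the store above is ordered before the lock is seen as free. */
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
    return old;
}

static void *locked_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NCALLS; i++)
        (void)take_value();
    return NULL;
}

/* Runs 8 threads x 100 calls and returns the final shared value. */
uint64_t run_locked_trial(void) {
    pthread_t tid[NTHREADS];
    shared_value = 0;
    for (int t = 0; t < NTHREADS; t++) {
        int rc = pthread_create(&tid[t], NULL, locked_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    return shared_value;
}
```

If no update is lost, `run_locked_trial()` returns 8 * 100 * 8 = 6400, since each of the 800 calls adds 8 exactly once.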
