Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Reader-writer Lock

matt_j
Beginner
841 Views
Hello,


I've written a simple reader-writer lock for my own needs and am just after another pair of eyes to identify any issues I may have missed.

The domain this will be used in involves the vast majority being reads with the workload between each read-request being quite significant (ie., there will be little thread-contention)

A few things to note that may or may not be obvious:
#1: A basic lock needs to be aquired for any request (read/write)
#2: Writers are given priority, the "Writers" variable is about pending-writers, not current writers.


[cpp]struct RWLock
{
	// Attributes
	Byte Writers;
	Byte Lock;
	Byte Readers;
	Byte Unused;

	// Methods
	Void FASTCALL AquireWrite();
	Void FASTCALL AquireRead();
	Void FASTCALL ReleaseWrite();
	Void FASTCALL ReleaseRead();
};


NAKED Void FASTCALL RWLock::AquireWrite()
{
	__asm
	{
		// Signal readers to wait.
		add byte ptr [ecx], 1

		// Aquire lock, no readers.
		mov edx, 1

		ALIGN 16
		SPIN:
			xor eax, eax
			pause
			lock cmpxchg word ptr [ecx+1], dx
			jnz short SPIN

		// Decrement writer-pending count.
		sub byte ptr [ecx], 1

		// Aquired.
		ret
	}
}




NAKED Void FASTCALL RWLock::AquireRead()
{
	__asm
	{
		// Aquire lock.
		mov edx, 1

		ALIGN 16
		SPIN:
			xor eax, eax
			pause
			lock cmpxchg word ptr [ecx], dx
			jnz short SPIN

		// Increment reader count.
		lock add byte ptr [ecx+2], 1

		// Free lock.
		mov byte ptr [ecx+1], 0

		// Aquired.
		ret
	}
}


FORCEINLINE Void FASTCALL RWLock::ReleaseRead()
{
	// Decrement reader count.
	this->Readers--;
}


FORCEINLINE Void FASTCALL RWLock::ReleaseWrite()
{
	// Free lock.
	this->Lock = 0;
}[/cpp]

For C-style ('this' omitted for clarity)

[cpp]/*
 * AquireWrite()
 */

// Signal readers to wait.
Writers++;

// Aquire lock when no writers or readers are present.
while(Lock != 0 && Readers != 0);
Lock=1;

// Decrement writer-pending count.
Writers--;


/*
 * AquireRead()
 */

// Aquire lock when no writers are either present or pending.
while(Writers != 0 && Lock != 0);
Lock=1;

// Increment reader count.
Readers++;

// Free lock
Lock=0;
[/cpp]




Cheers,


Matt.
0 Kudos
6 Replies
Dmitry_Vyukov
Valued Contributor I
841 Views
Quoting - matt.j

[cpp]NAKED Void FASTCALL RWLock::AquireRead()
{
__asm
{
// Aquire lock.
mov edx, 1

ALIGN 16
SPIN:
xor eax, eax
pause
lock cmpxchg word ptr [ecx], dx

[/cpp]


Doesn't it have to be lock cmpxchg word ptr [ecx + 1], dx?

0 Kudos
Dmitry_Vyukov
Valued Contributor I
841 Views
You must use the same address and the same operand size in all atomic RMW operations.
Intel 64 and IA-32 Architectures Software Developer's Manual Volume
3A: System Programming Guide, Part 1

7.1.2.2 Software Controlled Bus Locking

Software should access semaphores (shared memory used for signalling
between multiple processors) using identical addresses *and operand lengths*.
For example, if one processor accesses a semaphore using a word access, other
processors should not access the semaphore using a byte access.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
841 Views
You better align your mutex variable on 4-byte boundary. If it happens to cross cache line boundary it will run very slowly on Intel QPI based system, system-wide quiescence is *very* unpleasant thing.
0 Kudos
Dmitry_Vyukov
Valued Contributor I
841 Views
Quoting - matt.j

[cpp]NAKED Void FASTCALL RWLock::AquireWrite()
{
__asm
{
// Signal readers to wait.
add byte ptr [ecx], 1

// Aquire lock, no readers.
mov edx, 1

ALIGN 16
SPIN:
xor eax, eax
pause
lock cmpxchg word ptr [ecx+1], dx
jnz short SPIN

// Decrement writer-pending count.
sub byte ptr [ecx], 1

// Aquired.
ret
}
}
[/cpp]


Here is a race on [ecx], you use non-atomic operations 'add byte ptr [ecx]' and 'sub byte ptr [ecx]'. Two concurrent writer can damage [ecx], so that it will never become zero again.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
841 Views
Quoting - matt.j
[cpp]
FORCEINLINE Void FASTCALL RWLock::ReleaseRead()
{
// Decrement reader count.
this->Readers--;
}
[/cpp]



ReleaseReed() is not atomic wrt 'Readers', you need to use 'lock sub' in ReleaseRead(). Otherwise concurrent readers will corrupt 'Readers'.
0 Kudos
Chris_M__Thomasson
New Contributor I
841 Views
Quoting - matt.j
Hello,


I've written a simple reader-writer lock for my own needs and am just after another pair of eyes to identify any issues I may have missed.

[...]


Read this whole thread:




Also, here is the best FIFO rw-spinlock out there:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/be3871ad661efa73/ac0c61ee539625de


I modified it to block, however you can easily change it back to a spinlock if you simply remove the eventcount logic and add a backoff/yield in the wait loops.



Any thoughts?
0 Kudos
Reply