topic Re: Reader-writer Lock in Intel® Moderncode for Parallel Architectures

Reader-writer Lock

matt_j — Sat, 12 Dec 2009 02:54:19 GMT

Hello,

I've written a simple reader-writer lock for my own needs and am just after another pair of eyes to identify any issues I may have missed.

The domain this will be used in involves the vast majority being reads with the workload between each read-request being quite significant (ie., there will be little thread-contention)

A few things to note that may or may not be obvious:
#1: A basic lock needs to be aquired for any request (read/write)
#2: Writers are given priority, the "Writers" variable is about pending-writers, not current writers.

[cpp]struct RWLock
{
	// Attributes
	Byte Writers;
	Byte Lock;
	Byte Readers;
	Byte Unused;

	// Methods
	Void FASTCALL AquireWrite();
	Void FASTCALL AquireRead();
	Void FASTCALL ReleaseWrite();
	Void FASTCALL ReleaseRead();
};


NAKED Void FASTCALL RWLock::AquireWrite()
{
	__asm
	{
		// Signal readers to wait.
		add byte ptr [ecx], 1

		// Aquire lock, no readers.
		mov edx, 1

		ALIGN 16
		SPIN:
			xor eax, eax
			pause
			lock cmpxchg word ptr [ecx+1], dx
			jnz short SPIN

		// Decrement writer-pending count.
		sub byte ptr [ecx], 1

		// Aquired.
		ret
	}
}




NAKED Void FASTCALL RWLock::AquireRead()
{
	__asm
	{
		// Aquire lock.
		mov edx, 1

		ALIGN 16
		SPIN:
			xor eax, eax
			pause
			lock cmpxchg word ptr [ecx], dx
			jnz short SPIN

		// Increment reader count.
		lock add byte ptr [ecx+2], 1

		// Free lock.
		mov byte ptr [ecx+1], 0

		// Aquired.
		ret
	}
}


FORCEINLINE Void FASTCALL RWLock::ReleaseRead()
{
	// Decrement reader count.
	this->Readers--;
}


FORCEINLINE Void FASTCALL RWLock::ReleaseWrite()
{
	// Free lock.
	this->Lock = 0;
}[/cpp]

For C-style ('this' omitted for clarity)

[cpp]/*
 * AquireWrite()
 */

// Signal readers to wait.
Writers++;

// Aquire lock when no writers or readers are present.
while(Lock != 0 && Readers != 0);
Lock=1;

// Decrement writer-pending count.
Writers--;


/*
 * AquireRead()
 */

// Aquire lock when no writers are either present or pending.
while(Writers != 0 && Lock != 0);
Lock=1;

// Increment reader count.
Readers++;

// Free lock
Lock=0;
[/cpp]

Cheers,

Matt.

Re: Reader-writer Lock

Dmitry_Vyukov — Mon, 14 Dec 2009 20:33:09 GMT

Quoting - matt.j

[cpp]NAKED Void FASTCALL RWLock::AquireRead()
{
	__asm
	{
		// Aquire lock.
		mov edx, 1

		ALIGN 16
		SPIN:
			xor eax, eax
			pause
			lock cmpxchg word ptr [ecx], dx

[/cpp]

Doesn't it have to be lock cmpxchg word ptr [ecx + 1], dx?

Re: Reader-writer Lock

Dmitry_Vyukov — Mon, 14 Dec 2009 20:35:38 GMT

You must use the same address and the same operand size in all atomic RMW operations.

Intel 64 and IA-32 Architectures Software Developer's Manual Volume
3A: System Programming Guide, Part 1

7.1.2.2 Software Controlled Bus Locking

Software should access semaphores (shared memory used for signalling
between multiple processors) using identical addresses *and operand lengths*.
For example, if one processor accesses a semaphore using a word access, other
processors should not access the semaphore using a byte access.

Re: Reader-writer Lock

Dmitry_Vyukov — Mon, 14 Dec 2009 20:39:54 GMT

You better align your mutex variable on 4-byte boundary. If it happens to cross cache line boundary it will run very slowly on Intel QPI based system, system-wide quiescence is *very* unpleasant thing.

Re: Reader-writer Lock

Dmitry_Vyukov — Mon, 14 Dec 2009 20:43:07 GMT

Quoting - matt.j

[cpp]NAKED Void FASTCALL RWLock::AquireWrite()
{
	__asm
	{
		// Signal readers to wait.
		add byte ptr [ecx], 1

		// Aquire lock, no readers.
		mov edx, 1

		ALIGN 16
		SPIN:
			xor eax, eax
			pause
			lock cmpxchg word ptr [ecx+1], dx
			jnz short SPIN

		// Decrement writer-pending count.
		sub byte ptr [ecx], 1

		// Aquired.
		ret
	}
}
[/cpp]

Here is a race on [ecx], you use non-atomic operations 'add byte ptr [ecx]' and 'sub byte ptr [ecx]'. Two concurrent writer can damage [ecx], so that it will never become zero again.

Re: Reader-writer Lock

Dmitry_Vyukov — Mon, 14 Dec 2009 20:44:50 GMT

Quoting - matt.j

[cpp]
FORCEINLINE Void FASTCALL RWLock::ReleaseRead()
{
	// Decrement reader count.
	this->Readers--;
}
[/cpp]

ReleaseReed() is not atomic wrt 'Readers', you need to use 'lock sub' in ReleaseRead(). Otherwise concurrent readers will corrupt 'Readers'.

Re: Reader-writer Lock

Chris_M__Thomasson — Wed, 16 Dec 2009 00:39:31 GMT

Quoting - matt.j

Hello,

I've written a simple reader-writer lock for my own needs and am just after another pair of eyes to identify any issues I may have missed.

[...]

Read this whole thread:

http://software.intel.com/en-us/forums/showthread.php?t=65822

Also, here is the best FIFO rw-spinlock out there:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/be3871ad661efa73/ac0c61ee539625de

I modified it to block, however you can easily change it back to a spinlock if you simply remove the eventcount logic and add a backoff/yield in the wait loops.

Any thoughts?