Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
1695 Discussions

## My new invention: Scalable distributed sequential lock

Novice
550 Views

Hello,

Scalable Distributed Sequential lock version 1.01

Author: Amine Moulay Ramdane.

Email: aminer@videotron.ca

Description:

Please look at the "test.pas" example that i have included and you will notice that the reader side have to be used in a transactional mode with a repeat-until loop, like this:

repeat

t.rlock(id1,id2);

until t.runlock(id1,id2);

it is like a transactional mode of a Seqlock and id1 and id2 must be of type "long" that must be imported from the drwlock library. The writer

side is easy to program , please look at the "test.pas" example and you will understand how to use my scalable distributed sequential lock.

My scalable distributed sequential lock can be used from accross processes and threads.

My new invention called distributed sequential lock beats

the following algorithm, it is more scalable and faster than

the following algorithm of Dmitry Vyukov called Distributed

reader-writer mutex ( i have just explained to you why ..):

And my new invention called distributed \ sequential lock beats also Seqlock (i have just explained to you why ...), read here about it:

http://en.wikipedia.org/wiki/Seqlock

And my new invention called distributed sequential lock can replace RCU cause it is a powerful synchronization mechanism.

I will explain my new algorithm of a scalable distributed sequential lock that is very powerful...

I have arrived at my new algorithm by also reading many PhD papers and by looking first at the weakness of the following algorithm of Dmitry Vyukov's Reader-writer mutex:

look at the writer side of the Dmitry Vykov's algorithm , it is doing with  every rwlock in each corresponding core:

for (i = 0; i != mtx->proc_count; i += 1)

but this render this algorithm inefficient for the writer side , cause

every pthread_rwlock_wrlock() call transfer many cache-lines betwenn cores, so this will generate too much cache line transfers between cores when each writer executes distr_rw_mutex_wrlock() and this will become worse and worse when you add more and more cores, i mean this will tranfer more and more cache-lines between cores with more and more cores... so this will render the writer side throughput slow and this is the weakness of this algorithm...

So this is why i have used a sequential lock mechanism in the writer

side of my new algorithm so that to minimize efficiently the cache-lines transfers, so if you look at my algorithm, i am doing this inside the WLock() function on the writer side:

if (FCount6^.fcount6 mod GetSystemThreadCount)=0

then FCount5^.fcount5:=1;

FCount6^.fcount6 is a counter and i have applied to it a modulo of GetSystemThreadCount(), and GetSystemThreadCount() will return

the number of cores, so every "number of cores" calls to WLock() ,

my new algorithm switches from a sequential lock to  a distributed lock so that to eliminate the "livelock" and "starvation" weaknesses of Seqlock, cause Seqlock is a lockfree mechanism that can starve threads,

and that can livelock when there is more writers , cause Seqlock gives preference to writers threads over reader threads, but my new invention of a scalable distributed sequential lock  eliminates "livelock" and "starvation" of Seqlock when there is more writers cause it switches to a distributed lock on every "number of cores" calls to the WLock() method , so this is why my new invention beats Seqlock,

and my new invention beats also the Dmitry's Reader-writer mutex cause it switches to a distributed lock on every number of cores

calls to WLock(), so in the remaining calls it switches to a sequential lock that render the writer side faster cause it generate much

less cache-lines tranfers than the Dmitry Vyukov's distributed reader-writer mutex. On the reader side of my algorithm i am using a Sequential lock mechanism. And i think that since my new algorithm of a scalable distributed sequential lock is very powerful, so i think it can even replace RCU.

```About the sequential consistency of  my scalable distributed sequential lock, my algorithm works on x86 architecture and i think my algorithm is correct cause look at the source code of the WLock() method, since i am using a Ticket spinlock with a proportional backoff on the writer side, the Ticket spinlock is using a "lock add" assembler instruction to increment a counter in the Enter() method of the ticket spinlock , and
this "lock add" assembler instruction is a barrier for stores and loads on x86, so the WLock() method is sequential consistent and correct, now look at the WUnlock() , we don't need an "sfence" cause stores are not
reordered  with stores on x86 , so WUnlock() method is sequential consistent and correct, now look at the RLock() method, since i am using a "lock add" assembler instruction inside the call to the RLock() of the
distributed reader-writer lock , so it acts as a barrier and loads and stores inside the reader section are not reordered with this barrier on x86, so RLock() is sequential consistent and correct, and for RUnlock()
since i am calling RUnlock() of the distributed reader-writer mutex , this RUnlock() is calling "lock add" in assembler, so it acts as a barrier so the the loads and stores that we are using after this barrier
are not reordrered with this barrier, so all in all my algorithm is sequential consistent and correct on x86. So be confident cause i have reasoned correctly and i think my algorithm is correct and it is a powerful synchronization mechanism that can replace RCU and that can replace Seqlock cause it beats Seqlock.

Disclaimer:

My software is provided on an "as-is" basis, with no warranties,
express or implied.  The entire risk and liability of using it is yours.
Any damages resulting from the use or misuse of this software will be
the responsibility of the user.

You can download my scalable distributed sequential lock version 1.01 from:

```

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux on x86.

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches:  -\$H+ -DDelphi

{\$DEFINE CPU32} and {\$DEFINE Windows32} for 32 bit systems

{\$DEFINE CPU64} and {\$DEFINE Windows64} for 64 bit systems

Thank you,
Amine Moulay Ramdane.

2 Replies
Novice
550 Views

Hello,

You can download my scalable distributed sequential lock version 1.01 from:

Thank you,

Amine Moulay Ramdane.

Novice
550 Views

Hello,

Please read again i have formatted correctly my post...

Scalable Distributed Sequential lock version 1.01

Author: Amine Moulay Ramdane.

Email: aminer@videotron.ca

Description:

Please look at the "test.pas" example that i have included and you will notice that the reader side have to be used in a transactional mode with a repeat-until loop, like this:

repeat

t.rlock(id1,id2);

until t.runlock(id1,id2);

it is like a transactional mode of a Seqlock and id1 and id2 must be of type "long" that must be imported from the drwlock library. The writer side is easy to program , please look at the "test.pas" example and you will understand how to use my scalable distributed sequential lock.

My scalable distributed sequential lock can be used from accross processes and threads.

My new invention called distributed sequential lock beats the following algorithm, it is more scalable and faster than the following algorithm of Dmitry Vyukov called Distributed reader-writer mutex ( i have just explained to you why ..):

And my new invention called distributed \ sequential lock beats also Seqlock (i have just explained to you why ...), read here about it:

http://en.wikipedia.org/wiki/Seqlock

And my new invention called distributed sequential lock can replace RCU cause it is a powerful synchronization mechanism.

I will explain my new algorithm of a scalable distributed sequential lock that is very powerful...

I have arrived at my new algorithm by also reading many PhD papers and by looking first at the weakness of the following algorithm of Dmitry Vyukov's Reader-writer mutex:

look at the writer side of the Dmitry Vykov's algorithm , it is doing with  every rwlock in each corresponding core:

for (i = 0; i != mtx->proc_count; i += 1)

but this render this algorithm inefficient for the writer side , cause

every pthread_rwlock_wrlock() call transfer many cache-lines betwenn cores, so this will generate too much cache line transfers between cores when each writer executes distr_rw_mutex_wrlock() and this will become worse and worse when you add more and more cores, i mean this will tranfer more and more cache-lines between cores with more and more cores... so this will render the writer side throughput slow and this is the weakness of this algorithm...

So this is why i have used a sequential lock mechanism in the writer side of my new algorithm so that to minimize efficiently the cache-lines transfers, so if you look at my algorithm, i am doing this inside the WLock() function on the writer side:

if (FCount6^.fcount6 mod GetSystemThreadCount) = 0  then FCount5^.fcount5:=1;

FCount6^.fcount6 is a counter and i have applied to it a modulo of GetSystemThreadCount(), and GetSystemThreadCount() will returnthe number of cores, so every "number of cores" calls to WLock() , my new algorithm switches from a sequential lock to  a distributed lock so that to eliminate the "livelock" and "starvation" weaknesses of Seqlock, cause Seqlock is a lockfree mechanism that can starve threads and that can livelock when there is more writers , cause Seqlock gives preference to writers threads over reader threads, but my new invention of a scalable distributed sequential lock  eliminates "livelock" and "starvation" of Seqlock when there is more writers cause it switches to a distributed lock on every "number of cores" calls to the WLock() method , so this is why my new invention beats Seqlock, and my new invention beats also the Dmitry's Reader-writer mutex cause it switches to a distributed lock on every number of cores calls to WLock(), so in the remaining calls it switches to a sequential lock that render the writer side faster cause it generate much less cache-lines tranfers than the Dmitry Vyukov's distributed reader-writer mutex. On the reader side of my algorithm i am using a Sequential lock mechanism. And i think that since my new algorithm of a scalable distributed sequential lock is very powerful, so i think it can even replace RCU.

```About the sequential consistency of  my scalable distributed sequential lock, my algorithm works on x86 architecture and i think my algorithm is correct cause look at the source code of the WLock() method, since i am using a Ticket spinlock with a proportional backoff on the writer side, the Ticket spinlock is using a "lock add" assembler instruction to increment a counter in the Enter() method of the ticket spinlock , and
this "lock add" assembler instruction is a barrier for stores and loads on x86, so the WLock() method is sequential consistent and correct, now look at the WUnlock() , we don't need an "sfence" cause stores are not
reordered  with stores on x86 , so WUnlock() method is sequential consistent and correct, now look at the RLock() method, since i am using a "lock add" assembler instruction inside the call to the RLock() of the
distributed reader-writer lock , so it acts as a barrier and loads and stores inside the reader section are not reordered with this barrier on x86, so RLock() is sequential consistent and correct, and for RUnlock()
since i am calling RUnlock() of the distributed reader-writer mutex , this RUnlock() is calling "lock add" in assembler, so it acts as a barrier so the the loads and stores that we are using after this barrier
are not reordrered with this barrier, so all in all my algorithm is sequential consistent and correct on x86. So be confident cause i have reasoned correctly and i think my algorithm is correct and it is a powerful synchronization mechanism that can replace RCU and that can replace Seqlock cause it beats Seqlock.

Disclaimer:

My software is provided on an "as-is" basis, with no warranties,
express or implied.  The entire risk and liability of using it is yours.
Any damages resulting from the use or misuse of this software will be
the responsibility of the user.

You can download my scalable distributed sequential lock version 1.01 from:

```

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux on x86.

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches:  -\$H+ -DDelphi

{\$DEFINE CPU32} and {\$DEFINE Windows32} for 32 bit systems

{\$DEFINE CPU64} and {\$DEFINE Windows64} for 64 bit systems

Thank you,

Amine Moulay Ramdane.