My new invention: Scalable distributed sequential lock

aminer10 · ‎12-04-2014

Hello,

Scalable Distributed Sequential lock version 1.01

Author: Amine Moulay Ramdane.

Description:

This scalable distributed sequential lock was invented by Amine Moulay Ramdane, and it combines the characteristics of a distributed reader-writer lock with the characteristics of a sequential lock , so it is a clever hybrid reader-writer lock that is more powerful than the the Dmitry Vyukov's distributed reader-writer mutex , cause the Dmitry Vyukov's distributed reader-writer lock will become slower and slower on the writer side with more and more cores because it transfers too many cache-lines between cores on the writer side, so my invention that is my scalable distributed sequential lock has eliminated this weakness of the Dmitry Vyukov's distributed reader-writer mutex, so that the writers throughput has become faster and very fast, and my scalable distributed sequential lock elminates the weaknesses of the Seqlock (sequential lock) that is "livelock" and "starvation" even when there is more writers, so my scalable distributed sequential lock is a powerful sychronization mechanism that can replace RCU and that can replace Seqlock and that can replace Dmitry Vyukov's distributed reader-writer mutex. And if you look at my algorithm you will notice that on the reader side it looks like a sequential lock, but i have added a variable called fcount5^.fcount5 that allows the readers to switch between a sequential lock mechanism and a distributed reader-writer mechanism, so on the writer side you will notice that on every "number of cores" calls to WLock(), my scalable distributed sequential lock switches to a distributed reader-writer lock mechanism and executes the writer in this mode for a one time and for the rest of the time it will switch to a sequential lock mode , this allows my scalable distributed sequential lock to minimize efficiently cache-line transfers on the writer side, this is why my scalable distributed sequential lock beats Dmitry Vyukov distributed reader-writer mutex and it beats Seqlock.

Please look at the "test.pas" example that i have included and you will notice that the reader side have to be used in a transactional mode with a repeat-until loop, like this:

repeat

t.rlock(id1,id2);

until t.runlock(id1,id2);

it is like a transactional mode of a Seqlock and id1 and id2 must be of type "long" that must be imported from the drwlock library. The writer

side is easy to program , please look at the "test.pas" example and you will understand how to use my scalable distributed sequential lock.

My scalable distributed sequential lock can be used from accross processes and threads.

My new invention called distributed sequential lock beats

the following algorithm, it is more scalable and faster than

the following algorithm of Dmitry Vyukov called Distributed

reader-writer mutex ( i have just explained to you why ..):

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex

And my new invention called distributed \ sequential lock beats also Seqlock (i have just explained to you why ...), read here about it:

http://en.wikipedia.org/wiki/Seqlock

And my new invention called distributed sequential lock can replace RCU cause it is a powerful synchronization mechanism.

I will explain my new algorithm of a scalable distributed sequential lock that is very powerful...

I have arrived at my new algorithm by also reading many PhD papers and by looking first at the weakness of the following algorithm of Dmitry Vyukov's Reader-writer mutex:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex

look at the writer side of the Dmitry Vykov's algorithm , it is doing with every rwlock in each corresponding core:

for (i = 0; i != mtx->proc_count; i += 1)

pthread_rwlock_wrlock(&mtx-cell.mtx);

but this render this algorithm inefficient for the writer side , cause

every pthread_rwlock_wrlock() call transfer many cache-lines betwenn cores, so this will generate too much cache line transfers between cores when each writer executes distr_rw_mutex_wrlock() and this will become worse and worse when you add more and more cores, i mean this will tranfer more and more cache-lines between cores with more and more cores... so this will render the writer side throughput slow and this is the weakness of this algorithm...

So this is why i have used a sequential lock mechanism in the writer

side of my new algorithm so that to minimize efficiently the cache-lines transfers, so if you look at my algorithm, i am doing this inside the WLock() function on the writer side:

if (FCount6^.fcount6 mod GetSystemThreadCount)=0

then FCount5^.fcount5:=1;

FCount6^.fcount6 is a counter and i have applied to it a modulo of GetSystemThreadCount(), and GetSystemThreadCount() will return

the number of cores, so every "number of cores" calls to WLock() ,

my new algorithm switches from a sequential lock to a distributed lock so that to eliminate the "livelock" and "starvation" weaknesses of Seqlock, cause Seqlock is a lockfree mechanism that can starve threads,

and that can livelock when there is more writers , cause Seqlock gives preference to writers threads over reader threads, but my new invention of a scalable distributed sequential lock eliminates "livelock" and "starvation" of Seqlock when there is more writers cause it switches to a distributed lock on every "number of cores" calls to the WLock() method , so this is why my new invention beats Seqlock,

and my new invention beats also the Dmitry's Reader-writer mutex cause it switches to a distributed lock on every number of cores

calls to WLock(), so in the remaining calls it switches to a sequential lock that render the writer side faster cause it generate much

less cache-lines tranfers than the Dmitry Vyukov's distributed reader-writer mutex. On the reader side of my algorithm i am using a Sequential lock mechanism. And i think that since my new algorithm of a scalable distributed sequential lock is very powerful, so i think it can even replace RCU.

About the sequential consistency of  my scalable distributed sequential lock, my algorithm works on x86 architecture and i think my algorithm is correct cause look at the source code of the WLock() method, since i am using a Ticket spinlock with a proportional backoff on the writer side, the Ticket spinlock is using a "lock add" assembler instruction to increment a counter in the Enter() method of the ticket spinlock , and 
this "lock add" assembler instruction is a barrier for stores and loads on x86, so the WLock() method is sequential consistent and correct, now look at the WUnlock() , we don't need an "sfence" cause stores are not 
reordered  with stores on x86 , so WUnlock() method is sequential consistent and correct, now look at the RLock() method, since i am using a "lock add" assembler instruction inside the call to the RLock() of the 
distributed reader-writer lock , so it acts as a barrier and loads and stores inside the reader section are not reordered with this barrier on x86, so RLock() is sequential consistent and correct, and for RUnlock() 
since i am calling RUnlock() of the distributed reader-writer mutex , this RUnlock() is calling "lock add" in assembler, so it acts as a barrier so the the loads and stores that we are using after this barrier 
are not reordrered with this barrier, so all in all my algorithm is sequential consistent and correct on x86. So be confident cause i have reasoned correctly and i think my algorithm is correct and it is a powerful synchronization mechanism that can replace RCU and that can replace Seqlock cause it beats Seqlock.

Disclaimer:

My software is provided on an "as-is" basis, with no warranties,
express or implied.  The entire risk and liability of using it is yours. 
Any damages resulting from the use or misuse of this software will be 
the responsibility of the user.


You can download my scalable distributed sequential lock version 1.01 from:

https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux on x86.

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -$H+ -DDelphi

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

Thank you,

Amine Moulay Ramdane.

aminer10 · ‎12-04-2014

Hello,

You can download my scalable distributed sequential lock version 1.01 from:

https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock

Thank you,

Amine Moulay Ramdane.

aminer10 · ‎12-04-2014

Hello,

Please read again i have formatted correctly my post...

Scalable Distributed Sequential lock version 1.01

Author: Amine Moulay Ramdane.

Email: [email protected]

Description:

This scalable distributed sequential lock was invented by Amine Moulay Ramdane, and it combines the characteristics of a distributed reader-writer lock with the characteristics of a sequential lock , so it is a clever hybrid reader-writer lock that is more powerful than the the Dmitry Vyukov's distributed reader-writer mutex , cause the Dmitry Vyukov's distributed reader-writer lock will become slower and slower on the writer side with more and more cores because it transfers too many cache-lines between cores on the writer side, so my invention that is my scalable distributed sequential lock has eliminated this weakness of the Dmitry Vyukov's distributed reader-writer mutex, so that the writers throughput has become faster and very fast, and my scalable distributed sequential lock elminates the weaknesses of the Seqlock (sequential lock) that is "livelock" and "starvation" even when there is more writers, so my scalable distributed sequential lock is a powerful sychronization mechanism that can replace RCU and that can replace Seqlock and that can replace Dmitry Vyukov's distributed reader-writer mutex. And if you look at my algorithm you will notice that on the reader side it looks like a sequential lock, but i have added a variable called fcount5^.fcount5 that allows the readers to switch between a sequential lock mechanism and a distributed reader-writer mechanism, so on the writer side you will notice that on every "number of cores" calls to WLock(), my scalable distributed sequential lock switches to a distributed reader-writer lock mechanism and executes the writer in this mode for a one time and for the rest of the time it will switch to a sequential lock mode , this allows my scalable distributed sequential lock to minimize efficiently cache-line transfers on the writer side, this is why my scalable distributed sequential lock beats Dmitry Vyukov distributed reader-writer mutex and it beats Seqlock.

Please look at the "test.pas" example that i have included and you will notice that the reader side have to be used in a transactional mode with a repeat-until loop, like this:

repeat

t.rlock(id1,id2);

until t.runlock(id1,id2);

it is like a transactional mode of a Seqlock and id1 and id2 must be of type "long" that must be imported from the drwlock library. The writer side is easy to program , please look at the "test.pas" example and you will understand how to use my scalable distributed sequential lock.

My scalable distributed sequential lock can be used from accross processes and threads.

My new invention called distributed sequential lock beats the following algorithm, it is more scalable and faster than the following algorithm of Dmitry Vyukov called Distributed reader-writer mutex ( i have just explained to you why ..):

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex

And my new invention called distributed \ sequential lock beats also Seqlock (i have just explained to you why ...), read here about it:

http://en.wikipedia.org/wiki/Seqlock

And my new invention called distributed sequential lock can replace RCU cause it is a powerful synchronization mechanism.

I will explain my new algorithm of a scalable distributed sequential lock that is very powerful...

I have arrived at my new algorithm by also reading many PhD papers and by looking first at the weakness of the following algorithm of Dmitry Vyukov's Reader-writer mutex:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex

look at the writer side of the Dmitry Vykov's algorithm , it is doing with every rwlock in each corresponding core:

for (i = 0; i != mtx->proc_count; i += 1)

pthread_rwlock_wrlock(&mtx-cell.mtx);

but this render this algorithm inefficient for the writer side , cause

every pthread_rwlock_wrlock() call transfer many cache-lines betwenn cores, so this will generate too much cache line transfers between cores when each writer executes distr_rw_mutex_wrlock() and this will become worse and worse when you add more and more cores, i mean this will tranfer more and more cache-lines between cores with more and more cores... so this will render the writer side throughput slow and this is the weakness of this algorithm...

So this is why i have used a sequential lock mechanism in the writer side of my new algorithm so that to minimize efficiently the cache-lines transfers, so if you look at my algorithm, i am doing this inside the WLock() function on the writer side:

if (FCount6^.fcount6 mod GetSystemThreadCount) = 0 then FCount5^.fcount5:=1;

FCount6^.fcount6 is a counter and i have applied to it a modulo of GetSystemThreadCount(), and GetSystemThreadCount() will returnthe number of cores, so every "number of cores" calls to WLock() , my new algorithm switches from a sequential lock to a distributed lock so that to eliminate the "livelock" and "starvation" weaknesses of Seqlock, cause Seqlock is a lockfree mechanism that can starve threads and that can livelock when there is more writers , cause Seqlock gives preference to writers threads over reader threads, but my new invention of a scalable distributed sequential lock eliminates "livelock" and "starvation" of Seqlock when there is more writers cause it switches to a distributed lock on every "number of cores" calls to the WLock() method , so this is why my new invention beats Seqlock, and my new invention beats also the Dmitry's Reader-writer mutex cause it switches to a distributed lock on every number of cores calls to WLock(), so in the remaining calls it switches to a sequential lock that render the writer side faster cause it generate much less cache-lines tranfers than the Dmitry Vyukov's distributed reader-writer mutex. On the reader side of my algorithm i am using a Sequential lock mechanism. And i think that since my new algorithm of a scalable distributed sequential lock is very powerful, so i think it can even replace RCU.

About the sequential consistency of  my scalable distributed sequential lock, my algorithm works on x86 architecture and i think my algorithm is correct cause look at the source code of the WLock() method, since i am using a Ticket spinlock with a proportional backoff on the writer side, the Ticket spinlock is using a "lock add" assembler instruction to increment a counter in the Enter() method of the ticket spinlock , and 
this "lock add" assembler instruction is a barrier for stores and loads on x86, so the WLock() method is sequential consistent and correct, now look at the WUnlock() , we don't need an "sfence" cause stores are not 
reordered  with stores on x86 , so WUnlock() method is sequential consistent and correct, now look at the RLock() method, since i am using a "lock add" assembler instruction inside the call to the RLock() of the 
distributed reader-writer lock , so it acts as a barrier and loads and stores inside the reader section are not reordered with this barrier on x86, so RLock() is sequential consistent and correct, and for RUnlock() 
since i am calling RUnlock() of the distributed reader-writer mutex , this RUnlock() is calling "lock add" in assembler, so it acts as a barrier so the the loads and stores that we are using after this barrier 
are not reordrered with this barrier, so all in all my algorithm is sequential consistent and correct on x86. So be confident cause i have reasoned correctly and i think my algorithm is correct and it is a powerful synchronization mechanism that can replace RCU and that can replace Seqlock cause it beats Seqlock.

Disclaimer:

My software is provided on an "as-is" basis, with no warranties,
express or implied.  The entire risk and liability of using it is yours. 
Any damages resulting from the use or misuse of this software will be 
the responsibility of the user.


You can download my scalable distributed sequential lock version 1.01 from:

https://sites.google.com/site/aminer68/scalable-distributed-sequential-lock

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux on x86.

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -$H+ -DDelphi

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

Thank you,

Amine Moulay Ramdane.