Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

high-speed single-producer/consumer queue impl

Chris_M__Thomasson
New Contributor I
367 Views

Here are some snippets of code from my new library that implement a simple and efficient lock-free method for high-speed thread-to-thread communication. Its a simple single-producer/consumer queue algorithm implemented in i686 assembly. You can easily convert it into a multi-producer/consumer queue by protecting it with two separatemutexs; one for the push function and one for the pop function. Unfortunately, adding mutexs gets rid of the lock-free nature of the queue and introduces the kernel. However, the setup allows for "concurrent pushes and pops" because youre using two mutexs to emulate a single producer/consumer environment. In other words a producer does not block a consumer and vise versa. Enjoy! ;)

Any comments or questions?

0 Kudos
7 Replies
Chris_M__Thomasson
New Contributor I
367 Views
An atomic operations api has been added to AppCore.
0 Kudos
Chris_M__Thomasson
New Contributor I
367 Views
There was a stupid memory leak wrt tls node cache due to a stupid cut&paste
error. It's fixed. Probably should redownload. The memory leak will cause
the simple malloc counter to assert saying there is a leak if you create a
tls for the main thread, cache some nodes, and call ac_shutdown(). The
posted test does not trip the leak because it doesn't create a tls for the
main thread.
Sorry!
0 Kudos
Chris_M__Thomasson
New Contributor I
367 Views
Afull-blown hazard-pointer implementation was added to AppCore. There is also an eventcount, lock-free stack, and a rw-spinlock ( not Joes
algorithm ). I still need to tweak the memory barriers in the eventcount,
but since its for i686 everything should work fine. I am currently cleaning
up the site and adding some simple build instructions. I am planning to post
a MSVC workspace, Dev-Cpp project, and Anjuta project for Linux to make
AppCore super easy to build. IDE's are sort of nice in that respect. Any
questions or comments?
0 Kudos
Chris_M__Thomasson
New Contributor I
367 Views
Updated the assembly files. Redownload.
0 Kudos
ClayB
New Contributor I
367 Views
Lockfree -
I appreciate your posting links to your software and papers to educate the forum readers on lockfree methods. I've learned quite a bit.
Have you ever benchmarked or seen reports of performance for lockfree techniques (yours or others) to equivalent locking methods? I'm hoping you might have some "feel" or intuition (possibly backed up with some experience) that would give an idea of how valuable lockfree methods can be with regards to performance in threaded applications.Is there any benefit for using (more complex) lockfree methods over standard locking?
--clay
0 Kudos
Chris_M__Thomasson
New Contributor I
367 Views
ClayB wrote:
Lockfree -
I appreciate your posting links to your software and papers to educate the forum readers on lockfree methods. I've learned quite a bit.

Youre welcome! :)


Have you ever benchmarked or seen reports of performance for lockfree techniques (yours or others) to equivalent locking methods?

Yes. I havedoneawhole lotof testing.I am planning on posting some results andvarioustest applications so you can see for yourself. Some of the lock-based tests can end up taking so long tocomplete that you just ctrl-c the test case. Try to put a lock-based solution up against a lock-free single-producer/consumer queue... You should quickly see the major difference. The lock-based version will be using atomic operations for the mutex implementation, and also employ the kernel for contention. Thats a ton of unnecessary overhead. The lock-free version will use simple loads and stores /w some memory barriers and avoid the kernel completely. Its a lot more efficient...

Also, try comparing a lock-based referencecounttoa lock-free reference count... The lock-based version dies. The lock itself requires at least two atomic operations for a lock-unlock cycle, plus it uses the kernel for contention. The lock-free version requires a single atomic operation to modify the count, and the kernel is totally avoided.


I'm hoping you might have some "feel" or intuition (possibly backed up with some experience) that would give an idea of how valuable lockfree methods can be with regards to performance in threaded applications.Is there any benefit for using (more complex) lockfree methods over standard locking?

Yes. Therecanbemany benefits.

1. If one thread fails is means that another thread has made forward progress

2. Avoids lock convoy and priority inversion

3. Can be immune to thread failures

3. Some lock-free stuff can be used in a signal handler

4. Fewer atomic ops than lock-based

5. Avoids the kernel in user-space applications

6. Can allow for reads while there are concurrent writes in a shared collection

Take a look at these threads:

http://groups-beta.google.com/group/comp. programming.threads/msg/6b2ccf76ba145c9c
Does that help?
0 Kudos
Chris_M__Thomasson
New Contributor I
367 Views

During my last update to the web server, I screwed up the zip file by added
old assembly files. Its fixed now!

Please re-download.

Sorry!

:O


lockfree wrote:

Here are some snippets of code from my new library that implement a simple and efficient lock-free method for high-speed thread-to-thread communication. Its a simple single-producer/consumer queue algorithm implemented in i686 assembly. You can easily convert it into a multi-producer/consumer queue by protecting it with two separatemutexs; one for the push function and one for the pop function. Unfortunately, adding mutexs gets rid of the lock-free nature of the queue and introduces the kernel. However, the setup allows for "concurrent pushes and pops" because youre using two mutexs to emulate a single producer/consumer environment. In other words a producer does not block a consumer and vise versa. Enjoy! ;)

Any comments or questions?




0 Kudos
Reply