Re: high-speed single-producer/consumer queue impl in Intel® Moderncode for Parallel Architectures

high-speed single-producer/consumer queue impl

Chris_M__Thomasson — Sat, 12 Feb 2005 16:01:40 GMT

Here are some snippets of code from my new library that implement a simple and efficient lock-free method for high-speed thread-to-thread communication. It's a simple single-producer/consumer queue algorithm implemented in i686 assembly. You can easily convert it into a multi-producer/consumer queue by protecting it with two separate mutexes: one for the push function and one for the pop function. Unfortunately, adding mutexes gets rid of the lock-free nature of the queue and brings the kernel into play whenever a mutex is contended. However, the setup still allows concurrent pushes and pops, because the two mutexes emulate a single-producer/consumer environment: at most one thread pushes and at most one thread pops at any given time. In other words, a producer does not block a consumer and vice versa. Enjoy! ;)
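The i686 assembly itself isn't reproduced here. As a rough illustration of the same idea, below is a minimal C11 sketch of a node-based single-producer/consumer queue, followed by the two-mutex wrapper described above. The dummy-node layout and every name (`spsc_*`, `mpmc_*`, `node_t`) are assumptions made for this example, not AppCore's actual API or data layout.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Queue node; `next` is the only field read by both threads. */
typedef struct node {
    _Atomic(struct node *) next;
    void *value;
} node_t;

/* `head` is touched only by the consumer and `tail` only by the producer.
   `head` always points at a dummy node whose value was already consumed. */
typedef struct {
    node_t *head;
    node_t *tail;
} spsc_queue_t;

int spsc_init(spsc_queue_t *q) {
    node_t *dummy = malloc(sizeof *dummy);
    if (!dummy) return -1;
    atomic_init(&dummy->next, NULL);
    dummy->value = NULL;
    q->head = q->tail = dummy;
    return 0;
}

/* Producer side. Returns 0 on success, -1 if allocation fails. */
int spsc_push(spsc_queue_t *q, void *value) {
    node_t *n = malloc(sizeof *n);
    if (!n) return -1;
    n->value = value;
    atomic_init(&n->next, NULL);
    /* Release: the node's contents become visible before the link does. */
    atomic_store_explicit(&q->tail->next, n, memory_order_release);
    q->tail = n;
    return 0;
}

/* Consumer side. Returns 1 and stores the value in *out, or 0 if empty. */
int spsc_pop(spsc_queue_t *q, void **out) {
    node_t *next = atomic_load_explicit(&q->head->next, memory_order_acquire);
    if (!next) return 0;
    *out = next->value;
    free(q->head);  /* the producer never dereferences the old dummy again */
    q->head = next; /* the popped node becomes the new dummy */
    return 1;
}

/* Frees every node; call only when no other thread uses the queue. */
void spsc_destroy(spsc_queue_t *q) {
    node_t *n = q->head;
    while (n) {
        node_t *next = atomic_load_explicit(&n->next, memory_order_relaxed);
        free(n);
        n = next;
    }
}

/* Two-mutex wrapper: producers serialize on push_lock and consumers on
   pop_lock, so the inner queue still sees one producer and one consumer,
   and a producer never waits for a consumer (or vice versa). */
typedef struct {
    spsc_queue_t q;
    pthread_mutex_t push_lock;
    pthread_mutex_t pop_lock;
} mpmc_queue_t;

int mpmc_init(mpmc_queue_t *m) {
    if (spsc_init(&m->q) != 0) return -1;
    pthread_mutex_init(&m->push_lock, NULL);
    pthread_mutex_init(&m->pop_lock, NULL);
    return 0;
}

int mpmc_push(mpmc_queue_t *m, void *value) {
    pthread_mutex_lock(&m->push_lock);
    int rc = spsc_push(&m->q, value);
    pthread_mutex_unlock(&m->push_lock);
    return rc;
}

int mpmc_pop(mpmc_queue_t *m, void **out) {
    pthread_mutex_lock(&m->pop_lock);
    int rc = spsc_pop(&m->q, out);
    pthread_mutex_unlock(&m->pop_lock);
    return rc;
}
```

The dummy node is what keeps the two sides apart: the producer only ever writes `tail->next` and the consumer only ever frees the node it has already moved past, so neither side needs an atomic read-modify-write.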

Any comments or questions?

Re: high-speed single-producer/consumer queue impl

Chris_M__Thomasson — Mon, 21 Feb 2005 21:29:34 GMT

An atomic operations api has been added to AppCore.

Re: high-speed single-producer/consumer queue impl

Chris_M__Thomasson — Tue, 22 Feb 2005 12:42:15 GMT

There was a stupid memory leak in the TLS node cache, caused by a stupid
cut-and-paste error. It's fixed; you should probably re-download. The leak
causes the simple malloc counter to assert that there is a leak if you create
a TLS for the main thread, cache some nodes, and then call ac_shutdown(). The
posted test does not trip the leak because it doesn't create a TLS for the
main thread.

Sorry!

Re: high-speed single-producer/consumer queue impl

Chris_M__Thomasson — Tue, 15 Mar 2005 06:23:16 GMT

A full-blown hazard-pointer implementation was added to AppCore. There is also an eventcount, a lock-free stack, and a rw-spinlock (not Joe's
algorithm). I still need to tweak the memory barriers in the eventcount,
but since it's for i686, everything should work fine. I am currently cleaning
up the site and adding some simple build instructions. I am planning to post
an MSVC workspace, a Dev-Cpp project, and an Anjuta project for Linux to make
AppCore super easy to build. IDEs are sort of nice in that respect. Any
questions or comments?
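AppCore's stack code isn't shown in the thread. One common lock-free stack that sidesteps the ABA and memory-reclamation problems (the problems hazard pointers exist to solve) lets any thread push with a CAS loop while a consumer detaches the whole list with a single atomic exchange. The sketch below assumes C11 atomics, and all names are hypothetical; a stack that pops one node at a time would need something like the hazard pointers mentioned above.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Intrusive node: embed this in whatever object you want to push. */
typedef struct lf_node {
    struct lf_node *next;
} lf_node_t;

typedef struct {
    _Atomic(lf_node_t *) top;
} lf_stack_t;

void lf_stack_init(lf_stack_t *s) {
    atomic_init(&s->top, NULL);
}

/* Any number of threads may push concurrently. */
void lf_stack_push(lf_stack_t *s, lf_node_t *n) {
    lf_node_t *old = atomic_load_explicit(&s->top, memory_order_relaxed);
    do {
        n->next = old; /* link before publishing; a failed CAS reloads `old` */
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->top, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the entire stack in one atomic exchange and return it in LIFO
   order. Nothing is popped node by node, so there is no ABA window and no
   node is read after another thread might have freed it. */
lf_node_t *lf_stack_flush(lf_stack_t *s) {
    return atomic_exchange_explicit(&s->top, NULL, memory_order_acquire);
}
```

The consumer then walks the returned list privately, which is why no hazard pointers are needed for this particular variant.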

Re: high-speed single-producer/consumer queue impl

Chris_M__Thomasson — Tue, 15 Mar 2005 09:17:47 GMT

Updated the assembly files. Please re-download.

Re: high-speed single-producer/consumer queue impl

ClayB — Thu, 17 Mar 2005 06:16:05 GMT

Lockfree -

I appreciate your posting links to your software and papers to educate the forum readers on lockfree methods. I've learned quite a bit.

Have you ever benchmarked or seen reports of performance for lockfree techniques (yours or others) compared to equivalent locking methods? I'm hoping you might have some "feel" or intuition (possibly backed up with some experience) that would give an idea of how valuable lockfree methods can be with regard to performance in threaded applications. Is there any benefit to using (more complex) lockfree methods over standard locking?

--clay

Re: high-speed single-producer/consumer queue impl

Chris_M__Thomasson — Fri, 25 Mar 2005 20:14:43 GMT

ClayB wrote:

Lockfree -

I appreciate your posting links to your software and papers to educate the forum readers on lockfree methods. I've learned quite a bit.

You're welcome! :)

Have you ever benchmarked or seen reports of performance for lockfree techniques (yours or others) to equivalent locking methods?

Yes. I have done a whole lot of testing. I am planning on posting some results and various test applications so you can see for yourself. Some of the lock-based tests can take so long to complete that you end up hitting Ctrl-C. Try putting a lock-based solution up against a lock-free single-producer/consumer queue... You should quickly see the major difference. The lock-based version will use atomic operations in the mutex implementation, and it will also involve the kernel whenever there is contention. That's a ton of unnecessary overhead. The lock-free version uses simple loads and stores with some memory barriers and avoids the kernel completely. It's a lot more efficient...
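To make "simple loads and stores with some memory barriers" concrete, here is a minimal C11 sketch of a bounded single-producer/consumer ring buffer. It is an illustration under stated assumptions, not the author's queue (his is node-based and written in i686 assembly): the fixed capacity and all names are hypothetical. Neither side performs an atomic read-modify-write; each index is written by exactly one thread, and acquire/release ordering does the rest. On x86 (i686 included), acquire loads and release stores typically compile to plain MOV instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 1024 /* hypothetical size; must be a power of two */

typedef struct {
    _Atomic size_t head; /* next slot to read; written only by the consumer */
    _Atomic size_t tail; /* next slot to write; written only by the producer */
    void *slots[RING_CAPACITY];
} spsc_ring_t;

void ring_init(spsc_ring_t *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer only. Returns false if the ring is full. */
bool ring_push(spsc_ring_t *r, void *value) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire: the consumer's read of a slot happens before we reuse it. */
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY) return false; /* full */
    r->slots[tail & (RING_CAPACITY - 1)] = value;
    /* Release: the slot write is visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer only. Returns false if the ring is empty. */
bool ring_pop(spsc_ring_t *r, void **out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store to tail. */
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail) return false; /* empty */
    *out = r->slots[head & (RING_CAPACITY - 1)];
    /* Release: tells the producer this slot may be reused. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The indices grow without bound and rely on unsigned wraparound; because the capacity is a power of two, `tail - head` and the masked slot index stay correct across the wrap.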

Also, try comparing a lock-based reference count to a lock-free reference count... The lock-based version dies. The lock itself requires at least two atomic operations per lock-unlock cycle, plus it uses the kernel under contention. The lock-free version requires a single atomic operation to modify the count, and the kernel is avoided entirely.
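A lock-free reference count really is one atomic read-modify-write per increment or decrement. The sketch below uses C11 atomics rather than i686 assembly; `object_t` and the function names are hypothetical. It covers only the basic case, where a thread touches the count while it already holds a reference. Safely acquiring a new reference from a shared pointer that another thread may be dropping at the same moment needs more machinery, such as the hazard pointers mentioned earlier in this thread.

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_int refs;
    int payload; /* hypothetical payload for the example */
} object_t;

object_t *object_new(int payload) {
    object_t *o = malloc(sizeof *o);
    if (!o) return NULL;
    atomic_init(&o->refs, 1); /* the caller owns the first reference */
    o->payload = payload;
    return o;
}

/* One atomic RMW; relaxed suffices because the caller already holds a reference. */
void object_acquire(object_t *o) {
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* One atomic RMW. Returns 1 if this call dropped the last reference and freed o. */
int object_release(object_t *o) {
    /* acq_rel: every other owner's writes are visible before free(). */
    if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```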

I'm hoping you might have some "feel" or intuition (possibly backed up with some experience) that would give an idea of how valuable lockfree methods can be with regard to performance in threaded applications. Is there any benefit to using (more complex) lockfree methods over standard locking?

Yes. There can be many benefits.

1. If one thread's atomic update fails, it means that another thread has made forward progress

2. Avoids lock convoys and priority inversion

3. Can be immune to thread failures

4. Some lock-free stuff can be used in a signal handler

5. Fewer atomic ops than lock-based

6. Avoids the kernel in user-space applications

7. Can allow reads while there are concurrent writes in a shared collection

Take a look at these threads:

http://groups.google.de/groups?group=comp.programming.threads&threadm=ec1c3924.0410171103.568fa38a%40posting.google.com

http://groups-beta.google.com/group/comp.programming.threads/msg/7e7834ca10f2613a

http://groups-beta.google.com/group/comp.programming.threads/msg/6b2ccf76ba145c9c

http://groups-beta.google.com/group/comp.programming.threads/msg/da843ddf7e139098

Does that help?

Re: high-speed single-producer/consumer queue impl

Chris_M__Thomasson — Sat, 26 Mar 2005 03:50:39 GMT

During my last update to the web server, I screwed up the zip file by adding
old assembly files. It's fixed now!

Please re-download.

Sorry!