Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
1696 Discussions

IDS Article: Multiple Approaches to Multithreaded Applications

ClayB
New Contributor I
362 Views
Intel Developer Services has posted an article on threaded programming (see http://www.intel.com/cd/ids/developer/asmo-na/eng/151201.htm) titled "Multiple Approaches to Multithreaded Applications." The topic focuses on use of locks and lock-free methods, which makes the title a bit misleading, IMO.
Has any else read this article? Since there is no feedback or rating mechanism attached to the article, if you have read the article, please comment here. Are there different or better methods than those presented in the paper? Should performance merit any consideration when choosing one implementation over any other (e.g., locks vs. semaphores)to perform the same function?
--clay
0 Kudos
5 Replies
tpa_10
Beginner
362 Views
I am surprised that the concept of avoiding locks appears to be creating any kind of "stir": the use of atomic updates has been a tried and true programming practice for many years.
Since most of the locking functions commonly available are implemented in terms of CAS and its counterparts, it just seems that some people are re-discovering these primitives and that there are times when it pays off to use them.
It would be a welcome addition to have support for them moved into the "mainstream" libraries and compilers. :)
0 Kudos
Chris_M__Thomasson
New Contributor I
362 Views

ClayB wrote:

Intel Developer Services has posted an article on threaded programming (see http://www.intel.com/cd/ids/developer/asmo-na/eng/151201.htm) titled "Multiple Approaches to Multithreaded Applications." The topic focuses on use of locks and lock-free methods, which makes the title a bit misleading, IMO.

----------------------------------

The article references my site, its a good thing that it has correct code posted!

BTW, I am SenderX on comp.programming.threads.

There is a flaw in the paper Its example of a lock-free LIFO stack suffers from a horrible race-condition known as ABA. You need to atomically increment a version counter on every successful pop operation, and the counter needs to be maintained throughout the popped nodes lifetime for correctness. Also, there is a reference to a French paper, it is flawed as well. I think it can suffer from ABA in the first cas of the lock-free queue push function. It needs to be a double-wide cas.

-----------------------------

Are there different or better methods than those presented in the paper?

------------------------------

Yes. Kernel or User-Space RCU, Joes atomic_ptr, my proxy gc, SMR

I can post some links if you want.

------------------------------

Should performance merit any consideration when choosing one implementation over any other (e.g., locks vs. semaphores)to perform the same function?

-------------------------------

It depends if you want your thread-to-thread communication to be heavily fast-pathed and potentially block only when it absolutely needs to? Linux FUTEX is an example of this. A fast-path can be equated to a lock-free path, because the call into the kernel is avoided.

Sometimes its good to use locks. Like RCU It relies on locks for the write side( call_rcu ) of the algorithm.

Anyway, I will get into this in a lot more detail when I finish up my portable lock-free library. It will run on pthread enabled i686 UNIXs. It can easily be ported to 32-bit and 64-bit SPARC and PPC. It can even port to AMD64 and Itanium.

The library will be about as portable as the Linux FUTEX API. My new stuff really shows off how SMP and/or HyperThreading can really increase the throughput of thread-to-thread communication.

:)

0 Kudos
Chris_M__Thomasson
New Contributor I
362 Views
There is one more error in the paper. The double-checked locking is broken. The paper does not even try to address memory visibility.
:smileysurprised:
If your going to do DCL, you need to use the proper atomic ops mixed with the correct memory barriers. Also, you "need to know" your biatch compiler inside in out. It can really screw you badwhen it comes to this stuff.
Using 100%assembler for DCL like algorithms helps ease your mind a great deal!
:smileywink:
0 Kudos
Chris_M__Thomasson
New Contributor I
362 Views


If your going to do DCL, you need to use the proper atomic ops mixed with the correct memory barriers. Also, you "need to know" your biatch compiler inside in out. It can really screw you
badwhen it comes to this stuff.


'your "biatch" compiler inside in out.'

I meant, your biatch compiler!

0 Kudos
Chris_M__Thomasson
New Contributor I
362 Views
WTF???
What the heck is wrong with this forum software!!!!!!!!
This crap is messing with my messages. It changed the word " d_a_m_n "into that stupid "biatch" thing.
WHY?
Somebody else needs to try this...
Post a message with theword " d_a_m_n "in it, and omit the underscores.
See what actually gets posted?!?
0 Kudos
Reply