Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
1696 Discussions

Please suggest a good starting book for multithreaded programming....

aminer10
Novice
202 Views


Hello,

On Mar 30, 6:24 pm, MC <manan.cho...@gmail.com> wrote:
> Dear all,
> Following on the post of Srinu. I am very beginner in multithreaded
> programming. I have been looking for a good book to read about the
> basic concepts of mutithreading, I recently bought Programming with
> POSIX threads- by Butenhof. I didnt quite like that book, what I am
> looking for a is a book which explains multithreaded programming
> conceptually and also gives good concrete examples. Can anybody please
> suggest me a book.
>
> Thanks,

I will just give an advice...

To learn more about parallel programming, just read the old posts
in comp.programming.threads and the other forums that discuss
parallel programming.. read them carefully - as i did myself -
and try to use LOGIC and REASON about them and try to EXTRACT
the good patterns about parallel programming from them and understand
them...

Also, try to look at the parallel codes - example

http://pages.videotron.com/aminer/ and other parallel toolkits ...-

and extract those good patterns about parallel programming from
those programming codes and understand them..

Good patterns about parallel programming are like theorems:
IF predicates are meet THEN something...

Once you have extracted those good patterns and understood them ,
you will be more and more professional in parallel programming...


Read also this:

http://pages.videotron.com/aminer/threadpool.htm

And try to extract and undertand those good patterns - like theorems
-
about Parallel Programming and read this also:

http://pages.videotron.com/aminer/parallelhashlist/queue.htm

And try to extract and undertand those good patterns - like theorems
- about Parallel Programming

And read inside my parallel code:

http://pages.videotron.com/aminer/

and the parallel code of others...

and try to 'EXTRACT' and 'UNDERTAND' those good patterns
to follow...

Good patterns about parallel programming are like theorems:
IF predicates are meet THEN something...

As an example, take the following page:

http://blogs.msdn.com/visualizeparallel/

As i said before, good patterns about parallel programming
are like theorems: IF predicates are meet THEN something...
So, read this:

"It is critical to be able to spot data parallelism when you see it
because data parallel algorithms allow the developer to more easily
construct efficient and safe code. As opposed to the more complex
solutions employed against task parallelism, data parallelism allows
the programmer to perform the same operation on each piece of data
concurrently without concern for race conditions and consequently,
the need for synchronization, which results in significant
performance overhead. Arguably, data parallel algorithms perform
better (due to the lack of synchronization) and are easier for the
developer to implement."

So, tell me MC, what can you EXTRACT from this ?

You can extract something like a theorem to follow, like this:

[1] IF your algorithm exhibit much more data parallelism THEN
it will be much more effcient - it will perform better- due to the
lack of sychronization...

Hence, if you follow theorem [1]: it will be a good pattern in
parallel programming - to follow -.


Do you undersand now ?

You have to be smart and start to extract those theorems
- good patterns to follow... - from all the programming codes,

articles and forums etc.


Take care...


Sincerely,
Amine Moulay Ramdane.

0 Kudos
6 Replies
aminer10
Novice
202 Views


I wrote;

>As an example, take the following page:

>http://blogs.msdn.com/visualizeparallel/


>As i said before, good patterns about parallel programming
>are like theorems: IF predicates are meet THEN something...

>So, read this:

>"It is critical to be able to spot data parallelism when you see it
>because data parallel algorithms allow the developer to more easily
>construct efficient and safe code. As opposed to the more complex
>solutions employed against task parallelism, data parallelism allows
>the programmer to perform the same operation on each piece of data
>concurrently without concern for race conditions and consequently,
>the need for synchronization, which results in significant
>performance overhead. Arguably, data parallel algorithms perform
>better (due to the lack of synchronization) and are easier for the developer
>to implement."


>So, tell me MC, what can you EXTRACT from this ?


>You can extract something like a theorem to follow, like this:

>[1] IF your algorithm exhibit much more data parallelism THEN
>it will be much more effcient - it will perform better- due to the
>lack of sychronization...

>Hence, if you follow theorem [1]: it will be a good pattern in
>parallel programming - to follow -.


So, this theorem that i have extracted from the page is important,
and it's a good pattern to follow...


How can this theorem be understood by using mathematical equations ?

Easy , if your algorithm exhibit more data parallelism THEN the
proportion of S will be smaller in the Amadhal equation:
1 / (S + P/N) - N: is the number of processors - hence ,
the algorithm will scale better...

And as you have noticed , this is what have stated theorem [1]:

" [1] IF your algorithm exhibit much more data parallelism THEN
it will be much more effcient - it will perform better- due to the
lack of sychronization..."


That's the same for the other theorems: on deadlock, false sharing etc.

You have to be smart and start to extract those theorems
- good patterns to follow... - from all the programming codes,
articles and forums etc.


Sincerely,
Amine Moulay Ramdane.

0 Kudos
aminer10
Novice
202 Views

I wrote:
>Theorem:

> [1] IF your algorithm exhibit much more data parallelism
THEN it will be much more effIcient.

>How can this theorem be understood by using mathematical equations ?
>Easy , if your algorithm exhibit more data parallelism THEN the
>proportion of S will be smaller in the Amadhal equation:
>1 / (S + P/N) - N: is the number of processors - hence ,
>the algorithm will scale better...


Finally, i will state this:

IF you follow and base your reasonning on those theorems
- or laws or true propositions or good patterns , like theorem [1] -
THEN your will construct a model that will be much more CORRECT
and EFFICIENT.

It is one of my preferred methodology in programming.

Sincerely,
Amine Moulay Ramdane.

0 Kudos
aminer10
Novice
202 Views


I have wrote:

>Read also this:
>http://pages.videotron.com/aminer/threadpool.htm
>And try to extract and undertand those good patterns - like theorems

If you look carefully at this page:

http://pages.videotron.com/aminer/threadpool.htm



What can you extract from this page ?


Read for example this, i wrote:

"It uses a lock-free queue for each worker thread and it uses
work-stealing - for more efficiency "

It's like a law - or a theorem - to follow ..


IF you use, in a Thread Pool Engine, a lock-free queue for
each worker thread , that means that there is LESS contention,
and, that means also that the percentage S in the Amadhal law
equation 1 / (S + P/N) - N: is the number of cores/processors - is smaller,
THEN or HENCE, the algorithm will scale better...

So, as you have noticed, it's like theorems that i have
followed in my methodology and models.

And of course, work-stealing provides implicitly Load Balancing ,
so, it's more efficient...

There is another theorem also that can be stated like this,

[2] IF two or more processes or threads use the same critical
sections THEN they- the processes or threads - must take
them in the same order toavoid deadlock - in the system - .


That's also another theorem to follow...

etc.

And as i have stated before:

IF you follow and base your reasonning on those theorems
- or laws or true propositions or good patterns , like theorem [1] or [2] ... -
THEN your will construct a model that will be much more CORRECT
and EFFICIENT.

It is one of my preferred methodology in programming.


Sincerely.
Amine Moulay Ramdane.


0 Kudos
aminer10
Novice
202 Views


Hello gain,


If you take a look inside the Object Pascal code of my
ParallelQueue.pas - you can download lock-free ParallelQueue from

http://pages.videotron.com/aminer/-

You will notice thati am using many queues etc. and i am
using a hash based method to MINIMIZE contention.

And as i saidand stated before, andthis is a law or theorem
to apply:

[3] If there is LESS contentionTHEN the algorithmwill
scale better. Due to the fact thatS(the serial part)become
smallerwithlesscontention , and as N become bigger, the
result - thespeedof the program/algorithm... - of theAmadhal
equation 1/(S+(P/N)) become bigger.


But does Theorem [3]works in reality ?

Of course yes !


Look at the pop() performance of my lock-free ParallelQueue ,
it does scale better than flqueue and Ringbuffer


http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm


And as i have stated before:

IF you follow and base your reasonning on those theorems
- or laws or true propositions or good patterns , like theorem [1], [2],[3]... -
THEN your will construct a model that will be much more CORRECT
and EFFICIENT.

And it is one of my preferred methodology in programming.


Sincerely.
Amine Moulay Ramdane.


0 Kudos
aminer10
Novice
202 Views


Hello again,

Sorry for my english , but i will continu to explain - my ideas etc. -
using logic and reasonning...

As you already know, we have those two notions:

'Time' - we have time cause there is movement of matter -

and

'Space'


And we have those two notions that we call 'Correctness' and 'Efficiency'

And . as you have noticed, i have stated the following theorems...

[1] IF your algorithm exhibit much more data parallelism THEN
it will be much more efficient.

2] IF two or more processes or threads use the same critical
sections THEN they - the processes or threads - must take
them in the same order to avoid deadlock - in the system - .

3] If there is LESS contention THEN the algorithm will
scale better. Due to the fact that S (the serial part)
become smaller with less contention , and as N become bigger,
the result - the speed of the program/algorithm... - of the
Amdahl's equation 1/(S+(P/N)) become bigger.

[4] You have latency and bandwidth , so, IF you use efficiently
one or both of them - latency and bandwidth - THEN your algorithm
will be more efficient.

etc.


Why am i calling them theorems ?

You can also call them rules or true propositions, laws ...

Now i can 'classify' theorem [2] in the set that i call 'correctness',
and it states something on correctness..

And theorems [1] [3] [4] in the set that i call 'efficiency'.

, and they states something on efficiency.


But you have to be smart now..


If you have noticed, theorem [2] and [3] are in fact
the same as theorem [4]

But why i am calling them theorems ?

You can call them rules,laws... if you want.


And as i have stated before:

IF you follow and base your reasonning on those theorems
- or laws or true propositions or good patterns - like rules or theorems

[1] , [2] , [3], [4]... - THEN your will construct a model that will be much more

CORRECT and EFFICIENT.

It is one of my preferred methodology in programming.

Sincerely,

Amine Moulay Ramdane.

0 Kudos
aminer10
Novice
202 Views


Hello,

I am still thinking and using logic...

I can add the following rules also:

[5] IF you are using a critical section or spinlock and there is
a high contention- with many threads - on them THEN there is a
possibility of a Lock convoy. Due to the fact that the thread
entering the spinlock or critical section may context switch
and this will add to the service time - and to the S (serial part)
of the Amdahl's equation - and this will higher the contention and
create a possibility of a Lock convoy and to a bad scalability.

We can elevate the problem in [5] by using a Mutex or a Semaphore
around the crital section or the spinlock...

Another rule now..


[6] If there is contention on a lock - a critical section ... -
and inside the locked sections you are the I/O - example
loging a message to a file - this will lead the calling thread
block on the I/O and the operating system will deschedule
the blocked thread until the I/O completes, thus this situation
will lead to more context switching, and therefore to an increased
service time , and longer service times, in this case, means
more lock contention, and more lock contention means a bad
scalability.

there is also false sharing etc.


IF you follow and base your reasonning on those theorems
- or laws or true propositions or good patterns - like rules or
theorems [1] , [2] , [3], [4] , [5], [6]... - THEN your will construct a model
that will be much more CORRECT and EFFICIENT.

And it is one of my preferred methodology in programming.

I will try to add more of those rules , theorems , laws...
next time...

Sincerely,
Amine Moulay Ramdane.

0 Kudos
Reply