<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic A good methodology ... in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854559#M2020</link>
    <description>&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have cleaned my previous posts, and here is my new post &lt;BR /&gt;that includes my 'ideas' etc...&lt;/P&gt;&lt;P&gt;Srinivas Nayak wrote in comp.programming.threads: &lt;BR /&gt;&amp;gt;Dear All, &lt;BR /&gt;&amp;gt;Please suggest a good book that teaches in great details about the &lt;BR /&gt;&amp;gt;theories behind the followings. &lt;BR /&gt;&amp;gt;1. shared memory concurrent systems. &lt;BR /&gt;&amp;gt;2. message passing concurrent systems. &lt;BR /&gt;&amp;gt;3. mutual exclusion. &lt;BR /&gt;&amp;gt;4. synchronization. &lt;BR /&gt;&amp;gt;5. safety property. &lt;BR /&gt;&amp;gt;6. liveness property. &lt;BR /&gt;&amp;gt;7. fairness property. &lt;BR /&gt;&amp;gt;8. systems with code interleaving (virtual concurrency). &lt;BR /&gt;&amp;gt;9. systems with no code interleaving (true concurrency). &lt;BR /&gt;&amp;gt;10. atomic operations. &lt;BR /&gt;&amp;gt;11. critical sections. &lt;BR /&gt;&amp;gt;12. how to code a concurrent system (about programming language &lt;BR /&gt;&amp;gt;constructs available for it). &lt;BR /&gt;&amp;gt;13. how to mathematically proof the properties. &lt;BR /&gt;&amp;gt;14. how to mechanically verify the properties. &lt;BR /&gt;&amp;gt;15. blocking synchronization. &lt;BR /&gt;&amp;gt;16. non-blocking synchronization. &lt;BR /&gt;&amp;gt;17. lock-freedom. &lt;BR /&gt;&amp;gt;18. wait-freedom. &lt;BR /&gt;&amp;gt;19. deadlock-freedom. &lt;BR /&gt;&amp;gt;20. starvation-freedom. &lt;BR /&gt;&amp;gt;21. livelock-freedom. &lt;BR /&gt;&amp;gt;22. obstruction-freedom. &lt;BR /&gt;&amp;gt;Not only the concepts but also that teaches with very simple &lt;BR /&gt;&amp;gt;mathematical treatment; axiomatic or linear temporal logic. 
&lt;BR /&gt;&amp;gt;Many of the books I came across are either emphasize one or two topic &lt;BR /&gt;&amp;gt;or just provides a conceptual treatment, without mentioning how to &lt;BR /&gt;&amp;gt;code a concurrent system, check if it is mathematically or manually &lt;BR /&gt;&amp;gt;correct. &lt;BR /&gt;&amp;gt;Please suggest any book or paper where these topics are &lt;BR /&gt;&amp;gt;comprehensively covered in great details. Better if all these are &lt;BR /&gt;&amp;gt;under a single cover that will be easy to understand under the roof of &lt;BR /&gt;&amp;gt;a unifying theory. &lt;BR /&gt;&amp;gt;Survey papers of these are also welcome. &lt;BR /&gt;&amp;gt;With regards, &lt;BR /&gt;&amp;gt;Srinivas Nayak &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;For boundedness and deadlock freedom, two of the most &lt;BR /&gt;important properties, you can use Petri nets and reason &lt;BR /&gt;about the place-invariant equations that you extract by &lt;BR /&gt;solving the following equation for the vector f: &lt;/P&gt;&lt;P&gt;Transpose(f) * C = 0, where C is the incidence matrix of the net &lt;/P&gt;&lt;P&gt;The solutions f are the vectors on which you will base your reasoning. &lt;/P&gt;&lt;P&gt;You can use the same place-invariant equations to &lt;BR /&gt;reason about lock-based and lock-free algorithms. &lt;/P&gt;&lt;P&gt;You can also use graph reduction techniques. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As an example, suppose that you solve the equation &lt;BR /&gt;Transpose(f) * C = 0 and find the &lt;BR /&gt;following two invariants. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Note: P, Q, S and R are all the places in the system, and M(X) is the number of tokens in place X under marking M. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Invariant 1: 2 * M(P) + M(Q) + M(S) = C1 (constant) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Invariant 2: M(P) + M(R) + M(S) = C2 (constant) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Note also that each constant is fixed by the initial marking M0: C1 = Transpose(f1) * M0 and C2 = Transpose(f2) * M0, where f1 and f2 are the two invariant vectors. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Since a marking can never be negative, it follows from the equations that, because &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;M(P) + M(R) + M(S) = C2, &lt;BR /&gt;M(P) &amp;lt;= C2 and M(R) &amp;lt;= C2 and M(S) &amp;lt;= C2, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and, from the first invariant, we have &lt;BR /&gt;that M(Q) &amp;lt;= C1. Every place is bounded, which IMPLIES that the system is &lt;BR /&gt;structurally bounded. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The same goes for deadlocks: you reason &lt;BR /&gt;about the invariant equations to prove that there is &lt;BR /&gt;no deadlock in the system. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Following good patterns is also valuable. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And what's a good pattern? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's like a THEOREM that we apply in programming. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As an example: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF two threads want to acquire &lt;BR /&gt;critical sections, and the first thread tries to acquire critical &lt;BR /&gt;section A and &lt;BR /&gt;after that critical section B, while the second thread tries to &lt;BR /&gt;acquire B and then A, THEN you can have a deadlock in this system. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You see? It looks like this: IF the predicates are met THEN &lt;BR /&gt;something follows. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now suppose there are many critical sections, and the first &lt;BR /&gt;thread tries to acquire A, B, C, D, E, F, G while the second thread tries to &lt;BR /&gt;acquire A, G, C, D, E, F, B. That also breaks the pattern, because B and G are taken in opposite orders, and you can &lt;BR /&gt;easily notice it by APPLYING the theorems that we call &lt;BR /&gt;'good patterns'. 
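&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Here is a minimal sketch of this pattern in Python, for illustration only. threading.Lock objects stand in for the critical sections A to G, and the helper names acquire_in_order and release_all are hypothetical. Each thread sorts the locks it needs into one global order before acquiring them. Both acquisition orders from the example above therefore become A, B, C, D, E, F, G, and no circular wait can form.&lt;/P&gt;&lt;PRE&gt;
```python
import threading

# Hypothetical locks standing in for the critical sections A..G.
locks = {name: threading.Lock() for name in "ABCDEFG"}
counter = 0


def acquire_in_order(names):
    """Acquire the requested locks in one global (alphabetical) order.

    Because every thread sorts before acquiring, two threads can never
    each hold a lock the other one is waiting for, so the circular wait
    that a deadlock needs cannot arise.
    """
    ordered = sorted(set(names))
    for name in ordered:
        locks[name].acquire()
    return ordered


def release_all(ordered):
    # Release in the reverse order of acquisition.
    for name in reversed(ordered):
        locks[name].release()


def worker(names, rounds):
    global counter
    for _ in range(rounds):
        held = acquire_in_order(names)
        try:
            counter += 1  # protected: both threads always hold lock A here
        finally:
            release_all(held)


# The two acquisition orders from the example above.
t1 = threading.Thread(target=worker, args=("ABCDEFG", 10000))
t2 = threading.Thread(target=worker, args=("AGCDEFB", 10000))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)  # 20000
```
&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Sorting by name is just one possible global order. Any fixed ranking works, as long as every thread uses the same one.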
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Do you see why good patterns, which look like theorems, &lt;BR /&gt;are also very powerful? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That's what we call a good pattern: it's like a theorem, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and it looks like this: IF the predicates are met THEN something follows. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;There are also good patterns, like theorems, to follow for false &lt;BR /&gt;sharing, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Do you understand why others and I also follow good patterns &lt;BR /&gt;that look like theorems? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MC also wrote: &lt;BR /&gt;&amp;gt; Dear all, &lt;BR /&gt;&amp;gt; Following on the post of Srinu. I am very beginner in multithreaded &lt;BR /&gt;&amp;gt; programming. I have been looking for a good book to read about the &lt;BR /&gt;&amp;gt; basic concepts of mutithreading, I recently bought Programming with &lt;BR /&gt;&amp;gt; POSIX threads- by Butenhof. I didnt quite like that book, what I am &lt;BR /&gt;&amp;gt; looking for a is a book which explains multithreaded programming &lt;BR /&gt;&amp;gt; conceptually and also gives good concrete examples. Can anybody please &lt;BR /&gt;&amp;gt; suggest me a book. &lt;/P&gt;&lt;P&gt;&amp;gt; Thanks, &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will just give some advice. &lt;/P&gt;&lt;P&gt;To learn more about parallel programming, just read the old posts &lt;BR /&gt;in comp.programming.threads and the other forums that discuss &lt;BR /&gt;parallel programming. Read them carefully, as I did myself, &lt;BR /&gt;use LOGIC and REASON about them, and try to EXTRACT &lt;BR /&gt;the good patterns of parallel programming from them and understand &lt;BR /&gt;them. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Also, look at parallel code, for example my own: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/threadpool.htm"&gt;http://pages.videotron.com/aminer/threadpool.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelhashlist/queue.htm"&gt;http://pages.videotron.com/aminer/parallelhashlist/queue.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;as well as other parallel toolkits and the parallel code of others, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and try to 'EXTRACT' and 'UNDERSTAND' the good patterns &lt;BR /&gt;to follow. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Good patterns of parallel programming are like theorems: &lt;BR /&gt;IF the predicates are met THEN something follows. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As an example, take the following page: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://blogs.msdn.com/visualizeparallel/"&gt;http://blogs.msdn.com/visualizeparallel/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As I said before, good patterns of parallel programming &lt;BR /&gt;are like theorems: IF the predicates are met THEN something follows. &lt;BR /&gt;So, read this: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"It is critical to be able to spot data parallelism when you see it &lt;BR /&gt;because data parallel algorithms allow the developer to more easily &lt;BR /&gt;construct efficient and safe code. 
As opposed to the more complex &lt;BR /&gt;solutions employed against task parallelism, data parallelism allows &lt;BR /&gt;the programmer to perform the same operation on each piece of data &lt;BR /&gt;concurrently without concern for race conditions and consequently, &lt;BR /&gt;the need for synchronization, which results in significant &lt;BR /&gt;performance overhead. Arguably, data parallel algorithms perform &lt;BR /&gt;better (due to the lack of synchronization) and are easier for the &lt;BR /&gt;developer to implement." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, tell me MC, what can you EXTRACT from this? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You can extract something like a theorem to follow: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[1] IF your algorithm exhibits more data parallelism THEN &lt;BR /&gt; it will be more efficient (it will perform better) due to &lt;BR /&gt; the lack of synchronization. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Hence, theorem [1] is a good pattern to follow in &lt;BR /&gt;parallel programming. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Do you understand now? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You have to be smart and start to extract those theorems, &lt;BR /&gt;the good patterns to follow, from all the code you read. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, this theorem that I have extracted from the page is important, &lt;BR /&gt;and it's a good pattern to follow. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;How can this theorem be understood by using mathematical equations? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Easily. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Amdahl's law gives the speedup on N cores/processors as &lt;BR /&gt;1 / (S + (P/N)), where S is the serial fraction of the work and P = 1 - S is the parallel fraction. &lt;BR /&gt;IF your algorithm exhibits more data parallelism THEN it needs less synchronization, so S is smaller. &lt;BR /&gt;The speedup is then higher for any N above 1, and its upper limit 1/S is higher too; hence, the &lt;BR /&gt;algorithm will scale better. 
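&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Here is a small Python sketch of that arithmetic. The function name amdahl_speedup and the two serial fractions are illustrative assumptions, not measurements. The sketch evaluates 1 / (S + (P/N)) with P = 1 - S for two cases: a mostly data-parallel algorithm (S = 5%) and one with more synchronization (S = 25%).&lt;/P&gt;&lt;PRE&gt;
```python
def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: speedup = 1 / (S + P/N), with P = 1 - S."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical serial fractions: a mostly data-parallel algorithm (S = 5%)
# versus one whose synchronization leaves a larger serial part (S = 25%).
for s in (0.05, 0.25):
    row = ", ".join(f"N={n}: {amdahl_speedup(s, n):.2f}x" for n in (1, 4, 16, 1024))
    print(f"S={s:.0%}  {row}")
```
&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Whatever N is, the speedup stays below 1/S: at most 20x when S = 5%, but at most 4x when S = 25%. This is the sense in which shrinking S makes the algorithm scale better.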
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as you have noticed, this is what theorem [1] stated: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"[1] IF your algorithm exhibits more data parallelism THEN &lt;BR /&gt; it will be more efficient (it will perform better) due to &lt;BR /&gt; the lack of synchronization." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The same holds for the other theorems: on deadlock, false sharing, &lt;BR /&gt;etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You have to be smart and start to extract those theorems, &lt;BR /&gt;the good patterns to follow, from all the code, &lt;BR /&gt;articles and forums you read. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck also wrote: &lt;BR /&gt;&amp;gt; What if people wanna roll there own versions ? ;) &lt;BR /&gt;&amp;gt; They would much better be "served" by algorithms/pseudo &lt;BR /&gt;&amp;gt; code than real code which could be system/language specific ;) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's easy to EXTRACT algorithms from Object Pascal code. &lt;/P&gt;&lt;P&gt;Look, for example, inside pbzip.pas; I am using this in the &lt;BR /&gt;main body of my program: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;name:='msvcr100.dll'; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's the test file that I am using (it's also inside the &lt;BR /&gt;zip file). Once you compile and execute pbzip.pas, it &lt;BR /&gt;will generate a file msvcr100.dll.bz. And as you have &lt;BR /&gt;noticed, I am using a portable compound filesystem; &lt;BR /&gt;look at ParallelStructuredStorage.pas inside the zip file. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;After that, I am opening it with: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;fstream1:=TFileStream.create(name, fmOpenReadWrite); &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and I am reading chunks of the stream and 'distributing' them &lt;BR /&gt;to my Thread Pool Engine to be compressed, in parallel, &lt;BR /&gt;by the myobj.BZipcompress method; look at: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;// e = number of full chunks of d bytes, r = size of the last partial chunk (0 if none) &lt;BR /&gt;for i:=0 to e &lt;BR /&gt;do &lt;BR /&gt; begin &lt;/P&gt;&lt;P&gt;&lt;BR /&gt; if (i=e) and (r=0) then break; // no partial chunk to send &lt;BR /&gt; stream1:=TMemoryStream.create; &lt;BR /&gt; if (r &amp;gt; 0) and (i=e) &lt;BR /&gt; then stream1.copyfrom(fstream1,r) // last, partial chunk &lt;BR /&gt; else stream1.copyfrom(fstream1,d); // full chunk &lt;BR /&gt; stream1.position:=0; &lt;BR /&gt; obj:=TJob.create; &lt;BR /&gt; obj.stream:=stream1; &lt;BR /&gt; obj.directory:=directory; &lt;BR /&gt; obj.compressionlevel:=9; &lt;BR /&gt; obj.streamindex:=inttostr(i); &lt;BR /&gt; obj.r:=r; &lt;BR /&gt; obj.number:=e; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt; // hand the job to the thread pool, which runs myobj.BZipcompress on it &lt;BR /&gt; TP.execute(myobj.BZipcompress,pointer(obj)); &lt;/P&gt;&lt;P&gt;&lt;BR /&gt; end; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am doing the same thing in PZlib.pas. 
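&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck asked for algorithms rather than Object Pascal, so here is a rough Python sketch of the same chunk-and-distribute idea. It is an illustration, not a port of pbzip.pas. Python's bz2 module stands in for myobj.BZipcompress, and a concurrent.futures.ThreadPoolExecutor stands in for the Thread Pool Engine (TP.execute). The function names split_chunks, compress_chunk and parallel_bzip are hypothetical. Like the loop above, the sketch cuts the input into e full chunks of d bytes plus a final chunk of r bytes, and submits one independent job per chunk.&lt;/P&gt;&lt;PRE&gt;
```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data, chunk_size):
    """Cut data the way the loop above does: e full chunks of chunk_size
    bytes, plus one last chunk of r bytes when r is not zero."""
    e, r = divmod(len(data), chunk_size)
    chunks = [data[i * chunk_size:(i + 1) * chunk_size] for i in range(e)]
    if r:
        chunks.append(data[e * chunk_size:])
    return chunks


def compress_chunk(chunk):
    # Each job only touches its own chunk, so no lock is needed.
    return bz2.compress(chunk, 9)


def parallel_bzip(data, chunk_size=900_000, workers=4):
    """Compress every chunk as an independent bzip2 stream on a thread pool.

    pool.map returns the results in submission order, so the streams can
    be stored (or concatenated) in their original order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, split_chunks(data, chunk_size)))


payload = b"parallel compression " * 50_000  # 1,050,000 bytes of sample data
streams = parallel_bzip(payload, chunk_size=100_000)
print(len(streams))  # 11: ten full chunks plus one 50,000-byte remainder
# bz2.decompress accepts concatenated streams, so the round trip is lossless.
assert bz2.decompress(b"".join(streams)) == payload
```
&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Every chunk becomes a self-contained bzip2 stream, so bz2.decompress (which handles multi-stream input) can read back the concatenated output. In CPython the threads here only mirror the structure of the original loop; for CPU-bound compression, a ProcessPoolExecutor may scale better.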
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And after that, I am reading those compressed files &lt;BR /&gt;from the compound filesystem and 'distributing' them, as streams, &lt;BR /&gt;to my Thread Pool Engine to be decompressed &lt;BR /&gt;by the myobj.Zlibdecompress method; look inside pzlib.pas at: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;------------------------------------&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;names:=TStringList.create; &lt;BR /&gt;storage.foldernames('/',names); &lt;BR /&gt;len:=strtoint(names[0]); &lt;BR /&gt;&lt;BR /&gt;if r=0 then len:=len+1; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;for i:=0 to len &lt;BR /&gt;do &lt;BR /&gt;begin &lt;BR /&gt; if (i=len) and (r=0) then break; &lt;BR /&gt; obj:=TJob.create; &lt;BR /&gt; obj.directory:=directory; &lt;BR /&gt; obj.streamindex:=inttostr(i); &lt;BR /&gt; obj.index:=i; &lt;BR /&gt; obj.number:=e; &lt;BR /&gt; obj.r:=r; &lt;BR /&gt; TP.execute(myobj.Zlibdecompress,pointer(obj)); &lt;BR /&gt;end; &lt;/P&gt;&lt;P&gt;-------------------------------------------------- &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I wrote: &lt;BR /&gt;&amp;gt; And as you have noticed i am using a portable &lt;BR /&gt;&amp;gt; compound filesystem, look at ParallelStructuredStorage.pas &lt;BR /&gt;&amp;gt; inside the zip file. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Why? &lt;/P&gt;&lt;P&gt;Because you can compress your files in parallel and store &lt;BR /&gt;the resulting .zlb (zlib) or .bz (bzip) &lt;BR /&gt;compressed files in a portable compound filesystem, &lt;BR /&gt;and after that you can distribute your compound filesystem. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And of course you can decompress individual files, or all the &lt;BR /&gt;content of your compound filesystem, from your compound &lt;BR /&gt;filesystem. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And of course that's easy with Parallel Compression 1.0 :) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote: &lt;BR /&gt;&amp;gt;[...] an algorithm really ;) &lt;BR /&gt;&amp;gt;What's so special about it ? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Parallel bzip and zlib is not just pbzip.pas and pzlib.pas: &lt;BR /&gt;the parallel bzip and zlib algorithm includes my Thread Pool Engine &lt;BR /&gt;algorithm plus my ParallelQueue algorithm. &lt;/P&gt;&lt;P&gt;I am calling it an algorithm because it uses a finite number of &lt;BR /&gt;instructions and rules to solve a problem (parallel compression &lt;BR /&gt;and decompression). &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Do you understand? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I said, you can compress your files in parallel, store &lt;BR /&gt;the resulting .zlb (zlib) or .bz (bzip) files in a portable compound filesystem, &lt;BR /&gt;distribute that compound filesystem, and later decompress individual files, &lt;BR /&gt;or all of its content, from it. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Skybuck wrote&lt;BR /&gt;&amp;gt; I see a whole bunch of pascal/delphi files thrown together, &lt;BR /&gt;&amp;gt;a whole bunch of dll's and god-forbid ms*.dll files... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Those DLLs are mandatory for now, &lt;/P&gt;&lt;P&gt;and you can easily write a batch file, etc., and reorganize them. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt; I see some "test programs" which are described as "modules" which they &lt;BR /&gt;&amp;gt; simply are not... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's VERY easy to convert pzlib.pas and pbzip.pas &lt;BR /&gt;to units, and that's what I will do in the next step. &lt;/P&gt;&lt;P&gt;Parallel Compression 1.0 will be enhanced further in the future. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt; It shouldn't be that hard... 
set your editor to "use tab character" (turn &lt;BR /&gt;&amp;gt; tabs to spaces off) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am not using the Delphi editor, just notepad.exe or write.exe, &lt;BR /&gt;and I am compiling from the DOS prompt. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt;So far it seems like you are inserting your &lt;BR /&gt;&amp;gt;threads/syncronizations &lt;BR /&gt;&amp;gt;everywhere in single-thread-design algorithms ? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;No, it's not just inserting threads/synchronizations. &lt;/P&gt;&lt;P&gt;I have reasoned and used logic. Look, for example, at &lt;BR /&gt;parallelhashlist.pas inside the zip file: I am using MREWs (multiple-read exclusive-write locks) &lt;BR /&gt;carefully in the right places, and I have also slightly &lt;BR /&gt;modified the serial code; it uses a hash-based method &lt;BR /&gt;with an array of MREWs. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have constructed the Thread Pool Engine from scratch, &lt;BR /&gt;using my ParallelQueue, an efficient lock-free queue, &lt;BR /&gt;etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have constructed the parallel bzip and zlib &lt;BR /&gt;on top of my Thread Pool Engine, &lt;BR /&gt;etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That's not just 'inserting' threads/synchronizations. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt;But my estimate would be that for now on low core systems... the &lt;BR /&gt;&amp;gt;"compression" would take far more time... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;No. pbzip.pas, for example, gave a 3.3x speedup on 4 cores: &lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/ParallelCompression/parallelbzip.htm"&gt;http://pages.videotron.com/aminer/ParallelCompression/parallelbzip.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt; [...] or anything extraordinary... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Don't be stupid, Skybuck. 
&lt;/P&gt;&lt;P&gt;It is in fact: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Useful &lt;BR /&gt;2- A good thing for educational purposes. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote: &lt;BR /&gt;&amp;gt;The thread pool concept is retarded. &lt;BR /&gt;&amp;gt;Any good delphi programmer is capable of creating an array of threads. &lt;BR /&gt;&amp;gt;So my advice to you: &lt;BR /&gt;&amp;gt;1. Delete your thread pool, because it's junk. &lt;BR /&gt;&amp;gt;2. Write a serious/big application that uses many threads, &lt;BR /&gt;&amp;gt;and simply derive from TThread to see how easy it is. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;How can you be so stupid? &lt;/P&gt;&lt;P&gt;My Thread Pool Engine is not just an array of threads: &lt;BR /&gt;it uses efficient lock-free queues (for example, the lock-free ParallelQueue), &lt;BR /&gt;one for each worker thread, and it uses work-stealing for more &lt;BR /&gt;efficiency, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it eases the work for you: you can reuse the TThreadPool &lt;BR /&gt;class, and it is very useful. &lt;/P&gt;&lt;P&gt;Please read again: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/threadpool.htm"&gt;http://pages.videotron.com/aminer/threadpool.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote in alt.comp.lang.borland-delphi: &lt;BR /&gt;&amp;gt; My Thread Pool Engine is not just an array of threads, &lt;BR /&gt;&amp;gt; " &lt;BR /&gt;&amp;gt;&amp;gt; To me it is. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You really don't know what you are talking about. &lt;/P&gt;&lt;P&gt;The principal threat to scalability in concurrent applications &lt;BR /&gt;is the exclusive resource lock. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Reduce the duration for which locks are held &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;2- Reduce the frequency with which locks are requested &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;or &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;With low, moderate AND high contention, my ParallelQueue &lt;BR /&gt;offers better scalability, and I am using it inside my &lt;BR /&gt;Thread Pool Engine. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Because my ParallelQueue uses a hash-based method &lt;BR /&gt;with lock striping, plus just a LockedInc(), &lt;BR /&gt;I am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;the frequency with which locks are requested; hence I am &lt;BR /&gt;REDUCING the contention A LOT, so it's very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I stated before, this is a law or theorem to apply: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better, because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, and so, as N becomes bigger, &lt;BR /&gt; the result of Amdahl's equation 1/(S+(P/N)) (the speedup of &lt;BR /&gt; the program/algorithm) becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That's why my ParallelQueue has scored 7 million pop() &lt;BR /&gt;transactions per second, better than flqueue and RingBuffer; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;look at: &lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Also, my Thread Pool Engine uses efficient lock-free queues &lt;BR /&gt;(for example, the lock-free ParallelQueue), one for each worker thread, &lt;BR /&gt;to reduce and minimize the contention, and it uses work-stealing, &lt;BR /&gt;so my Thread Pool Engine is very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it eases the work for you: you can reuse the TThreadPool &lt;BR /&gt;class, and it is very useful. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, don't be stupid, Skybuck. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I wrote: &lt;BR /&gt;&amp;gt; Because my ParallelQueue is using an hash based method &lt;BR /&gt;&amp;gt; - and lock striping - and using just a LockedInc() , so, &lt;BR /&gt;&amp;gt; i am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;&amp;gt; the frequency with which locks are requested, hence i am &lt;BR /&gt;&amp;gt; REDUCING A LOT the contention, so it's very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;With low, moderate AND high contention, my ParallelQueue &lt;BR /&gt;offers better scalability, and I am using it inside my &lt;BR /&gt;Thread Pool Engine. &lt;/P&gt;&lt;P&gt;As you have noticed, I am using low to medium contention &lt;BR /&gt;in the following test: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But I predict that under HIGH contention, push() and pop() will &lt;BR /&gt;score even better than that. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Why? 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Because my ParallelQueue uses a hash-based method &lt;BR /&gt;with lock striping, plus just a LockedInc(), &lt;BR /&gt;I am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;the frequency with which locks are requested; hence I am &lt;BR /&gt;REDUCING the contention A LOT, so it's very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I stated before, this is a law or theorem to apply: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better, because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, and so, as N becomes bigger, &lt;BR /&gt; the result of Amdahl's equation 1/(S+(P/N)) (the speedup of &lt;BR /&gt; the program/algorithm) becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;------------------------&lt;/P&gt;&lt;P&gt;Hello again, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now, as I have stated before: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better, because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, and so, as N becomes bigger, &lt;BR /&gt; the result of Amdahl's equation 1/(S+(P/N)) (the speedup of &lt;BR /&gt; the program/algorithm) becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And, as you have noticed, I followed this theorem [3] when &lt;BR /&gt;I constructed my Thread Pool Engine, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now there is another theorem that I can state like this: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[4] You have latency and bandwidth, so, IF you use them efficiently &lt;BR /&gt; (reduce or hide latency, make good use of bandwidth, or both) THEN your algorithm &lt;BR /&gt; will be more efficient. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That is why you should not start too many threads in my &lt;BR /&gt;Thread Pool Engine: with too many threads you context switch a lot, &lt;BR /&gt;and when you context switch a lot, the latency grows, and &lt;BR /&gt;this is not good for efficiency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You have to be smart. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I have stated before: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;(or laws, true propositions or good patterns) like theorems [1], &lt;BR /&gt;[2], [3], [4]... THEN you will construct a model that will be &lt;BR /&gt;much more CORRECT and EFFICIENT. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Take care... &lt;/P&gt;&lt;P&gt;-----------------------------&lt;/P&gt;&lt;P&gt;Hello again, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sorry for my English, but I will continue to explain my ideas, &lt;BR /&gt;using logic and reasoning. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As you already know, we have these two notions: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;'Time' (we have time because there is movement of matter) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;'Space'. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And we have these two notions that we call 'Correctness' and &lt;BR /&gt;'Efficiency'. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And, as you have noticed, I have stated the following theorems: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[1] IF your algorithm exhibits more data parallelism THEN &lt;BR /&gt;it will be more efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[2] IF two or more processes or threads use the same critical &lt;BR /&gt;sections THEN they must all acquire &lt;BR /&gt;them in the same order to avoid deadlock in the system. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt;scale better, because S (the serial part) &lt;BR /&gt;becomes smaller with less contention, and so, as N becomes bigger, &lt;BR /&gt;the result of Amdahl's equation 1/(S+(P/N)) (the speedup of &lt;BR /&gt;the program/algorithm) becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[4] You have latency and bandwidth, so, IF you use them efficiently &lt;BR /&gt;(reduce or hide latency, make good use of bandwidth, or both) THEN your algorithm &lt;BR /&gt;will be more efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Why am I calling them theorems? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You can also call them rules, true propositions or laws. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now I can classify theorem [2] in the set that I call 'correctness', &lt;BR /&gt;since it states something about correctness, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and theorems [1], [3] and [4] in the set that I call 'efficiency', &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;since they state something about efficiency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But you have to be smart now. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you have noticed, theorems [1] and [3] are in fact &lt;BR /&gt;special cases of theorem [4]. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But why am I calling them theorems? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You can call them rules or laws if you want. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I have stated before: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;(or laws, true propositions or good patterns), like rules or &lt;BR /&gt;theorems [1], [2], [3], [4]..., THEN you will construct a model that will &lt;BR /&gt;be much more CORRECT and EFFICIENT. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It is one of my preferred methodologies in programming. 
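&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Rule [3] can be illustrated with lock striping, one of the techniques mentioned above for the ParallelQueue. The following Python sketch is not the ParallelQueue code; the StripedCounterMap class, the stripe count and the workload are hypothetical. There is no single exclusive lock around the whole map. Instead, each key is hashed onto one of several locks, so threads that touch keys in different stripes do not wait for each other. This shrinks the time threads spend queued on a lock, which is the contention that feeds S in Amdahl's equation.&lt;/P&gt;&lt;PRE&gt;
```python
import threading


class StripedCounterMap:
    """Hypothetical key-to-counter map protected by lock striping.

    Instead of one exclusive lock for the whole map, each key is hashed
    onto one of several locks, and each lock guards only its own bucket.
    Threads that touch keys in different stripes never wait for each other.
    """

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is locked, and only briefly
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + amount

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)


def hammer(counters, keys, rounds):
    for _ in range(rounds):
        for key in keys:
            counters.increment(key)


counters = StripedCounterMap(stripes=16)
keys = [f"key{k}" for k in range(64)]
threads = [threading.Thread(target=hammer, args=(counters, keys, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counters.get("key0"))  # 4000
```
&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;In CPython the global interpreter lock limits the real speedup of this demo, so treat it as a picture of the structure rather than a benchmark. The same design in Object Pascal would use an array of TCriticalSection objects, or MREW locks as in parallelhashlist.pas.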
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely, &lt;BR /&gt;Amine Moulay Ramdane&lt;/P&gt;&lt;P&gt;-----------------------&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am still thinking and using logic. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I can also add the following rules: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[5] IF you are using a critical section or spinlock and there is &lt;BR /&gt;high contention (with many threads) on it THEN there is a &lt;BR /&gt;possibility of a lock convoy. This is because the thread &lt;BR /&gt;entering the spinlock or critical section may be context-switched out while it holds the lock, &lt;BR /&gt;which adds to the service time, and to the S (serial part) &lt;BR /&gt;of Amdahl's equation; this increases the contention, &lt;BR /&gt;which can create a lock convoy and lead to bad scalability. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We can alleviate the problem in [5] by using a Mutex or a Semaphore &lt;BR /&gt;around the critical section or the spinlock. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Another rule now: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[6] IF there is contention on a lock (a critical section, etc.) &lt;BR /&gt;and inside the locked section you are doing I/O (for example, &lt;BR /&gt;logging a message to a file) THEN the calling thread will &lt;BR /&gt;block on the I/O, and the operating system will deschedule &lt;BR /&gt;the blocked thread until the I/O completes. This situation &lt;BR /&gt;leads to more context switching and therefore to an increased &lt;BR /&gt;service time; longer service times, in this case, mean &lt;BR /&gt;more lock contention, and more lock contention means bad &lt;BR /&gt;scalability. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;There is also false sharing, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;(or laws, true propositions or good patterns), like rules or &lt;BR /&gt;theorems [1], [2], [3], [4], [5], [6]..., THEN you will construct &lt;BR /&gt;a model that will be much more CORRECT and EFFICIENT. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it is one of my preferred methodologies in programming. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I will try to add more of those rules, theorems and laws &lt;BR /&gt;next time. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely, &lt;BR /&gt;Amine Moulay Ramdane. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 04 Apr 2010 03:23:25 GMT</pubDate>
    <dc:creator>aminer10</dc:creator>
    <dc:date>2010-04-04T03:23:25Z</dc:date>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854559#M2020</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have cleaned my previous posts, and here is my new post &lt;BR /&gt;that includes my 'ideas' etc...&lt;/P&gt;&lt;P&gt;Srinivas Nayak wrote in comp.programming.threads: &lt;BR /&gt;&amp;gt;Dear All, &lt;BR /&gt;&amp;gt;Please suggest a good book that teaches in great details about the &lt;BR /&gt;&amp;gt;theories behind the followings. &lt;BR /&gt;&amp;gt;1. shared memory concurrent systems. &lt;BR /&gt;&amp;gt;2. message passing concurrent systems. &lt;BR /&gt;&amp;gt;3. mutual exclusion. &lt;BR /&gt;&amp;gt;4. synchronization. &lt;BR /&gt;&amp;gt;5. safety property. &lt;BR /&gt;&amp;gt;6. liveness property. &lt;BR /&gt;&amp;gt;7. fairness property. &lt;BR /&gt;&amp;gt;8. systems with code interleaving (virtual concurrency). &lt;BR /&gt;&amp;gt;9. systems with no code interleaving (true concurrency). &lt;BR /&gt;&amp;gt;10. atomic operations. &lt;BR /&gt;&amp;gt;11. critical sections. &lt;BR /&gt;&amp;gt;12. how to code a concurrent system (about programming language &lt;BR /&gt;&amp;gt;constructs available for it). &lt;BR /&gt;&amp;gt;13. how to mathematically proof the properties. &lt;BR /&gt;&amp;gt;14. how to mechanically verify the properties. &lt;BR /&gt;&amp;gt;15. blocking synchronization. &lt;BR /&gt;&amp;gt;16. non-blocking synchronization. &lt;BR /&gt;&amp;gt;17. lock-freedom. &lt;BR /&gt;&amp;gt;18. wait-freedom. &lt;BR /&gt;&amp;gt;19. deadlock-freedom. &lt;BR /&gt;&amp;gt;20. starvation-freedom. &lt;BR /&gt;&amp;gt;21. livelock-freedom. &lt;BR /&gt;&amp;gt;22. obstruction-freedom. &lt;BR /&gt;&amp;gt;Not only the concepts but also that teaches with very simple &lt;BR /&gt;&amp;gt;mathematical treatment; axiomatic or linear temporal logic. 
&lt;BR /&gt;&amp;gt;Many of the books I came across are either emphasize one or two topic &lt;BR /&gt;&amp;gt;or just provides a conceptual treatment, without mentioning how to &lt;BR /&gt;&amp;gt;code a concurrent system, check if it is mathematically or manually &lt;BR /&gt;&amp;gt;correct. &lt;BR /&gt;&amp;gt;Please suggest any book or paper where these topics are &lt;BR /&gt;&amp;gt;comprehensively covered in great details. Better if all these are &lt;BR /&gt;&amp;gt;under a single cover that will be easy to understand under the roof of &lt;BR /&gt;&amp;gt;a unifying theory. &lt;BR /&gt;&amp;gt;Survey papers of these are also welcome. &lt;BR /&gt;&amp;gt;With regards, &lt;BR /&gt;&amp;gt;Srinivas Nayak &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;For boundedness and deadlocks - two of the most &lt;BR /&gt;important properties - you can use Petri nets and reason &lt;BR /&gt;about the place invariant equations that you extract by &lt;BR /&gt;solving the following equation: &lt;/P&gt;&lt;P&gt;Transpose(vector) * Incidence matrix = 0 &lt;/P&gt;&lt;P&gt;and find the vectors on which you will base your reasoning. &lt;/P&gt;&lt;P&gt;You can use the same place invariant equations to &lt;BR /&gt;reason about lock-based and lock-free algorithms. &lt;/P&gt;&lt;P&gt;You can also use graph reduction techniques. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As an example, suppose that you solve the equation &lt;BR /&gt;Transpose(vector) * Incidence matrix = 0 and find the &lt;BR /&gt;following equations. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Note: P, Q, R, S are all the places in the system, and M(X) denotes the number of tokens in place X under marking M. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;equation 1: M(P) + M(R) + M(S) = C1 (constant) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;equation 2: 2 * M(P) + M(Q) + M(S) = C2 (constant) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Note also that each constant is fixed by the initial marking M0: &lt;BR /&gt;for an invariant vector f, Transpose(f) * M = Transpose(f) * M0 for every reachable marking M. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So it follows from the equations that, since &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;M(P) + M(R) + M(S) = C1 and no place can hold a negative number of tokens, &lt;BR /&gt;M(P) &amp;lt;= C1 and M(R) &amp;lt;= C1 and M(S) &amp;lt;= C1, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and, from the second invariant equation, we have &lt;BR /&gt;M(Q) &amp;lt;= C2. Every place is therefore bounded, which IMPLIES that the system is &lt;BR /&gt;structurally bounded. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The same goes for deadlocks: you reason &lt;BR /&gt;about invariant equations to prove that there is &lt;BR /&gt;no deadlock in the system... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now, if you follow good patterns, that's also good... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And what's a good pattern? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's like a THEOREM that we apply in programming... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As an example: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Suppose that two threads want to acquire &lt;BR /&gt;critical sections: IF the first thread tries to acquire critical &lt;BR /&gt;section A and then critical section B, and the second thread tries to &lt;BR /&gt;acquire B and then A, THEN you can have a deadlock in this system. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You see? It looks like this: IF the predicates are met THEN &lt;BR /&gt;something follows... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now suppose there are many critical sections, and the first &lt;BR /&gt;thread tries to acquire A, B, C, D, E, F, G while the second thread tries to &lt;BR /&gt;acquire A, G, C, D, E, F, B: that's also a problem, and you can &lt;BR /&gt;easily notice it by APPLYING the theorems that we call &lt;BR /&gt;'good patterns'. 
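&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The A, B, C, D, E, F, G example can be checked mechanically. The following is only a minimal sketch. It is written in Python for illustration and is not taken from the original Object Pascal code. The hypothetical function conflicting_pairs takes the order in which each of two threads acquires its locks, and it lists every pair that the threads take in opposite orders. Each reported pair is a potential deadlock, assuming the locks are held nested: a thread keeps its first lock while it waits for the second.&lt;/P&gt;&lt;PRE&gt;
```python
from itertools import combinations


def conflicting_pairs(order_a, order_b):
    """Return the lock pairs that two threads acquire in opposite orders.

    order_a and order_b are the sequences in which thread A and thread B
    take their locks. A pair (x, y) is reported when A takes x before y
    but B takes y before x: if each thread holds its first lock while
    waiting for the second, the two threads can deadlock.
    """
    position_in_b = {lock: i for i, lock in enumerate(order_b)}
    conflicts = []
    # combinations() preserves the order of order_a, so A takes 'first' before 'second'.
    for first, second in combinations(order_a, 2):
        if first in position_in_b and second in position_in_b:
            taken_first_by_b = min((first, second), key=position_in_b.get)
            if taken_first_by_b == second:
                conflicts.append((first, second))
    return conflicts


if __name__ == "__main__":
    print(conflicting_pairs("AB", "BA"))            # the two-lock case
    print(conflicting_pairs("ABCDEFG", "AGCDEFB"))  # the seven-lock case
```
&lt;/PRE&gt;&lt;P&gt;For the seven-lock example, the function reports nine conflicting pairs, and every one involves B or G. If both threads instead acquire their locks in one agreed global order, for example alphabetical, the list is empty.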
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You see why good patterns - which look like theorems - &lt;BR /&gt;are also very powerful? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That's what we call a good pattern: it's like a theorem, &lt;BR /&gt;and it looks like this: IF the predicates are met THEN something follows... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;There are also good patterns - like theorems - to follow for false &lt;BR /&gt;sharing, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Do you understand why I and others also follow good patterns &lt;BR /&gt;that look like theorems? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MC also wrote: &lt;BR /&gt;&amp;gt; Dear all, &lt;BR /&gt;&amp;gt; Following on the post of Srinu. I am very beginner in multithreaded &lt;BR /&gt;&amp;gt; programming. I have been looking for a good book to read about the &lt;BR /&gt;&amp;gt; basic concepts of mutithreading, I recently bought Programming with &lt;BR /&gt;&amp;gt; POSIX threads- by Butenhof. I didnt quite like that book, what I am &lt;BR /&gt;&amp;gt; looking for a is a book which explains multithreaded programming &lt;BR /&gt;&amp;gt; conceptually and also gives good concrete examples. Can anybody please &lt;BR /&gt;&amp;gt; suggest me a book. &lt;/P&gt;&lt;P&gt;&amp;gt; Thanks, &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will just give a piece of advice... &lt;/P&gt;&lt;P&gt;To learn more about parallel programming, just read the old posts &lt;BR /&gt;in comp.programming.threads and the other forums that discuss &lt;BR /&gt;parallel programming. Read them carefully - as I did myself - &lt;BR /&gt;and try to use LOGIC and REASON about them, and try to EXTRACT &lt;BR /&gt;the good patterns about parallel programming from them and understand &lt;BR /&gt;them... 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Also, try to look at parallel code - for example &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; and other parallel toolkits - &lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/threadpool.htm"&gt;http://pages.videotron.com/aminer/threadpool.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelhashlist/queue.htm"&gt;http://pages.videotron.com/aminer/parallelhashlist/queue.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;and read inside my parallel code:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and the parallel code of others... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and try to 'EXTRACT' and 'UNDERSTAND' those good patterns &lt;BR /&gt;to follow... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Good patterns about parallel programming are like theorems: &lt;BR /&gt;IF the predicates are met THEN something follows... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As an example, take the following page: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://blogs.msdn.com/visualizeparallel/"&gt;http://blogs.msdn.com/visualizeparallel/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As I said before, good patterns about parallel programming &lt;BR /&gt;are like theorems: IF the predicates are met THEN something follows... &lt;BR /&gt;So, read this: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"It is critical to be able to spot data parallelism when you see it &lt;BR /&gt;because data parallel algorithms allow the developer to more easily &lt;BR /&gt;construct efficient and safe code. 
As opposed to the more complex &lt;BR /&gt;solutions employed against task parallelism, data parallelism allows &lt;BR /&gt;the programmer to perform the same operation on each piece of data &lt;BR /&gt;concurrently without concern for race conditions and consequently, &lt;BR /&gt;the need for synchronization, which results in significant &lt;BR /&gt;performance overhead. Arguably, data parallel algorithms perform &lt;BR /&gt;better (due to the lack of synchronization) and are easier for the &lt;BR /&gt;developer to implement." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, tell me MC, what can you EXTRACT from this? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You can extract something like a theorem to follow, like this: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[1] IF your algorithm exhibits much more data parallelism THEN &lt;BR /&gt; it will be much more efficient - it will perform better - due to &lt;BR /&gt; the lack of synchronization... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Hence theorem [1] is a good pattern to follow in &lt;BR /&gt;parallel programming. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Do you understand now? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You have to be smart and start to extract those theorems &lt;BR /&gt;- good patterns to follow - from all the programming code you read. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, this theorem that I have extracted from the page is important, &lt;BR /&gt;and it's a good pattern to follow... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;How can this theorem be understood using mathematical equations? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Easy... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF your algorithm exhibits more data parallelism THEN the proportion &lt;BR /&gt;S - the serial fraction of the work - will be smaller in Amdahl's equation &lt;BR /&gt;1 / (S + (P/N)), where P = 1 - S is the parallel fraction and N is the number of cores/processors; hence the &lt;BR /&gt;algorithm will scale better... 
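&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Here is a quick numeric check of this argument, as a small Python sketch (the function name amdahl_speedup is hypothetical). It evaluates 1 / (S + (P/N)) with P = 1 - S for a few serial fractions S. Shrinking S raises two things: the speedup on a fixed number of cores, and the ceiling 1/S that the speedup approaches as N grows.&lt;/P&gt;&lt;PRE&gt;
```python
def amdahl_speedup(serial_fraction, n_cores):
    """Speedup predicted by Amdahl's law: 1 / (S + P/N), where P = 1 - S.

    serial_fraction is S, a number between 0 and 1; n_cores is N, at least 1.
    """
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # More data parallelism means a smaller serial fraction S.
    for s in (0.20, 0.10, 0.05):
        print(f"S = {s:.2f}: speedup on 4 cores = {amdahl_speedup(s, 4):.2f}, "
              f"limit for very large N = {1 / s:.0f}")
```
&lt;/PRE&gt;&lt;P&gt;On 4 cores this gives speedups of about 2.50, 3.08 and 3.48 for S = 0.20, 0.10 and 0.05. The corresponding ceilings are 5, 10 and 20, so the difference between the serial fractions grows as N grows.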
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as you have noticed, this is what theorem [1] states: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;" [1] IF your algorithm exhibits much more data parallelism THEN &lt;BR /&gt; it will be much more efficient - it will perform better - due to &lt;BR /&gt; the lack of synchronization..." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The same goes for the other theorems: on deadlock, false sharing, &lt;BR /&gt;etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You have to be smart and start to extract those theorems &lt;BR /&gt;- good patterns to follow - from all the programming code, &lt;BR /&gt;articles, forums, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck also wrote: &lt;BR /&gt;&amp;gt; What if people wanna roll there own versions ? ;) &lt;BR /&gt;&amp;gt; They would much better be "served" by algorithms/pseudo &lt;BR /&gt;&amp;gt; code than real code which could be system/language specific ;) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's easy to EXTRACT algorithms from Object Pascal code... &lt;/P&gt;&lt;P&gt;Look, for example, inside pbzip.pas; I am using this in the &lt;BR /&gt;main body of my program: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;name:='msvcr100.dll'; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's the 'test' file that I am using - it's also inside the &lt;BR /&gt;zip file - and once you compile and execute pbzip.pas it &lt;BR /&gt;will generate a file msvcr100.dll.bz. And as you have &lt;BR /&gt;noticed, I am using a portable compound filesystem; &lt;BR /&gt;look at ParallelStructuredStorage.pas inside the zip file. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;After that I am opening it with: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;fstream1:=TFileStream.create(name, fmOpenReadWrite); &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and I am reading chunks of the stream and 'distributing' them &lt;BR /&gt;to my Thread Pool Engine to be compressed in parallel &lt;BR /&gt;by the myobj.BZipcompress method; look at: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;{ e = number of full chunks of d bytes; r = size of the last, partial chunk (0 if none) } &lt;BR /&gt;for i:=0 to e &lt;BR /&gt;do &lt;BR /&gt; begin &lt;BR /&gt; if (i=e) and (r=0) then break; &lt;BR /&gt; stream1:=TMemoryStream.create; &lt;BR /&gt; if (r &amp;gt; 0) and (i=e) &lt;BR /&gt; then stream1.copyfrom(fstream1,r) &lt;BR /&gt; else stream1.copyfrom(fstream1,d); &lt;BR /&gt; stream1.position:=0; &lt;BR /&gt; obj:=TJob.create; &lt;BR /&gt; obj.stream:=stream1; &lt;BR /&gt; obj.directory:=directory; &lt;BR /&gt; obj.compressionlevel:=9; &lt;BR /&gt; obj.streamindex:=inttostr(i); &lt;BR /&gt; obj.r:=r; &lt;BR /&gt; obj.number:=e; &lt;BR /&gt; { queue the job; a pool worker runs BZipcompress on it } &lt;BR /&gt; TP.execute(myobj.BZipcompress,pointer(obj)); &lt;BR /&gt; end; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am doing the same thing in PZlib.pas... 
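&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;For readers who prefer the algorithm to the Delphi code, the loop above follows a simple pattern. First, cut the input into e chunks of d bytes plus a remainder of r bytes. Then compress each chunk independently on a pool of worker threads. Finally, keep each chunk's index so the results can be put back in order. The Python sketch below shows that pattern only. It uses Python's standard bz2 module and concurrent.futures.ThreadPoolExecutor as stand-ins for BZipcompress and the Thread Pool Engine, and it is not a translation of pbzip.pas.&lt;/P&gt;&lt;PRE&gt;
```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Cut data into chunk_size pieces; the last piece holds any remainder."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_bz2_compress(data, chunk_size, workers=4):
    """Compress each chunk as an independent bzip2 stream on a thread pool."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, which plays the role of streamindex.
        return list(pool.map(lambda chunk: bz2.compress(chunk, 9), chunks))


def parallel_bz2_decompress(blocks, workers=4):
    """Decompress the independent streams in parallel and rejoin them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.decompress, blocks))


if __name__ == "__main__":
    data = bytes(range(256)) * 200          # 51,200 bytes of sample input
    blocks = parallel_bz2_compress(data, 8192)
    print(len(blocks), "blocks")            # 6 full chunks + 1 remainder chunk
    assert parallel_bz2_decompress(blocks) == data
```
&lt;/PRE&gt;&lt;P&gt;Python's bz2.decompress also accepts several concatenated bzip2 streams, so the blocks could equally be written back to back into a single .bz2 file. Whether the worker threads actually run in parallel depends on the compressor releasing Python's global interpreter lock while it works.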
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And after that I am reading those compressed files &lt;BR /&gt;from the compound filesystem - look inside pzlib.pas - &lt;BR /&gt;and I am 'distributing' those compressed files, as streams, &lt;BR /&gt;to my Thread Pool Engine to be decompressed &lt;BR /&gt;by the myobj.Zlibdecompress method; look at: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;------------------------------------&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;names:=TStringList.create; &lt;BR /&gt;storage.foldernames('/',names); &lt;BR /&gt;len:=strtoint(names[0]); &lt;BR /&gt;&lt;BR /&gt;if r=0 then len:=len+ 1 ; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;for i:=0 to len &lt;BR /&gt;do &lt;BR /&gt;begin &lt;BR /&gt; if (i=len) and (r=0) then break; &lt;BR /&gt; obj:=TJob.create; &lt;BR /&gt; obj.directory:=directory; &lt;BR /&gt; obj.streamindex:=inttostr(i); &lt;BR /&gt; obj.index:=i; &lt;BR /&gt; obj.number:=e; &lt;BR /&gt; obj.r:=r; &lt;BR /&gt; TP.execute(myobj.Zlibdecompress,pointer(obj)); &lt;BR /&gt;end; &lt;/P&gt;&lt;P&gt;-------------------------------------------------- &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I wrote: &lt;BR /&gt;&amp;gt; And as you have noticed i am using a portable &lt;BR /&gt;&amp;gt; compound filesystem, look at ParallelStructuredStorage.pas &lt;BR /&gt;&amp;gt; inside the zip file. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Why? &lt;/P&gt;&lt;P&gt;Because you can compress your files in parallel, store &lt;BR /&gt;the resulting .zlb (zlib) or .bz (bzip) &lt;BR /&gt;compressed files in a portable compound filesystem, &lt;BR /&gt;and after that distribute your compound filesystem... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And of course you can uncompress files - or all the &lt;BR /&gt;content of your compound filesystem - from your compound &lt;BR /&gt;filesystem. 
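&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The decompression side is the same pattern in reverse: list the stored streams, read each one by its index, and hand the compressed blobs to a thread pool. The Python sketch below makes two illustrative substitutions. A ZIP archive built with the standard zipfile module (entries stored uncompressed and named by stream index) stands in for the compound filesystem of ParallelStructuredStorage.pas. Standard zlib stands in for the Zlibdecompress job. The names store_compressed and load_and_decompress are hypothetical.&lt;/P&gt;&lt;PRE&gt;
```python
import io
import zipfile
import zlib
from concurrent.futures import ThreadPoolExecutor


def store_compressed(chunks):
    """Zlib-compress each chunk and store it under its stream index.

    The chunks are compressed one after another here to keep the sketch short.
    """
    buffer = io.BytesIO()
    # ZIP_STORED: the entries are already zlib data, so the archive only packages them.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for index, chunk in enumerate(chunks):
            archive.writestr(str(index), zlib.compress(chunk, 9))
    return buffer.getvalue()


def load_and_decompress(archive_bytes, workers=4):
    """Read every stored stream, decompress them in parallel, rejoin in order."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Sort numerically so that stream "10" comes after stream "9".
        names = sorted(archive.namelist(), key=int)
        blobs = [archive.read(name) for name in names]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blobs))


if __name__ == "__main__":
    chunks = [bytes([i]) * (1000 + i) for i in range(12)]
    restored = load_and_decompress(store_compressed(chunks))
    print(restored == b"".join(chunks))
```
&lt;/PRE&gt;&lt;P&gt;Sorting the entry names numerically matters once there are ten or more streams, since a plain string sort would put "10" before "2".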
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And of course that's easy with Parallel Compression 1.0 :) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote: &lt;BR /&gt;&amp;gt;[...] an algorithm really ;) &lt;BR /&gt;&amp;gt;What's so special about it ? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Parallel bzip and zlib is not just pbzip.pas and pzlib.pas: &lt;BR /&gt;the parallel bzip and zlib algorithm includes my Thread Pool Engine &lt;BR /&gt;algorithm + Parallel Queue algorithm... &lt;/P&gt;&lt;P&gt;I am calling it an algorithm because it uses a finite number of &lt;BR /&gt;instructions and rules to solve a problem - parallel compression &lt;BR /&gt;and decompression. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Do you understand? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I said, you can compress your files in parallel, store &lt;BR /&gt;the resulting .zlb (zlib) or .bz (bzip) &lt;BR /&gt;compressed files in a portable compound filesystem, &lt;BR /&gt;and after that distribute your compound filesystem... &lt;BR /&gt;And of course you can uncompress files - or all the &lt;BR /&gt;content of your compound filesystem - from your compound &lt;BR /&gt;filesystem. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt; I see a whole bunch of pascal/delphi files thrown together, &lt;BR /&gt;&amp;gt;a whole bunch of dll's and god-forbid ms*.dll files... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Those DLLs are mandatory for now... &lt;/P&gt;&lt;P&gt;And you can easily write a batch file, etc., to reorganize things... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt; I see some "test programs" which are described as "modules" which they &lt;BR /&gt;&amp;gt; simply are not... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's VERY easy to convert pzlib.pas and pbzip.pas &lt;BR /&gt;into units, and that's what I will do in the next step... &lt;/P&gt;&lt;P&gt;Parallel Compression 1.0 will still be enhanced in the future... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt; It shouldn't be that hard... 
set your editor to "use tab character" (turn &lt;BR /&gt;&amp;gt; tabs to spaces off) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am not using the Delphi editor, just notepad.exe or write.exe, &lt;BR /&gt;and I am compiling from the DOS prompt... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt;So far it seems like you are inserting your &lt;BR /&gt;&amp;gt;threads/syncronizations &lt;BR /&gt;&amp;gt;everywhere in single-thread-design algorithms ? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;No, it's not just inserting threads/synchronizations. &lt;/P&gt;&lt;P&gt;I have reasoned - and used logic. Look, for example, at &lt;BR /&gt;parallelhashlist.pas inside the zip file: I am using MREWs etc. &lt;BR /&gt;carefully in the right places, and I have also slightly &lt;BR /&gt;modified the serial code; it uses a hash-based method &lt;BR /&gt;with an array of MREW locks... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I constructed the Thread Pool Engine from scratch &lt;BR /&gt;- using my ParallelQueue, an efficient lock-free queue - &lt;BR /&gt;etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I constructed the parallel bzip and zlib by using &lt;BR /&gt;my Thread Pool Engine, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That's not just 'inserting' threads/synchronizations. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt;But my estimate would be that for now on low core systems... the &lt;BR /&gt;&amp;gt;"compression" would take far more time... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;No. pbzlib.pas gave, for example, a 3.3x speedup on 4 cores... &lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/ParallelCompression/parallelbzip.htm"&gt;http://pages.videotron.com/aminer/ParallelCompression/parallelbzip.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt; [...] or anything extraordinary... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Don't be stupid, Skybuck. 
&lt;/P&gt;&lt;P&gt;It's in fact: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1 - Useful &lt;BR /&gt;2 - A good thing for educational purposes. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote: &lt;BR /&gt;&amp;gt;The thread pool concept is retarded. &lt;BR /&gt;&amp;gt;Any good delphi programmer is capable of creating an array of threads. &lt;BR /&gt;&amp;gt;So my advice to you: &lt;BR /&gt;&amp;gt;1. Delete your thread pool, because it's junk. &lt;BR /&gt;&amp;gt;2. Write a serious/big application that uses many threads, &lt;BR /&gt;&amp;gt;and simply derive from TThread to see how easy it is. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;How can you be so stupid? &lt;/P&gt;&lt;P&gt;My Thread Pool Engine is not just an array of threads: &lt;BR /&gt;it uses efficient lock-free queues - for example the lock-free ParallelQueue - &lt;BR /&gt;for each worker thread, and it uses work-stealing - for more &lt;BR /&gt;efficiency - etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it eases the work for you - you can 'reuse' the TThreadPool &lt;BR /&gt;class - and it is very useful... &lt;/P&gt;&lt;P&gt;Please read again: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/threadpool.htm"&gt;http://pages.videotron.com/aminer/threadpool.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Skybuck wrote in alt.comp.lang.borland-delphi: &lt;BR /&gt;&amp;gt; My Thread Pool Engine is not just an array of threads, &lt;BR /&gt;&amp;gt; " &lt;BR /&gt;&amp;gt;&amp;gt; To me it is. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You really don't know what you are talking about. &lt;/P&gt;&lt;P&gt;The principal threat to scalability in concurrent applications &lt;BR /&gt;is the exclusive resource lock. 
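&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;One standard way to avoid a single exclusive lock is lock striping, which is the idea behind the hash-based method with an array of MREW locks described above for parallelhashlist.pas. Each key is hashed to one of K locks, so threads working on keys in different stripes never wait for each other. The Python sketch below illustrates only that idea. StripedHashMap is a hypothetical class, and it uses plain threading.Lock objects where the original uses MREW (multiple-readers, exclusive-writer) locks.&lt;/P&gt;&lt;PRE&gt;
```python
import threading


class StripedHashMap:
    """A dictionary split into stripes, each guarded by its own lock.

    Sketch only: plain mutexes are used where the original uses MREW locks.
    """

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # Equal keys always map to the same stripe, so one lock protects each key.
        return hash(key) % len(self._locks)

    def put(self, key, value):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is locked, not the whole map
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)


if __name__ == "__main__":
    table = StripedHashMap(stripes=8)

    def writer(start):
        for key in range(start, start + 1000):
            table.put(key, key * key)

    threads = [threading.Thread(target=writer, args=(n * 1000,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(table.get(3999), table.get(-1))
```
&lt;/PRE&gt;&lt;P&gt;With K stripes and keys spread evenly by the hash, each individual lock receives only about 1/K of the requests a single global lock would, which is how striping lowers contention.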
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1 - Reduce the duration for which locks are held &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;2 - Reduce the frequency with which locks are requested &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;or &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3 - Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;With low, moderate AND high contention, my ParallelQueue &lt;BR /&gt;offers better scalability - and I am using it inside my &lt;BR /&gt;Thread Pool Engine. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;This is because my ParallelQueue uses a hash-based method &lt;BR /&gt;and lock striping, and just a LockedInc(), so &lt;BR /&gt;I am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;the frequency with which locks are requested; hence I am &lt;BR /&gt;REDUCING the contention A LOT, so it's very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I stated before, this is a law or theorem to apply: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better. This is because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, and as N becomes bigger, &lt;BR /&gt; the result of Amdahl's equation 1/(S+(P/N)) - the speedup of the &lt;BR /&gt; program/algorithm - becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That's why my ParallelQueue has scored 7 million pop() &lt;BR /&gt;transactions per second... 
better than flqueue and RingBuffer. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Look at: &lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;Http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Also, my thread pool uses efficient lock-free queues - &lt;BR /&gt;for example the lock-free ParallelQueue - for each worker thread &lt;BR /&gt;- to reduce and minimize the contention - and it uses work-stealing, &lt;BR /&gt;so my Thread Pool Engine is very efficient... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it eases the work for you - you can 'reuse' the TThreadPool &lt;BR /&gt;class - and it is very useful... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, don't be stupid, Skybuck... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I wrote: &lt;BR /&gt;&amp;gt; Because my ParallelQueue is using an hash based method &lt;BR /&gt;&amp;gt; - and lock striping - and using just a LockedInc() , so, &lt;BR /&gt;&amp;gt; i am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;&amp;gt; the frequency with which locks are requested, hence i am &lt;BR /&gt;&amp;gt; REDUCING A LOT the contention, so it's very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;With low, moderate AND high contention, my ParallelQueue &lt;BR /&gt;offers better scalability - and I am using it inside my &lt;BR /&gt;Thread Pool Engine. &lt;/P&gt;&lt;P&gt;And as you have noticed, I am using low to medium contention &lt;BR /&gt;in the following test: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But I predict that under HIGH contention push() and pop() will &lt;BR /&gt;score even better than that. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Why? 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Because my ParallelQueue uses a hash-based method &lt;BR /&gt;and lock striping, and just a LockedInc(), so &lt;BR /&gt;I am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;the frequency with which locks are requested; hence I am &lt;BR /&gt;REDUCING the contention A LOT, so it's very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I stated before, this is a law or theorem to apply: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better. This is because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, and as N becomes bigger, &lt;BR /&gt; the result of Amdahl's equation 1/(S+(P/N)) - the speedup of the &lt;BR /&gt; program/algorithm - becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;------------------------&lt;/P&gt;&lt;P&gt;Hello again, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now, as I have stated before: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better. This is because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, and as N becomes bigger, &lt;BR /&gt; the result of Amdahl's equation 1/(S+(P/N)) - the speedup of the &lt;BR /&gt; program/algorithm - becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And, as you have noticed, I followed this theorem [3] when &lt;BR /&gt;I constructed my Thread Pool Engine, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now there is another theorem that I can state like this: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[4] You have latency and bandwidth, so IF you use one or both &lt;BR /&gt; of them - latency and bandwidth - efficiently THEN your algorithm &lt;BR /&gt; will be more efficient. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That is why you should not start too many threads in my &lt;BR /&gt;Thread Pool Engine: you do not want to context switch a lot, &lt;BR /&gt;because when you context switch a lot, latency grows, and &lt;BR /&gt;this is not good for efficiency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You have to be smart. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I have stated before: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;- or laws, true propositions or good patterns, like theorems [1], &lt;BR /&gt;[2], [3], [4]... - THEN you will construct a model that will be &lt;BR /&gt;much more CORRECT and EFFICIENT. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Take care... &lt;/P&gt;&lt;P&gt;-----------------------------&lt;/P&gt;&lt;P&gt;Hello again, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sorry for my English, but I will continue to explain my ideas, etc., &lt;BR /&gt;using logic and reasoning... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As you already know, we have these two notions: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;'Time' - we have time because there is movement of matter - &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;'Space'. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And we have these two notions that we call 'Correctness' and &lt;BR /&gt;'Efficiency'. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And, as you have noticed, I have stated the following theorems... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[1] IF your algorithm exhibits much more data parallelism THEN &lt;BR /&gt;it will be much more efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[2] IF two or more processes or threads use the same critical &lt;BR /&gt;sections THEN they - the processes or threads - must take &lt;BR /&gt;them in the same order to avoid deadlock in the system. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] IF there is LESS contention THEN the algorithm will &lt;BR /&gt;scale better. 
This is because S (the serial part) &lt;BR /&gt;becomes smaller with less contention, and as N becomes bigger, &lt;BR /&gt;the result of Amdahl's equation 1/(S+(P/N)) - the speedup of the &lt;BR /&gt;program/algorithm - becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[4] You have latency and bandwidth, so IF you use one or both &lt;BR /&gt;of them - latency and bandwidth - efficiently THEN your algorithm &lt;BR /&gt;will be more efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Why am I calling them theorems? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You can also call them rules, true propositions, laws... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now I can 'classify' theorem [2] in the set that I call 'correctness', &lt;BR /&gt;since it states something about correctness, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and theorems [1], [3] and [4] in the set that I call 'efficiency', &lt;BR /&gt;since they state something about efficiency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But you have to be smart now... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you have noticed, theorems [2] and [3] are in fact &lt;BR /&gt;the same as theorem [4]. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But why am I calling them theorems? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You can call them rules or laws if you want. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I have stated before: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;- or laws, true propositions or good patterns - like rules or &lt;BR /&gt;theorems [1], [2], [3], [4]... THEN you will construct a model that will &lt;BR /&gt;be much more CORRECT and EFFICIENT. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It is one of my preferred methodologies in programming. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely, &lt;BR /&gt;Amine Moulay Ramdane&lt;/P&gt;&lt;P&gt;-----------------------&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am still thinking and using logic... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I can also add the following rules: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[5] IF you are using a critical section or spinlock and there is &lt;BR /&gt;high contention - with many threads - on it THEN a lock convoy &lt;BR /&gt;becomes possible. This is because the thread that has entered the &lt;BR /&gt;spinlock or critical section may be context switched out, and this adds &lt;BR /&gt;to the service time - and to S (the serial part) of Amdahl's &lt;BR /&gt;equation - which increases the contention, makes a lock convoy &lt;BR /&gt;possible and leads to bad scalability. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We can alleviate the problem in [5] by using a Mutex or a Semaphore &lt;BR /&gt;around the critical section or the spinlock... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Another rule now... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[6] IF there is contention on a lock - a critical section... - &lt;BR /&gt;and inside the locked section you are doing I/O - for example &lt;BR /&gt;logging a message to a file - THEN the calling thread will block &lt;BR /&gt;on the I/O and the operating system will deschedule &lt;BR /&gt;the blocked thread until the I/O completes. This leads to more &lt;BR /&gt;context switching and therefore to an increased service time, &lt;BR /&gt;and longer service times, in this case, mean more lock contention, &lt;BR /&gt;and more lock contention means bad scalability. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;There is also false sharing, etc. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;- or laws, true propositions or good patterns - like rules or &lt;BR /&gt;theorems [1], [2], [3], [4], [5], [6]... 
THEN you will construct &lt;BR /&gt;a model that will be much more CORRECT and EFFICIENT. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it is one of my preferred methodologies in programming. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I will try to add more of those rules, theorems, laws...&lt;BR /&gt;next time... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely, &lt;BR /&gt;Amine Moulay Ramdane. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Apr 2010 03:23:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854559#M2020</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T03:23:25Z</dc:date>
    </item>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854560#M2021</link>
      <description>&lt;BR /&gt;&lt;BR /&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;Now, if you have noticed, I am using 'logic', &lt;BR /&gt;and it is logic that invented mathematics...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;As an example, in logic we have the following &lt;BR /&gt;law and tautologie:&lt;BR /&gt;&lt;BR /&gt;((p -&amp;gt; q) and (not(p) -&amp;gt; q )) is equivalent to q&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Now, as in logic, I have followed the same kind of proof by deduction, &lt;BR /&gt;and as an example I said:&lt;BR /&gt;&lt;BR /&gt;"Because my ParallelQueue is using a hash-based method &lt;BR /&gt;- and lock striping - and just a LockedInc(), &lt;BR /&gt;I am REDUCING the duration for which locks are held AND &lt;BR /&gt;REDUCING the frequency with which locks are requested; &lt;BR /&gt;hence I am REDUCING the contention A LOT, so it's very efficient. &lt;P&gt;&lt;BR /&gt;And as I stated before, this is a law or theorem to apply: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] If there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better. This is because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, so the result of &lt;BR /&gt; Amdahl's equation 1/(S+(P/N)) - the speedup of the &lt;BR /&gt; program/algorithm - is bigger for a given N, and as N becomes &lt;BR /&gt; bigger it grows toward a higher limit, 1/S. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;That's why my ParallelQueue has scored 7 million pop() &lt;BR /&gt;transactions per second... better than flqueue and RingBuffer."&lt;/P&gt;&lt;BR /&gt;&lt;BR /&gt;So, as you have noticed, I am using Amdahl's law to &lt;BR /&gt;prove theorem [3], which is the same as a proof by deduction...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;- or laws, true propositions or good patterns - like rules or &lt;BR /&gt;theorems [1], [2], [3], [4], [5], [6]... THEN you will construct &lt;BR /&gt;a model that is much more CORRECT and EFFICIENT. 
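&lt;P&gt;&lt;BR /&gt;The general lock-striping idea can be illustrated with a minimal Python sketch. This is not the ParallelQueue code, which these posts do not show. It is a hypothetical StripedCounter that hashes each key onto one of a fixed number of locks. Threads working on keys in different stripes therefore never wait for each other, and each lock is held only for one short dictionary update. In standard CPython the global interpreter lock prevents a real parallel speedup, so the sketch shows the structure and its correctness, not performance.&lt;/P&gt;&lt;PRE&gt;
```python
import threading


class StripedCounter:
    """Hypothetical lock-striped counter map: each key is hashed onto one
    of a fixed number of stripes, and each stripe has its own lock."""

    def __init__(self, n_stripes=16):
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        # One dict per stripe; a dict is only touched under its own lock.
        self._buckets = [{} for _ in range(n_stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def increment(self, key, delta=1):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is locked, and only briefly
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + delta

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)


def worker(counter, keys, rounds):
    for _ in range(rounds):
        for key in keys:
            counter.increment(key)


counter = StripedCounter(n_stripes=16)
keys = ["key-%d" % i for i in range(100)]
threads = [
    threading.Thread(target=worker, args=(counter, keys, 200)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get("key-0"))  # 4 threads x 200 rounds = 800
```
&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Two threads contend only when their keys hash to the same stripe, which lowers how often any single lock is requested. Keeping the locked region to a single dictionary update keeps the hold time short. The number of stripes (16) is an arbitrary assumption.&lt;/P&gt;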
&lt;P&gt;&lt;BR /&gt;And it is one of my preferred methodologies in programming. &lt;/P&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;</description>
      <pubDate>Sun, 04 Apr 2010 04:25:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854560#M2021</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T04:25:50Z</dc:date>
    </item>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854561#M2022</link>
      <description>&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt;As an example, in logic we have the following &lt;BR /&gt;&amp;gt;law and tautologie:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I mean tautology.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Amine.&lt;BR /&gt;</description>
      <pubDate>Sun, 04 Apr 2010 04:34:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854561#M2022</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T04:34:24Z</dc:date>
    </item>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854562#M2023</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have stated the following theorems and rules:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[1] IF your algorithm exhibits much more data parallelism THEN &lt;BR /&gt; it will be much more efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[2] IF two or more processes or threads use the same critical &lt;BR /&gt; sections THEN they - the processes or threads - must take &lt;BR /&gt; them in the same order to avoid deadlock in the system. &lt;/P&gt;&lt;P&gt;[3] If there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better. This is because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, so the result of &lt;BR /&gt; Amdahl's equation 1/(S+(P/N)) - the speedup of the &lt;BR /&gt; program/algorithm - is bigger for a given N, and as N becomes &lt;BR /&gt; bigger it grows toward a higher limit, 1/S. &lt;/P&gt;&lt;P&gt;[4] You have latency and bandwidth, so IF you use one or both &lt;BR /&gt; of them efficiently - for example by hiding latency or by making &lt;BR /&gt; full use of the available bandwidth - THEN your algorithm &lt;BR /&gt; will be more efficient. &lt;/P&gt;&lt;P&gt;[5] IF you are using a critical section or spinlock and there is &lt;BR /&gt;high contention on it - with many threads - THEN there is a &lt;BR /&gt;possibility of a lock convoy. This is because the thread that has &lt;BR /&gt;entered the spinlock or critical section may be context-switched out &lt;BR /&gt;while holding it; this adds to the service time - and to S, the serial &lt;BR /&gt;part of Amdahl's equation - which increases the contention and &lt;BR /&gt;creates the possibility of a lock convoy and poor scalability. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We can alleviate the problem in [5] by using a Mutex or a Semaphore &lt;BR /&gt;around the critical section or the spinlock... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[6] IF there is contention on a lock - a critical section... - &lt;BR /&gt; and inside the locked section you are performing I/O - for example &lt;BR /&gt; logging a message to a file - THEN the calling thread will block &lt;BR /&gt; on the I/O and the operating system will deschedule it until the &lt;BR /&gt; I/O completes. This leads to more context switching and therefore &lt;BR /&gt; to a longer service time; a longer service time means more lock &lt;BR /&gt; contention, and more lock contention means poor scalability. &lt;/P&gt;&lt;P&gt;etc.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, as in logic, you can reason by deduction like this:&lt;/P&gt;&lt;P&gt;If [1] AND [3] THEN your algorithm is much more efficient. &lt;/P&gt;&lt;P&gt;If [2] AND [5] THEN you have a deadlock &lt;BR /&gt; and &lt;BR /&gt; the possibility of Lock-convoy&lt;BR /&gt; and bad scalability.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If [6] THEN you have a bad scalability problem.&lt;/P&gt;&lt;P&gt;etc.&lt;/P&gt;&lt;P&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;- or laws, true propositions or good patterns - like rules or &lt;BR /&gt;theorems [1], [2], [3], [4], [5], [6]... THEN you will construct &lt;BR /&gt;a model that is much more CORRECT and EFFICIENT. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it is one of my preferred methodologies in programming. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I will try to add more of those rules, theorems, laws... &lt;BR /&gt;next time... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Apr 2010 05:18:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854562#M2023</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T05:18:53Z</dc:date>
    </item>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854563#M2024</link>
      <description>&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt;[6] If there is contention on a lock - a critical section ... - &lt;BR /&gt;&amp;gt; and inside the locked sections you are using the I/O - example &lt;BR /&gt;&amp;gt; logging a message to a file - this will lead the calling thread &lt;BR /&gt;&amp;gt; to block on the I/O and the operating system will deschedule &lt;BR /&gt;&amp;gt; the blocked thread until the I/O completes, thus this situation &lt;BR /&gt;&amp;gt; will lead to more context switching, and therefore to an increased &lt;BR /&gt;&amp;gt; service time , and longer service times, in this case, means &lt;BR /&gt;&amp;gt; more lock contention, and more lock contention means a bad &lt;BR /&gt;&amp;gt; scalability.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;You can alleviate problem [6] by using, for example, a lock-free queue&lt;BR /&gt;- lock-free ParallelQueue or... - with multiple consumers pushing the&lt;BR /&gt;messages and one worker thread doing the job - logging the messages &lt;BR /&gt;to a file...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 04 Apr 2010 07:41:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854563#M2024</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T07:41:30Z</dc:date>
    </item>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854564#M2025</link>
      <description>&lt;P&gt;I wrote:&lt;BR /&gt;&amp;gt; You can alleviate problem [6] by using, for example, a lock-free queue&lt;BR /&gt;&amp;gt; - lock-free ParallelQueue or... - with multiple consumers pushing&lt;/P&gt;&lt;P&gt;I mean multiple 'producers' pushing the messages - to the lock-free queue - &lt;BR /&gt;and one consumer/worker doing the job - logging the messages&lt;BR /&gt;to a file...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Amine.&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Apr 2010 14:27:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854564#M2025</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T14:27:35Z</dc:date>
    </item>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854565#M2026</link>
      <description>&lt;P&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt; If [2] AND [5] THEN you have a deadlock&lt;BR /&gt;&amp;gt; and&lt;BR /&gt;&amp;gt; the possibility of Lock-convoy&lt;BR /&gt;&amp;gt; and bad scalability.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; If [6] THEN you have a bad scalability problem.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I mean: IF your threads (or processes) are using &lt;BR /&gt;the same critical sections and you didn't follow [2] &lt;BR /&gt;AND &lt;BR /&gt;you are using a critical section or spinlock and &lt;BR /&gt;there is high contention on it [5] &lt;BR /&gt; THEN you have a deadlock&lt;BR /&gt; and&lt;BR /&gt; the possibility of Lock-convoy&lt;BR /&gt; and bad scalability.&lt;BR /&gt;&lt;BR /&gt;If [6] THEN you have a bad scalability problem.&lt;BR /&gt;&lt;BR /&gt;etc.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;/P&gt;</description>
      <pubDate>Sun, 04 Apr 2010 14:45:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854565#M2026</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T14:45:24Z</dc:date>
    </item>
    <item>
      <title>A good methodology ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854566#M2027</link>
      <description>&lt;P&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt; IF - your threads (or processes) are using&lt;BR /&gt;&amp;gt; the same critical sections and - you didn't follow [2]&lt;BR /&gt;&amp;gt; AND&lt;BR /&gt;&amp;gt; you are using a critical section or spinlock and&lt;BR /&gt;&amp;gt; there is a high contention on them [5]&lt;BR /&gt;&amp;gt; THEN (you have a deadlock&lt;BR /&gt;&amp;gt; and&lt;BR /&gt;&amp;gt; the possibility of Lock-convoy&lt;BR /&gt;&amp;gt; and bad scalability.)&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; If [6] THEN you have a bad scalability problem.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; etc.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And, as in logic, you can reason by deduction (inference). &lt;/P&gt;&lt;P&gt; :)&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;/P&gt;</description>
      <pubDate>Sun, 04 Apr 2010 15:10:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-good-methodology/m-p/854566#M2027</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T15:10:52Z</dc:date>
    </item>
  </channel>
</rss>

