<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic: A question about scalability ... in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829668#M1501</link>
    <description>I merged two more threads into this one.&lt;BR /&gt;&lt;BR /&gt;And the one below. Sorry, that one is out of sequence because I didn't notice it at first.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;&lt;BR /&gt;==&lt;BR /&gt;Aubrey W.&lt;BR /&gt;Intel Software Network Support</description>
    <pubDate>Thu, 16 Sep 2010 21:06:33 GMT</pubDate>
    <dc:creator>Aubrey_W_</dc:creator>
    <dc:date>2010-09-16T21:06:33Z</dc:date>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829643#M1476</link>
      <description>&lt;DIV id="inbdy"&gt;&lt;A name="msg_a9b4f3a9bfd2b629"&gt;&lt;/A&gt;&lt;SPAN class="fixed_width" style="font-family: Courier, Monospaced;"&gt;&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;I didn't know where to ask this question, so i decided to ask here.. &lt;/P&gt;&lt;P&gt;I just read yesterday the following page, look at the the USL &lt;BR /&gt;(Universal Law of Computational Scalability) of Dr. Gunther, &lt;BR /&gt;he wrote this: ( see &lt;A href="http://en.wikipedia.org/wiki/Neil_J._Gunther"&gt;http://en.wikipedia.org/wiki/Neil_J._Gunther&lt;/A&gt;)&lt;BR /&gt;&lt;BR /&gt;-------------------------- &lt;/P&gt;&lt;P&gt;The relative capacity C(N) of a computational platform is given by: &lt;/P&gt;&lt;P&gt;C(N) =       N &lt;BR /&gt;         ------------------- &lt;BR /&gt;     1 +  (N - 1) +  N (N - 1) &lt;/P&gt;&lt;P&gt;where N represents either the number of physical processors &lt;BR /&gt;in the hardware configuration or the number of users driving the &lt;BR /&gt;software application. The parameters  and  represent respectively &lt;BR /&gt;the levels of contention (e.g., queueing for shared resources) and &lt;BR /&gt;coherency delay (i.e., latency for data to become consistent) in the &lt;BR /&gt;system. The  parameter also quantifies the retrograde throughput seen &lt;BR /&gt;in many stress tests but not accounted for in either Amdahl's law or &lt;BR /&gt;event-based simulations. &lt;/P&gt;&lt;P&gt;--------------- &lt;/P&gt;&lt;P&gt;His website: &lt;A target="_blank" rel="nofollow" href="http://www.google.com/url?sa=D&amp;amp;q=http://www.perfdynamics.com/&amp;amp;usg=AFQjCNF5RDoikW8IWPWAa1tLCunOOpkp8Q"&gt;http://www.perfdynamics.com/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;If you read carefully , you will see that Dr. 
Gunther is using this &lt;BR /&gt;model to predict scalability after he simulates a relatively small &lt;BR /&gt;number &lt;BR /&gt;of vusers in LoadRunner ( because of licensing costs, it's cost- &lt;BR /&gt;effectiveness) &lt;BR /&gt;and after that he finds the coefficients of the 2nd-degree polynomial &lt;BR /&gt;(quadratic equation) and then transform those coefficients back to the &lt;BR /&gt;USL parameters using the  = b - a and  = a. &lt;/P&gt;&lt;P&gt;And than he is extrapolating with the USL model to higher loads &lt;BR /&gt;to predict scalability. &lt;/P&gt;&lt;P&gt;He is also applying the model to webservers with heterogeneous &lt;BR /&gt;workloads. like in the following page: &lt;/P&gt;&lt;P&gt;&lt;A target="_blank" rel="nofollow" href="http://www.google.com/url?sa=D&amp;amp;q=http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html&amp;amp;usg=AFQjCNHBl5e0W05i5t3jhXzpL31ip9mwSQ"&gt;http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-wi...&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Now my question follows: &lt;/P&gt;&lt;P&gt;Suppose we have obtained a small number of measured load-points &lt;BR /&gt;with Loadrunner or others tools, and we calculated the USL equation to &lt;BR /&gt;predict scalability of a webserver , how the USL model can predict if &lt;BR /&gt;the scalability/performance is limited by the network bandwidth and &lt;BR /&gt;not the server ? &lt;/P&gt;&lt;P&gt;Can it give this information ? &lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A target="_blank" rel="nofollow" href="http://www.google.com/url?sa=D&amp;amp;q=http://pages.videotron.com/aminer/&amp;amp;usg=AFQjCNEc4XUK97itZ-yH6hly_HwgwpbUjg"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 07 Sep 2010 19:37:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829643#M1476</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-07T19:37:41Z</dc:date>
    </item>
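The fitting procedure described in the post above — measure a few load points, fit the quadratic, then map the coefficients back to the USL parameters via σ = b - a and κ = a — can be sketched in Python. This is a minimal illustration; the measured throughputs below are made-up numbers, not data from the post or from Dr. Gunther:

```python
# Hypothetical measured load points: N concurrent users -> throughput X(N)
# (made-up numbers that show retrograde throughput at high N).
data = [(1, 100.0), (2, 190.0), (4, 340.0), (8, 550.0), (16, 700.0), (32, 650.0)]

X1 = data[0][1]  # throughput at N = 1

# Gunther's transformation: x = N - 1, y = N / C(N) - 1,
# where C(N) = X(N) / X(1) is the relative capacity.
pts = [(n - 1, n / (x / X1) - 1) for n, x in data]

# Least-squares fit of y = a*x^2 + b*x (no intercept) via the normal equations.
s4 = sum(x**4 for x, _ in pts)
s3 = sum(x**3 for x, _ in pts)
s2 = sum(x**2 for x, _ in pts)
r2 = sum(x * x * y for x, y in pts)
r1 = sum(x * y for x, y in pts)
det = s4 * s2 - s3 * s3
a = (r2 * s2 - s3 * r1) / det
b = (s4 * r1 - s3 * r2) / det

# Map the quadratic coefficients back to the USL parameters.
kappa = a          # coherency delay
sigma = b - a      # contention

def usl_throughput(n):
    """Predicted throughput at load n from the fitted USL model."""
    return X1 * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Extrapolate beyond the measured range, as the post describes.
for n in (64, 128):
    print(n, usl_throughput(n))
```

Note that `usl_throughput` only extrapolates the contention/coherency behavior captured by σ and κ; it says nothing about an external limit such as network bandwidth, which is exactly the question the post raises.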
    <item>
      <title>Why no locks in, um, lock-free algorithms?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829644#M1477</link>
      <description>&lt;P&gt;&lt;BR /&gt;An important answer that I wrote (from the comp.programming.threads newsgroup)...&lt;BR /&gt;&lt;BR /&gt;On Sep 6, 4:12 pm, Scott Meyers &amp;lt;&lt;A href="mailto:NeverR...@aristeia.com"&gt;NeverR...@aristeia.com&lt;/A&gt;&amp;gt; wrote:&lt;BR /&gt;&amp;gt; As the subject line suggests (I hope), I'm missing something &lt;BR /&gt;&amp;gt;really basic about lock-free algorithms. In particular, why &lt;BR /&gt;&amp;gt;they can't use locks.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;In lock-free algorithms we reduce the duration for which &lt;BR /&gt;locks are held to the minimum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*; this &lt;BR /&gt;is good for scalability.&lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested&lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;And since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for which locks are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention...&lt;/P&gt;&lt;P&gt;When we have lock contention, a factor called mean &lt;BR /&gt;waiting time is added to the Amdahl equation &lt;BR /&gt;like this: Speedup = 1/(S + mean waiting time + (P/N)). &lt;BR /&gt;It's the mean waiting time of the threads that are&lt;BR /&gt;waiting on the lock, and when we reduce the duration &lt;BR /&gt;for which locks are held this will lower the service time; &lt;BR /&gt;hence the waiting time will be smaller, and this &lt;BR /&gt;will raise the speedup in the Amdahl equation.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Look at how my lock-free ParallelQueue is scaling:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you look at this page, I wrote this:&lt;/P&gt;&lt;P&gt;[1] If there is LESS contention THEN the algorithm &lt;BR /&gt;will scale better, due to the fact that S (the serial part) &lt;BR /&gt;becomes smaller with less contention, and as N becomes bigger,&lt;BR /&gt;the result - the speedup of the program/algorithm... - &lt;BR /&gt;of Amdahl's equation 1/(S+(P/N)) becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[2] If there is false sharing THEN the algorithm &lt;BR /&gt;will not scale well, due to the fact that S (the serial part) &lt;BR /&gt;of Amdahl's equation 1/(S+(P/N)) will become bigger &lt;BR /&gt;with false sharing, so this is not good for scalability.&lt;/P&gt;&lt;P&gt;As you have noticed, I am including the mean waiting time &lt;BR /&gt;factor as if it were a serial part, because in the Amdahl equation &lt;BR /&gt;there are only two parts, S (serial part) and P (parallel part),&lt;BR /&gt;but the correct equation is: Speedup = 1/(S + mean waiting time + (P/N)).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;On Sep 6, 4:12 pm, Scott Meyers &amp;lt;&lt;A href="mailto:NeverR...@aristeia.com"&gt;NeverR...@aristeia.com&lt;/A&gt;&amp;gt; wrote:&lt;BR /&gt;&amp;gt; As the subject line suggests (I hope), I'm missing something really basic about&lt;BR /&gt;&amp;gt; lock-free algorithms. In particular, why they can't use locks.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Wikipedia and others define lock-free algorithms as those where at least one&lt;BR /&gt;&amp;gt; thread in a process is guaranteed to make progress, regardless of what the other&lt;BR /&gt;&amp;gt; threads are doing. So suppose I bundle an array with a mutex to make a&lt;BR /&gt;&amp;gt; "synchronized array," where each operation on the array grabs the mutex, does&lt;BR /&gt;&amp;gt; some work on the array (but does not call into code outside the array, i.e.,&lt;BR /&gt;&amp;gt; doesn't invoke any code that could itself try to grab the mutex), then releases&lt;BR /&gt;&amp;gt; the mutex. This uses a lock, but inside the "synchronized array" critical&lt;BR /&gt;&amp;gt; section, even if all the other threads in the process are stalled, the thread in&lt;BR /&gt;&amp;gt; the critical section is making progress.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; As I said, I'm missing something fundamental. 
I'd be grateful if somebody would&lt;BR /&gt;&amp;gt; point out what it is.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Thanks,&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Scott&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; --&lt;BR /&gt;&amp;gt; * C++ and Beyond: Meyers, Sutter, &amp;amp; Alexandrescu, Oct. 24-27 near Seattle&lt;BR /&gt;&amp;gt; (&lt;A href="http://cppandbeyond.com/"&gt;http://cppandbeyond.com/&lt;/A&gt;)&lt;BR /&gt;&amp;gt; * License my training materials for commercial (&lt;A href="http://tinyurl.com/yfzvkp9"&gt;http://tinyurl.com/yfzvkp9&lt;/A&gt;) or&lt;BR /&gt;&amp;gt; personal use (&lt;A href="http://tinyurl.com/yl5ka5p"&gt;http://tinyurl.com/yl5ka5p&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Sep 2010 20:53:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829644#M1477</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T20:53:58Z</dc:date>
    </item>
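The extended Amdahl formula used in the post above, Speedup = 1/(S + mean waiting time + (P/N)), is easy to evaluate numerically. A toy sketch (the serial fractions and waiting times below are illustrative values, not measurements from the post):

```python
def speedup(serial, mean_wait, n_threads):
    """Amdahl's law with a mean-waiting-time term for lock contention:
    Speedup = 1 / (S + W + P/N), where P = 1 - S is the parallel part
    and W is the mean time threads spend waiting on contended locks."""
    parallel = 1.0 - serial
    return 1.0 / (serial + mean_wait + parallel / n_threads)

# Shorter critical sections -> smaller mean waiting time -> better scaling,
# which is the post's argument for why lock-free techniques help.
for wait in (0.10, 0.01):
    print(wait, [round(speedup(0.05, wait, n), 2) for n in (1, 4, 16, 64)])
```

Cutting the waiting-time term directly raises the speedup at every thread count, exactly as the post argues: the waiting time behaves like extra serial fraction.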
    <item>
      <title>Why no locks in, um, lock-free algorithms?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829645#M1478</link>
      <description>&lt;BR /&gt;&lt;BR /&gt;Hello again,&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I will add that the N. J. Gunther Universal Scalability Law states:&lt;BR /&gt;&lt;BR /&gt;The relative capacity C(N) is a normalized throughput given by: &lt;A&gt;&lt;/A&gt;&lt;BR clear="all" /&gt;&lt;TABLE width="100%" border="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;TABLE align="center" cellpadding="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD align="center" nowrap="nowrap"&gt;C(N) = &lt;/TD&gt;&lt;TD align="center" nowrap="nowrap"&gt;N &lt;DIV class="hrcomp"&gt;&lt;HR noshade="noshade" size="1" /&gt;&lt;/DIV&gt;1 + σ(N - 1) + κN(N - 1)&lt;BR /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;BR /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;And as you will notice in the following graphs in:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;there is a negative return on investment when the contention parameter σ &amp;gt; 0 and the &lt;BR /&gt;coherency parameter κ &amp;gt; 0; this causes the throughput &lt;BR /&gt;to go retrograde faster in the lock-free RingBuffer (in both the &lt;BR /&gt;push and the pop) and in the lock-free flqueue (on the pop side),&lt;BR /&gt;but my lock-free ParallelQueue keeps scaling very well...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 06 Sep 2010 21:19:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829645#M1478</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T21:19:21Z</dc:date>
    </item>
    <item>
      <title>Why no locks in, um, lock-free algorithms?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829646#M1479</link>
      <description>&lt;SPAN class="fixed_width" style="font-family: Courier, Monospaced;"&gt; &lt;P&gt;&lt;/P&gt;&lt;DIV class="qt" id="qhide_91242"&gt;Scott Meyers wrote: &lt;BR /&gt;&amp;gt; I agree, but my question is about the *definition* &lt;BR /&gt;&amp;gt; of being lock-free and how that definition precludes &lt;BR /&gt;&amp;gt; the use of locks. I already know why lock-free is &lt;BR /&gt;&amp;gt; good for scalability. What I don't know is how the &lt;BR /&gt;&amp;gt; definition of lock-free prohibits using locks. &lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;BR /&gt;I don't understand ? in lock-free algorithms under x86 &lt;BR /&gt;we use for example a lock instruction in assembler, &lt;BR /&gt;and it's in fact a lock that we are using. &lt;BR /&gt;&lt;P&gt;&lt;BR /&gt;It's the lock-free algorithm that is different , &lt;BR /&gt;you retry when the transaction( the lock..) fails, &lt;BR /&gt;like in TM. &lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;</description>
      <pubDate>Mon, 06 Sep 2010 21:32:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829646#M1479</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T21:32:32Z</dc:date>
    </item>
    <item>
      <title>Why no locks in, um, lock-free algorithms?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829647#M1480</link>
      <description>&lt;P&gt;On Sep 6, 5:36 pm, Scott Meyers &amp;lt;&lt;A href="mailto:NeverR...@aristeia.com"&gt;NeverR...@aristeia.com&lt;/A&gt;&amp;gt; wrote:&lt;BR /&gt;&amp;gt; On 9/6/2010 2:27 PM, aminer wrote:&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; &amp;gt; I don't understand ? in lock-free algorithms under x86&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; "Lock-free algorithm" is a technical term. Its definition is independent of the&lt;BR /&gt;&amp;gt; language you are programming in and the hardware you are running on.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Note that the definition of "lock-free" makes no mention of locks at all, so&lt;BR /&gt;&amp;gt; it's not something as simple as "lock-free is defined to use no locks". This,&lt;BR /&gt;&amp;gt; for example, is Wikipedia's definition&lt;BR /&gt;&amp;gt; (&lt;A href="http://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom"&gt;http://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom&lt;/A&gt;):&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; An algorithm is lock-free if it satisfies that when the program threads are&lt;BR /&gt;&amp;gt; run sufficiently long at least one of the threads make progress (for some&lt;BR /&gt;&amp;gt; sensible definition of progress).&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; How does this definition preclude the use of e.g. a mutex in the algorithm?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You are confusing/mixing mutex with lock; you were speaking &lt;BR /&gt;about locks, and as I said before, lock-free algorithms do &lt;BR /&gt;in fact use locks: under x86, lock-free algorithms &lt;BR /&gt;use a LOCK-prefixed instruction in assembler, in SMR they also &lt;BR /&gt;use locks, etc. The mutex also uses locks in its code internally...&lt;/P&gt;&lt;P&gt;But as I said before, it's the lock-free algorithm &lt;BR /&gt;that is different: you retry when the transaction (the locked operation...) &lt;BR /&gt;fails, as in TM, and we call that optimistic locking.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Sep 2010 21:49:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829647#M1480</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T21:49:05Z</dc:date>
    </item>
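The "retry when the transaction fails" pattern described in the two posts above can be sketched in Python. The hardware would provide the atomic compare-and-swap (e.g. x86 LOCK CMPXCHG); here it is simulated with a small internal lock, so this illustrates only the optimistic-retry control flow, not genuinely lock-free code:

```python
import threading

class AtomicCell:
    """Toy compare-and-swap cell. Real lock-free code gets CAS from the
    hardware; the internal lock below merely simulates its atomicity."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Atomically set the value to `new` only if it still equals
        `expected`; return True on success, False on conflict."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def lock_free_increment(cell):
    # Optimistic retry loop: read, compute, try to commit; retry on conflict.
    while True:
        old = cell.get()
        if cell.compare_and_swap(old, old + 1):
            return

# Four threads each perform 1000 optimistic increments.
cell = AtomicCell()
threads = [threading.Thread(
               target=lambda: [lock_free_increment(cell) for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.get())
```

No increment is ever lost: a conflicting update simply makes the loser re-read and retry, which is the optimistic-locking behavior the posts compare to TM.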
    <item>
      <title>Why no locks in, um, lock-free algorithms?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829648#M1481</link>
      <description>In "Lock Free" coding you are only permitted to hold an uninterruptible atomic lock. Example:&lt;BR /&gt;&lt;BR /&gt; lock add dword ptr [eax], 1&lt;BR /&gt;&lt;BR /&gt;In the above sequence (when eax is a legitimate, aligned address in R/W memory) the LOCK prefix is used to permit the Read/Modify/Write instruction to execute in an atomic manner. More important to "Lock Free" - the above instruction cannot be interrupted. Therefore, "Lock Free" means: no lock can be pre-empted.&lt;BR /&gt;&lt;BR /&gt;A "Lock", by contrast, is a software lock (typically using the LOCK prefix to obtain the lock), and once the lock is obtained, an interruptible code sequence follows. The problem with { lock, code, unlock } comes when the system pre-empts the code while it holds the lock - all threads waiting for the lock wait for the duration of the preemption (plus the remaining code time of the thread holding the lock).&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Tue, 07 Sep 2010 12:15:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829648#M1481</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-09-07T12:15:50Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829649#M1482</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello again,&lt;/P&gt;&lt;P&gt;Look at the following page; Dr. Gunther is applying &lt;BR /&gt;the USL model to webservers:&lt;/P&gt;&lt;P&gt;&lt;A href="http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html"&gt;http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;When we are modeling webservers, we have to include &lt;BR /&gt;the network&amp;amp;tcp/ip in our network queuing model &lt;BR /&gt;(that comprises the queue of the computer server side), &lt;BR /&gt;and since the service in the computer server is comprised of &lt;BR /&gt;multiple services (when we are using HTML pages, databases, etc.)&lt;BR /&gt;the network&amp;amp;tcp/ip queue will not be Markovian on the service &lt;BR /&gt;side, and we have to model the network&amp;amp;tcp/ip queue as an M/G/1, &lt;BR /&gt;and this will complicate the mathematical analytic modeling...&lt;/P&gt;&lt;P&gt;Now as I said before:&lt;/P&gt;&lt;P&gt;Suppose we have obtained a small number of measured load-points &lt;BR /&gt;with LoadRunner or other tools, and we calculated the USL equation &lt;BR /&gt;to predict the scalability of a webserver; how can the USL model predict whether &lt;BR /&gt;the scalability/performance is limited by the network bandwidth and &lt;BR /&gt;not by the server? I think the USL can not predict this.&lt;/P&gt;&lt;P&gt;So, I think the best way is to use a webserver stress tool &lt;BR /&gt;like &lt;A href="http://fwptt.sourceforge.net/"&gt;http://fwptt.sourceforge.net/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can even test the webserver with heterogeneous &lt;BR /&gt;workloads by starting multiple fwptt processes, and &lt;BR /&gt;you should increase the number of threads to 5 and after &lt;BR /&gt;that to 10 threads, 15 ... and so on until the webserver&lt;BR /&gt;applications stop responding properly (and this will tell you &lt;BR /&gt;the maximum number of users that you can have at the same time...), &lt;BR /&gt;and as you are stress testing you can even see (by testing/measuring &lt;BR /&gt;it) whether or not the network bandwidth is the bottleneck... and this can &lt;BR /&gt;not be done by the USL model.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Sep 2010 21:03:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829649#M1482</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-07T21:03:29Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829650#M1483</link>
      <description>&lt;P&gt;I wrote:&lt;BR /&gt;&amp;gt; When we are modeling webservers, we have to include&lt;BR /&gt;&amp;gt; the network&amp;amp;tcp/ip in our network queuing model&lt;BR /&gt;&amp;gt; (that comprises the queue of the computer server side),&lt;BR /&gt;&amp;gt; and since the service in the computer server is comprised of&lt;BR /&gt;&amp;gt; multiple services (when we are using HTML pages, databases, etc.)&lt;BR /&gt;&amp;gt; the network&amp;amp;tcp/ip queue will not be Markovian on the service&lt;BR /&gt;&amp;gt; side, and we have to model the network&amp;amp;tcp/ip queue as an M/G/1,&lt;BR /&gt;&amp;gt; and this will complicate the mathematical analytic modeling...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We already know that to satisfy a Poisson process, the increments &lt;BR /&gt;N(t1) - N(t0), N(t2) - N(t1), etc. must be independent; that is, the &lt;BR /&gt;counting increments must be independent.&lt;/P&gt;&lt;P&gt;We have the following relation between &lt;BR /&gt;the Poisson law and the Exponential law: &lt;/P&gt;&lt;P&gt;the expected value E(X exponential) = 1 / E(X poisson)&lt;/P&gt;&lt;P&gt;and if the arrivals are Poissonian then the interarrival times are exponential... &lt;/P&gt;&lt;P&gt;Now what about a webserver? 
&lt;/P&gt;&lt;P&gt;I have read the following paper:&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQ"&gt;http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQit&lt;/A&gt; &lt;BR /&gt;It says that a simple model with M/M/1/k with FCFS discipline&lt;BR /&gt;can predict webserver performance quite well.. 
&lt;/P&gt;&lt;P&gt;Hence, I think we can model our webserver over the internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this:&lt;/P&gt;&lt;P&gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue&lt;/P&gt;&lt;P&gt;and we have the following:&lt;/P&gt;&lt;P&gt;Ni: number of jobs in each queue &lt;BR /&gt;Ui: utilization of each queue&lt;/P&gt;&lt;P&gt;Ni = Ui / (1 - Ui) &lt;/P&gt;&lt;P&gt;Adding the Ni of each individual queue will give the &lt;BR /&gt;average number of jobs in the entire queuing network.&lt;/P&gt;&lt;P&gt;After that we apply Little's formula:&lt;/P&gt;&lt;P&gt;A: network arrival rate&lt;BR /&gt;T: average response time&lt;/P&gt;&lt;P&gt;N = A*T =&amp;gt; T = N / A &lt;/P&gt;&lt;P&gt;And after that, from the mathematical analytical equation, &lt;BR /&gt;we can simulate the Jackson queuing network that models our webservers...&lt;/P&gt;&lt;P&gt;Now there is still an important question that I have:&lt;/P&gt;&lt;P&gt;Our analytical Jackson network model can give us insight &lt;BR /&gt;into the webserver's behavior and validate our simulation&lt;BR /&gt;with fwptt... But the difficulty that I find in practice &lt;BR /&gt;is this: suppose we have found the right parameters&lt;BR /&gt;that we want to choose, like for example the service rate of &lt;BR /&gt;the M/M/1 Server Queue; how, from this service rate, can &lt;BR /&gt;we choose the right computer to buy?&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 01:41:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829650#M1483</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T01:41:29Z</dc:date>
    </item>
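The three-queue Jackson network in the post above — per-queue jobs Ni = Ui / (1 - Ui), summed and converted to response time with Little's law T = N / A — can be computed directly. The arrival and service rates below are hypothetical numbers chosen for illustration:

```python
def jackson_response_time(arrival_rate, service_rates):
    """Open tandem Jackson network of M/M/1 queues: every queue sees the
    same arrival rate A; N_i = U_i / (1 - U_i) is the mean number of jobs
    in an M/M/1 queue; Little's law then gives T = N / A."""
    total_jobs = 0.0
    for mu in service_rates:
        u = arrival_rate / mu            # utilization of this queue
        assert u < 1.0, "queue is unstable (utilization >= 1)"
        total_jobs += u / (1.0 - u)      # mean jobs in this M/M/1 queue
    return total_jobs / arrival_rate     # mean end-to-end response time

# Hypothetical rates in jobs/sec: server queue, network queue, client queue.
t = jackson_response_time(arrival_rate=50.0, service_rates=[80.0, 120.0, 200.0])
print(t)
```

The slowest queue (highest utilization) dominates the total, which is why the post's practical question reduces to picking a server whose service rate keeps its queue's utilization comfortably below 1.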
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829651#M1484</link>
      <description>&lt;BR /&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt;We already know that to satisfy a Poisson process we must &lt;BR /&gt;&amp;gt;have that N(t1)- N(t0), N(t2)- N(t1) etc. &lt;BR /&gt;&lt;BR /&gt;must be independent.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;that means the &lt;BR /&gt;&amp;gt;counting increments must be independent</description>
      <pubDate>Wed, 08 Sep 2010 01:58:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829651#M1484</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T01:58:33Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829652#M1485</link>
      <description>&lt;BR /&gt;Hello again,&lt;BR /&gt;&lt;P&gt;&lt;BR /&gt;We say in OR that: &lt;/P&gt;&lt;P&gt;"Understanding the behavior of a system is what Queueing Theory &lt;BR /&gt;and Little's Law are all about. But managing a Queue requires not &lt;BR /&gt;just understanding the behavior of a system, but also in-depth &lt;BR /&gt;knowledge of how to improve a system - improving both aspects &lt;BR /&gt;of Queueing will mean a better, more efficient and cost-effective &lt;BR /&gt;system and, more importantly, a much better customer experience."&lt;/P&gt;&lt;P&gt;I wrote before that:&lt;/P&gt;&amp;gt;It says that a simple model with M/M/1/k with FCFS discipline&lt;BR /&gt;&amp;gt;can predict webserver performance quite well... &lt;P&gt;&amp;gt;Hence, I think we can model our webserver over the internet &lt;BR /&gt;&amp;gt;with 3 queues connected as a Jackson Network like this:&lt;/P&gt;&lt;P&gt;&amp;gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue&lt;/P&gt;&lt;P&gt;&amp;gt;and we have the following:&lt;/P&gt;&lt;P&gt;&amp;gt;Ni: number of jobs in each queue &lt;BR /&gt;&amp;gt;Ui: utilization of each queue&lt;/P&gt;&lt;P&gt;&amp;gt;Ni = Ui / (1 - Ui) &lt;/P&gt;&lt;P&gt;&amp;gt;Adding the Ni of each individual queue will give the &lt;BR /&gt;&amp;gt;average number of jobs in the entire queuing network.&lt;/P&gt;&lt;P&gt;&amp;gt;After that we apply Little's formula:&lt;/P&gt;&lt;P&gt;&amp;gt;A: network arrival rate&lt;BR /&gt;&amp;gt;T: average response time&lt;/P&gt;&lt;P&gt;&amp;gt;N = A*T =&amp;gt; T = N / A &lt;/P&gt;&lt;P&gt;&amp;gt;And after that, from the mathematical analytical equation, &lt;BR /&gt;&amp;gt;we can simulate the Jackson queuing network&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As you have noticed, this mathematical model of this Jackson network &lt;BR /&gt;does in fact take into account the M/M/1 Network queue node; the &lt;BR /&gt;USL model can not do this... and with this performance data &lt;BR /&gt;from the mathematical analytical model simulation we can for example &lt;BR /&gt;validate the performance data of the fwptt webserver stress simulation.&lt;BR /&gt;&lt;BR /&gt;But you have to take into account worst cases and the peak traffic loads...&lt;/P&gt;&lt;P&gt;Suppose for example we have a webserver hosting HTML pages &lt;BR /&gt;and it is receiving 1,000,000 HTTP operations &lt;BR /&gt;per day with an average file size of 10 KB.&lt;/P&gt;&lt;P&gt;What would be the network bandwidth required for this website,&lt;BR /&gt;considering peak traffic, if the peak traffic load from past observations&lt;BR /&gt;was four times greater than average loads?&lt;/P&gt;&lt;P&gt;Required bandwidth is solved by the following equation:&lt;/P&gt;&lt;P&gt;HTTP ops/sec x average file size, or&lt;/P&gt;&lt;P&gt;1,000,000 HTTP ops per day = 1,000,000/24 = 41,667 ops/hour = &lt;BR /&gt;41,667/3600 = 11.6 HTTP ops/sec&lt;/P&gt;&lt;P&gt;The needed bandwidth is&lt;/P&gt;&lt;P&gt;11.6 HTTP ops/sec X 10 KB/HTTP op = 116 KB/sec = 928 Kbps.&lt;/P&gt;&lt;P&gt;If we assume a protocol overhead of 20%, then the actual throughput &lt;BR /&gt;required is 928 Kbps X 1.2 = 1,114 Kbps. &lt;BR /&gt;&lt;BR /&gt;However, if peak load, as we said before, is as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes&lt;/P&gt;&lt;P&gt;would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;BR /&gt;&lt;BR /&gt;So you have to think also about the cost of this line...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 09:08:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829652#M1485</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T09:08:40Z</dc:date>
    </item>
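The bandwidth-sizing arithmetic in the post above can be checked in a few lines of Python. Computed without the post's intermediate rounding, the final figure lands slightly below its 4.456 Mbps:

```python
# Reproduce the bandwidth sizing arithmetic from the post.
ops_per_day = 1_000_000
avg_file_kb = 10      # KB transferred per HTTP operation
overhead = 1.2        # 20% protocol overhead
peak_factor = 4       # peak load observed to be 4x the average

ops_per_sec = ops_per_day / 24 / 3600        # ~11.6 HTTP ops/sec
kbytes_per_sec = ops_per_sec * avg_file_kb   # ~116 KB/sec
kbps = kbytes_per_sec * 8                    # ~926 Kbps (post rounds to 928)
with_overhead = kbps * overhead              # ~1111 Kbps (post: 1,114 Kbps)
peak_kbps = with_overhead * peak_factor      # ~4444 Kbps, i.e. ~4.44 Mbps
print(peak_kbps)
```

The small gap from the post's 4.456 Mbps comes only from rounding 11.574 up to 11.6 at the first step; the sizing conclusion is unchanged.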
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829653#M1486</link>
      <description>&lt;P&gt;&lt;BR /&gt;I will add the following:&lt;/P&gt;&lt;P&gt;We have to be smarter than that in OR (operational research)...&lt;/P&gt;&lt;P&gt;As you have noticed, I said that:&lt;/P&gt;&lt;P&gt;"As you have noticed, this mathematical model of &lt;BR /&gt;this Jackson network does in fact take into account &lt;BR /&gt;the M/M/1 Network queue node; the USL model can not &lt;BR /&gt;do this... and with this performance data from the mathematical &lt;BR /&gt;analytical model simulation we can for example validate &lt;BR /&gt;the performance data of the fwptt webserver stress simulation."&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and I said that:&lt;/P&gt;&lt;P&gt;"Hence, I think we can model our webserver over the internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this: &lt;BR /&gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue" &lt;/P&gt;&lt;P&gt;And of course, for Capacity Planning of Enterprise Datacenters &lt;BR /&gt;and Websites, you can mirror many computer servers and load &lt;BR /&gt;balance between them with software... to make the system much &lt;BR /&gt;FASTER, and this will be modeled as a Jackson network like this:&lt;/P&gt;&lt;P&gt;An M/M/n Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But there is still an important thing, as I have shown before &lt;BR /&gt;in my calculations:&lt;/P&gt;&lt;P&gt;"However, if peak load, as we said before, is as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes &lt;BR /&gt;would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;/P&gt;&lt;P&gt;So you have to think also about the cost of this line..."&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I think that you have to take into account the knee utilisation &lt;BR /&gt;of your M/M/n Servers Queues; if for example the number of computer &lt;BR /&gt;servers is 8 and the knee is 74%, that means that in our previous &lt;BR /&gt;example the bandwidth must equal: 174% X 4.456 Mbps = 7.753 Mbps&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And we have to take into account the cost of the line...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So be smarter!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 10:33:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829653#M1486</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T10:33:09Z</dc:date>
    </item>
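Editor's note: the three-queue Jackson chain described in the post above can be sketched numerically with the standard M/M/1 results (utilisation U = A/mu, mean jobs N = U/(1 - U)) and Little's law T = N/A, both of which are spelled out later in this thread. The arrival and service rates below are made-up illustration values, not measurements; a minimal sketch, assuming an open Jackson network in which each node behaves as an independent M/M/1 queue:

```python
# Sketch of the open Jackson network described above:
# A -> M/M/1 Server queue -> M/M/1 Network queue -> M/M/1 Client queue -> out
# In an open Jackson network each M/M/1 node can be analyzed in isolation.

def mm1_jobs(arrival_rate, service_rate):
    """Mean number of jobs in a stable M/M/1 queue: N = U / (1 - U)."""
    u = arrival_rate / service_rate          # utilisation U = A / mu
    if u >= 1.0:
        raise ValueError("queue is unstable: utilisation must stay below 1")
    return u / (1.0 - u)

A = 10.0                                     # arrival rate (requests/sec), assumed
service_rates = [50.0, 25.0, 40.0]           # server, network, client nodes (req/sec), assumed

# Total mean jobs in the network is the sum over the three nodes.
N = sum(mm1_jobs(A, mu) for mu in service_rates)

# Little's law for the whole network: N = A * T  =>  T = N / A
T = N / A

print(f"mean jobs in system: {N:.3f}")
print(f"mean response time : {T * 1000:.1f} ms")
```

Changing one node's service rate immediately shows which queue dominates the response time, which is the kind of insight the poster wants from the analytical model.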
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829654#M1487</link>
      <description>&lt;P&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt; I think that you have to take into account the knee utilisation&lt;BR /&gt;&amp;gt; of your M/M/n Server Queues; if, for example, the number of computer&lt;BR /&gt;&amp;gt; servers is 8 and the Knee is 74%, that means that in our previous&lt;BR /&gt;&amp;gt; example the bandwidth must be equal to: 174% X 4.456 Mbps = 7.753 Mbps&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I mean 126% X 4.456 Mbps = 5.614 Mbps.&lt;/P&gt;&lt;P&gt;Because, as you know, above this knee utilisation of 74% &lt;BR /&gt;for 8 servers, the curve of the waiting time grows quickly..&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 10:57:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829654#M1487</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T10:57:07Z</dc:date>
    </item>
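Editor's note: the capacity-planning arithmetic used in the posts above (average load, 20% protocol overhead, a 4x peak factor, and the corrected 126% markup for the roughly 74% knee of an 8-server queue) can be checked with a short sketch. The traffic figures are the ones from the example; the small differences from the posted numbers (1,114 Kbps, 4.456 Mbps, 5.614 Mbps) come only from the posts rounding 11.574 ops/sec up to 11.6:

```python
# Capacity-planning arithmetic from the example above.
ops_per_day = 1_000_000        # HTTP operations per day
avg_file_kb = 10               # average file size in KB

ops_per_sec = ops_per_day / 24 / 3600          # about 11.6 HTTP ops/sec
base_kbps = ops_per_sec * avg_file_kb * 8      # KB/s to Kbps (8 bits per byte)

overhead = 1.20                # 20% protocol overhead
peak_factor = 4                # peak load 4x the average (from past observations)
knee_markup = 1.26             # headroom above the ~74% knee of the M/M/8 queue

avg_kbps = base_kbps * overhead                # about 1,114 Kbps in the post
peak_kbps = avg_kbps * peak_factor             # about 4.456 Mbps in the post
provisioned_kbps = peak_kbps * knee_markup     # about 5.614 Mbps in the post

print(f"average: {avg_kbps:.0f} Kbps, peak: {peak_kbps / 1000:.3f} Mbps, "
      f"provisioned: {provisioned_kbps / 1000:.3f} Mbps")
```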
    <item>
      <title>A question about capacity planning and scalability [2]...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829655#M1488</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have cleaned all my previous posts, please read again... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I didn't know where to ask this question, so I decided to ask here.. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I just read yesterday the following page about the USL &lt;BR /&gt;(Universal Law of Computational Scalability) of Dr. Gunther, &lt;BR /&gt;who wrote this: ( see &lt;A href="http://en.wikipedia.org/wiki/Neil_J._Gunther"&gt;http://en.wikipedia.org/wiki/Neil_J._Gunther&lt;/A&gt; ) &lt;/P&gt;&lt;P&gt;-------------------------- &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The relative capacity C(N) of a computational platform is given by: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;C(N) = N / (1 + α (N - 1) + β N (N - 1)) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;where N represents either the number of physical processors &lt;BR /&gt;in the hardware configuration or the number of users driving the &lt;BR /&gt;software application. The parameters α and β represent respectively &lt;BR /&gt;the levels of contention (e.g., queueing for shared resources) and &lt;BR /&gt;coherency delay (i.e., latency for data to become consistent) in the &lt;BR /&gt;system. The β parameter also quantifies the retrograde throughput &lt;BR /&gt;seen in many stress tests but not accounted for in either Amdahl's law or &lt;BR /&gt;event-based simulations. &lt;/P&gt;&lt;P&gt;---------- &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;His website: &lt;A href="http://www.perfdynamics.com/"&gt;http://www.perfdynamics.com/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you read carefully, you will see that Dr. 
Gunther is using this &lt;BR /&gt;model to predict scalability after he simulates a relatively small &lt;BR /&gt;number of vusers in LoadRunner (because of licensing costs, it's &lt;BR /&gt;cost-effective), and after that he finds the coefficients of the &lt;BR /&gt;2nd-degree polynomial (quadratic equation) and then transforms &lt;BR /&gt;those coefficients back to the USL parameters using α = b - a &lt;BR /&gt;and β = a. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And then he extrapolates with the USL model to higher loads &lt;BR /&gt;to predict scalability. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;He also applies the model to webservers with heterogeneous &lt;BR /&gt;workloads, like in the following page: &lt;/P&gt;&lt;P&gt;&lt;A href="http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html"&gt;http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now my question follows: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Suppose we have obtained a small number of measured load-points &lt;BR /&gt;with LoadRunner or other tools, and we calculated the USL equation &lt;BR /&gt;to predict the scalability of a webserver; how can the USL model predict whether &lt;BR /&gt;the scalability/performance is limited by the network bandwidth and &lt;BR /&gt;not the server? I think the USL cannot predict this. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;When we are modeling webservers, we have to include &lt;BR /&gt;the network&amp;amp;tcp/ip in our network queuing model &lt;BR /&gt;(which comprises the queue of the computer server side), &lt;BR /&gt;and since the service in the computer server is comprised of &lt;BR /&gt;multiple services (when we are using htmls, databases etc.) &lt;BR /&gt;the network&amp;amp;tcp/ip queue will not be markovian on the service &lt;BR /&gt;side, and we have to model the network&amp;amp;tcp/ip queue as an M/G/1, &lt;BR /&gt;and this will complicate the mathematical analytic modeling... 
&lt;/P&gt;&lt;P&gt;So, I think the best way is to use a webserver stress tool &lt;BR /&gt;like &lt;A href="http://fwptt.sourceforge.net/"&gt;http://fwptt.sourceforge.net/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;You can even test the webserver with heterogeneous &lt;BR /&gt;workloads by starting multiple fwptt processes, and &lt;BR /&gt;you should increase the number of threads to 5 and after &lt;BR /&gt;that to 10 threads, 15... and so on until the webserver &lt;BR /&gt;application stops responding properly (and this will tell you &lt;BR /&gt;the maximum number of users that you can have at the same time...), &lt;BR /&gt;and as you are stress testing you can even see (by testing/measuring &lt;BR /&gt;it) whether or not the network bandwidth is the bottleneck... and this cannot &lt;BR /&gt;be done by the USL model. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We already know that to satisfy a Poisson process we must &lt;BR /&gt;have that N(t1) - N(t0), N(t2) - N(t1) etc. must be independent; &lt;BR /&gt;that means the counting increments must be independent. &lt;/P&gt;&lt;P&gt;We have the following relation between the Poisson law &lt;BR /&gt;and the Exponential law: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;the expected value E(X exponential) = 1 / E(X poisson) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and if the arrivals are Poissonian then the interarrivals are &lt;BR /&gt;exponential.. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now what about a webserver? 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have read the following paper: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQ"&gt;http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQ&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it says that a simple model with M/M/1/k with FCFS discipline &lt;BR /&gt;can predict webserver performance quite well.. &lt;/P&gt;&lt;P&gt;Hence, i think we can model our webserver over internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;A -&amp;gt; M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue -&amp;gt; A&lt;/P&gt;&lt;P&gt;A: is the arrival rate&lt;/P&gt;&lt;P&gt;and we have the following: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Ni: number of jobs in each queue &lt;BR /&gt;Ui: utilization of each queue &lt;/P&gt;&lt;P&gt;Ni = Ui / (1-Ui) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Adding all the Ni in each individual queue will give the &lt;BR /&gt;average number of jobs in the entire queuing network. 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;After that we apply Little's formula: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;A: network arrival rate &lt;BR /&gt;T: average response time &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;N = A*T =&amp;gt; T = N / A &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And after that, from the mathematical analytical equation, &lt;BR /&gt;we can simulate the Jackson queuing network that models our &lt;BR /&gt;webservers... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now there is still an important question that I have: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Our analytical Jackson network model can give us insight &lt;BR /&gt;into the webservers' behavior.. but the difficulty that I find in &lt;BR /&gt;practice is this: suppose we have found the right parameters &lt;BR /&gt;that we want to choose, like for example the service rate of &lt;BR /&gt;the M/M/1 Server Queue; how, from this service rate, can &lt;BR /&gt;we buy the right computer that satisfies the service rate &lt;BR /&gt;that we want? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We say in OR that: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"Understanding the behavior of a system is what Queueing Theory &lt;BR /&gt;and Little's Law are all about. But managing a Queue requires not &lt;BR /&gt;just understanding the behavior of a system, but also in-depth &lt;BR /&gt;knowledge of how to improve a system - improving both aspects &lt;BR /&gt;of Queueing will mean a better, more efficient and cost-effective &lt;BR /&gt;system and, more importantly, a much better customer experience." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I wrote before that: &lt;/P&gt;&lt;P&gt;---&lt;/P&gt;&lt;P&gt;"It says that a simple model with M/M/1/k with FCFS discipline &lt;BR /&gt;can predict webserver performance quite well.. 
&lt;BR /&gt;Hence, I think we can model our webserver over the internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this: &lt;BR /&gt;A -&amp;gt; M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue -&amp;gt; A&lt;BR /&gt;A: the arrival rate&lt;BR /&gt;and we have the following: &lt;BR /&gt;Ni: number of jobs in each queue &lt;BR /&gt;Ui: utilization of each queue &lt;BR /&gt;Ni = Ui / (1-Ui) &lt;BR /&gt;Adding all the Ni of the individual queues will give the &lt;BR /&gt;average number of jobs in the entire queuing network. &lt;BR /&gt;After that we apply Little's formula: &lt;BR /&gt;A: network arrival rate &lt;BR /&gt;T: average response time &lt;BR /&gt;N = A*T =&amp;gt; T = N / A &lt;BR /&gt;And after that, from the mathematical analytical equation, &lt;BR /&gt;we can simulate the Jackson queuing network"&lt;BR /&gt;&lt;BR /&gt;--&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As you have noticed, this mathematical model of &lt;BR /&gt;this Jackson network does in fact take into account &lt;BR /&gt;the M/M/1 Network queue node; the USL model cannot &lt;BR /&gt;do this... and with the performance data from the mathematical &lt;BR /&gt;analytical model simulation we can, for example, validate &lt;BR /&gt;the performance data of the fwptt webserver stress simulation.. &lt;/P&gt;&lt;P&gt;But you have to take into account worst cases and &lt;BR /&gt;peak traffic loads... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Say, for example, we have a webserver hosting html pages &lt;BR /&gt;and it is receiving 1000000 HTTP operations &lt;BR /&gt;per day with an average file size of 10 KB. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;What would be the network bandwidth required for this website, &lt;BR /&gt;considering peak traffic, if the peak traffic load from past &lt;BR /&gt;observations was four times greater than average loads? 
&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The required bandwidth is given by the following equation: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;HTTP ops/sec x average file size, or &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1000000 HTTP ops per day = 1000000/24 = 41,667 ops/hour = &lt;BR /&gt;41,667/3600 = 11.6 HTTP ops/sec &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The needed bandwidth is &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;11.6 HTTP ops/sec X 10 KB/HTTP op = 116 KB/sec = 928 Kbps. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If we assume a protocol overhead of 20%, then the actual throughput &lt;BR /&gt;required is 928 Kbps X 1.2 = 1,114 Kbps. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;However, if peak loads, as we said before, are as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes &lt;BR /&gt;would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So you have to think also about the cost of this line... &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will add the following: &lt;/P&gt;&lt;P&gt;As you have noticed, I said that: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"As you have noticed, this mathematical model of &lt;BR /&gt;this Jackson network does in fact take into account &lt;BR /&gt;the M/M/1 Network queue node; the USL model cannot &lt;BR /&gt;do this... and with the performance data from the mathematical &lt;BR /&gt;analytical model simulation we can, for example, validate &lt;BR /&gt;the performance data of the fwptt webserver stress simulation.." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and I said that: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"Hence, I think we can model our webserver over the internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this: &lt;BR /&gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue" &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And of course, in Capacity Planning for Enterprise Datacenters &lt;BR /&gt;and Websites, you can mirror many computer servers and load &lt;BR /&gt;balance between them with software... 
to make the system much &lt;BR /&gt;FASTER, and this will be modeled as a Jackson network like this: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;A -&amp;gt; M/M/n Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue -&amp;gt; A&lt;/P&gt;&lt;P&gt;A: the arrival rate to the system&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But there is still an important thing, as I have shown before &lt;BR /&gt;in my calculations: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"However, if peak loads, as we said before, are as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes &lt;BR /&gt;would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;/P&gt;&lt;P&gt;So you have to think also about the cost of this line..." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I think that you have also to take into account the knee utilisation &lt;BR /&gt;of your M/M/n Server Queues; if, for example, the number of computer &lt;BR /&gt;servers is 8 and the Knee is 74%, that means that in our previous &lt;BR /&gt;example the bandwidth must be equal to: &lt;/P&gt;&lt;P&gt;126% X 4.456 Mbps = 5.614 Mbps. &lt;/P&gt;&lt;P&gt;Cause as you know, over this Knee of 74% for 8 servers &lt;BR /&gt;the curve of the waiting time does grow quickly .. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And we have to take into account the cost of the line ... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So be smarter! &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 11:50:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829655#M1488</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T11:50:25Z</dc:date>
    </item>
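Editor's note: the fitting recipe described in the post above — fit the deviation from linearity of a few measured load points to a quadratic through the origin, then map the coefficients back with α = b - a and β = a — can be sketched as follows. The "true" α and β and the chosen load points are invented for illustration, and a two-point exact solve stands in for the least-squares regression Dr. Gunther actually uses over more points:

```python
# Sketch of the USL fitting recipe described above, on synthetic data.
# alpha (contention) and beta (coherency) are made-up "true" values used to
# generate capacity points; real input would be measured throughput ratios.

def usl_capacity(N, alpha, beta):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return N / (1.0 + alpha * (N - 1) + beta * N * (N - 1))

alpha_true, beta_true = 0.05, 0.002

# Deviation from linearity: for x = N - 1, y = N/C(N) - 1 = a*x^2 + b*x.
def deviation(N):
    return (N - 1, N / usl_capacity(N, alpha_true, beta_true) - 1)

# Two load points determine a and b exactly (Gunther fits many by regression).
(x1, y1), (x2, y2) = deviation(4), deviation(16)
a = (y1 / x1 - y2 / x2) / (x1 - x2)
b = y1 / x1 - a * x1

alpha, beta = b - a, a        # Gunther's back-transform: alpha = b - a, beta = a
print(f"alpha ~ {alpha:.4f}, beta ~ {beta:.4f}")

# Extrapolate to a higher load than was "measured":
print(f"predicted C(64) ~ {usl_capacity(64, alpha, beta):.2f}")
```

The back-transform works because substituting x = N - 1 into the USL denominator gives N/C - 1 = (alpha + beta)x + beta x^2, so the quadratic coefficient is β and the linear one is α + β.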
    <item>
      <title>A question about capacity planning and scalability [2]...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829656#M1489</link>
      <description>&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt;Cause as you know, over this Knee of 74% for 8 servers&lt;BR /&gt;&lt;BR /&gt;I mean: above this Knee of 74%...&lt;BR /&gt;&lt;BR /&gt;&amp;gt;the curve of the waiting time does grow quickly&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.</description>
      <pubDate>Wed, 08 Sep 2010 13:14:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829656#M1489</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T13:14:12Z</dc:date>
    </item>
    <item>
      <title>About capacity planning and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829657#M1490</link>
      <description>&lt;A href="http://groups.google.com/group/sci.op-research/browse_thread/thread/7158718ce30e65db"&gt;&lt;BR /&gt;&lt;SPAN class="fixed_width" style="font-family: Courier, Monospaced;"&gt; &lt;P&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;Finally, i have cleaned all my previous posts and added to them, &lt;BR /&gt;please read again: &lt;/P&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;/A&gt;&lt;A href="http://groups.google.com/group/sci.op-research/browse_thread/thread/7158718ce30e65db"&gt;http://groups.google.com/group/sci.op-research/browse_thread/thread/7158718ce30e65db&lt;/A&gt;#&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;Amine Moulay Ramdane&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;</description>
      <pubDate>Thu, 09 Sep 2010 17:13:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829657#M1490</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-09T17:13:30Z</dc:date>
    </item>
    <item>
      <title>My article about availability and efficiency ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829658#M1491</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;My article about availability and efficiency is on my website now,&lt;BR /&gt;(I have corrected some typos etc.)&lt;/P&gt;&lt;P&gt;Welcome: &lt;A href="http://pages.videotron.com/aminer/efficiency_availibility.htm"&gt;http://pages.videotron.com/aminer/efficiency_availibility.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Sep 2010 14:48:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829658#M1491</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-15T14:48:13Z</dc:date>
    </item>
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829659#M1492</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you look at the benchmarks of my lock-free ParallelQueue:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will notice that under contention the lock-free Ringbuffer has &lt;BR /&gt;retrograde throughput and the lock-free flqueue also has retrograde &lt;BR /&gt;throughput on the pop() side... but my lock-free ParallelQueue&lt;BR /&gt;(which also uses lock striping) has good scalability.&lt;/P&gt;&lt;P&gt;But if we draw the graph of Amdahl's law, it doesn't show any &lt;BR /&gt;retrograde throughput...&lt;BR /&gt;&lt;BR /&gt;So I was thinking more about this, so please follow &lt;BR /&gt;along with me..&lt;/P&gt;&lt;P&gt;In lock-free algorithms we reduce the duration for which &lt;BR /&gt;locks are held to the minimum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*; this &lt;BR /&gt;is good for scalability. &lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested &lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for which locks are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention... 
&lt;/P&gt;&lt;P&gt;So, under contention and in the *worst* case, threads will &lt;BR /&gt;be waiting for the locked section to be released; hence, if &lt;BR /&gt;we have n threads, this implies that the time the threads will &lt;BR /&gt;spend waiting for the locked section to be released is: &lt;/P&gt;&lt;P&gt;Average time spent inside the locked section * (1 + 2 + 3 + 4 ... + n) &lt;/P&gt;&lt;P&gt;And 1 + 2 + 3 + 4 ... + n is an arithmetic series and is equal to:&lt;/P&gt;&lt;P&gt;(n * (n + 1)) / 2 &lt;/P&gt;&lt;P&gt;and &lt;/P&gt;&lt;P&gt;Limit (n -&amp;gt; infinity) of (n * (n + 1) / 2) = infinity &lt;/P&gt;&lt;P&gt;So, this mathematical term diverges when n -&amp;gt; infinity&lt;/P&gt;&lt;P&gt;So, I think I can rewrite the scalability equation to something like this;&lt;/P&gt;&lt;P&gt;Amine's equation is: &lt;/P&gt;&lt;P&gt;Speed = 1 / ((S * (n * (n + 1)) / 2) + (P / N)) (2)&lt;/P&gt;&lt;P&gt;S: serial part&lt;BR /&gt;P: parallel part &lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;/P&gt;&lt;P&gt;And my equation clearly shows that under contention, and when &lt;BR /&gt;there are many threads, we can have retrograde throughput as in &lt;BR /&gt;the lock-free Ringbuffer and lock-free flqueue... &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 15:48:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829659#M1492</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T15:48:33Z</dc:date>
    </item>
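Editor's note: the argument above — that a contention term growing like the arithmetic series n(n + 1)/2 produces retrograde throughput, which plain Amdahl speedup cannot show — can be illustrated numerically. S, P and N below are arbitrary illustration values, and the formula is the poster's proposed equation with the parentheses balanced as 1 / (S * n(n + 1)/2 + P/N), not an established law:

```python
# Comparing plain Amdahl speedup with the contention-penalized equation
# proposed above: Speed = 1 / (S * n*(n+1)/2 + P/N).
# S, P and N are arbitrary illustration values, not measurements.

S, P = 0.01, 0.99      # serial and parallel fractions (S + P = 1)
N = 8                  # number of cores

def amdahl(n):
    """Classic Amdahl speedup on n cores: no contention term, never decreases."""
    return 1.0 / (S + P / n)

def contended_speed(n, cores=N):
    """The poster's equation: the serial part is paid once per waiting
    thread, so its cost grows like the arithmetic series 1 + 2 + ... + n."""
    return 1.0 / (S * n * (n + 1) / 2.0 + P / cores)

for n in (1, 2, 4, 8, 16, 32):
    print(f"n={n:2d}  amdahl={amdahl(min(n, N)):6.3f}  "
          f"contended={contended_speed(n):6.3f}")
```

The Amdahl column saturates as n grows, while the contended column falls off monotonically: the divergent n(n + 1)/2 term dominates, which is the retrograde behavior the poster observes in the Ringbuffer and flqueue benchmarks.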
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829660#M1493</link>
      <description>&lt;BR /&gt;I corrected some typos..&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Hello, &lt;P&gt;&lt;BR /&gt;If you look at the benchmarks of my lock-free ParallelQueue:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will notice that under contention the lock-free Ringbuffer has &lt;BR /&gt;retrograde throughput and the lock-free flqueue also has retrograde &lt;BR /&gt;throughput on the pop() side... but my lock-free ParallelQueue&lt;BR /&gt;(which also uses lock striping) has good scalability.&lt;/P&gt;&lt;P&gt;But if we draw the graph of Amdahl's law, it doesn't show any &lt;BR /&gt;retrograde throughput...&lt;BR /&gt;&lt;BR /&gt;So I was thinking more about this, so please follow &lt;BR /&gt;along with me..&lt;/P&gt;&lt;P&gt;In lock-free algorithms we reduce the duration for which &lt;BR /&gt;locks are held to the minimum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*; this &lt;BR /&gt;is good for scalability. &lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested &lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for which locks are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention... 
&lt;/P&gt;&lt;P&gt;So, under contention and in the *worst* case, threads will &lt;BR /&gt;be waiting for the locked section to be released; hence, if we &lt;BR /&gt;have n threads, this implies that the time the threads will be &lt;BR /&gt;waiting for the locked section to be released is: &lt;/P&gt;&lt;P&gt;Time spent inside the locked section * (1 + 2 + 3 + 4 ... + n) &lt;/P&gt;&lt;P&gt;And 1 + 2 + 3 + 4 ... + n is an arithmetic series and is equal to:&lt;/P&gt;&lt;P&gt;(n * (n + 1)) / 2 &lt;/P&gt;&lt;P&gt;and &lt;/P&gt;&lt;P&gt;Limit (n -&amp;gt; infinity) of (n * (n + 1) / 2) = infinity &lt;/P&gt;&lt;P&gt;So, this mathematical term diverges when n -&amp;gt; infinity&lt;/P&gt;&lt;P&gt;So, I think I can rewrite the scalability equation to something like this;&lt;/P&gt;&lt;P&gt;Amine's equation is: &lt;/P&gt;&lt;P&gt;Speed = 1 / ((S * (n * (n + 1)) / 2) + (P / N)) (2)&lt;/P&gt;&lt;P&gt;S: serial part&lt;BR /&gt;P: parallel part &lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;/P&gt;&lt;P&gt;And my equation clearly shows that under contention, and when &lt;BR /&gt;there are many threads, we can have retrograde throughput as in &lt;BR /&gt;the lock-free Ringbuffer and lock-free flqueue... &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 15:56:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829660#M1493</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T15:56:56Z</dc:date>
    </item>
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829661#M1494</link>
      <description>&lt;P&gt;I correct:&lt;/P&gt;&lt;P&gt;I wrote:&lt;BR /&gt;&amp;gt; Time spent inside the locked section * (1 + 2 + 3 + 4 ... + n)&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; And 1 + 2 + 3 + 4 ... + n is an arithmetic series and is equal to:&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; (n * (n + 1)) / 2&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; and&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Limit (n -&amp;gt; infinity) of (n * (n + 1) / 2) = infinity&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; So, this mathematical term diverges when n -&amp;gt; infinity&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; So, I think I can rewrite the scalability equation to something like&lt;BR /&gt;&amp;gt; this;&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Amine's equation is:&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Speed = 1 / ((S * (n * (n + 1)) / 2) + (P / N)) (2)&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If T is the time spent in the locked section, &lt;BR /&gt;let us have a scenario of four threads; so, &lt;BR /&gt;under heavy contention and in the worst case, &lt;BR /&gt;if the threads are all waiting for the locked section &lt;BR /&gt;(locks, false sharing etc.), the first thread will not &lt;BR /&gt;wait, the second thread will wait T, the third thread &lt;BR /&gt;will wait 2 * T and the fourth thread will wait 3 * T... 
&lt;/P&gt;&lt;P&gt;Hence, in my equation we have to replace n by n - 1 &lt;BR /&gt;in the Amdahl equation; this will give us:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, my scalability equation becomes:&lt;BR /&gt;&lt;BR /&gt;Speed = 1 / ((S * ((n - 1) * n) / 2) + (P / N)) &lt;BR /&gt;&lt;BR /&gt;and the Limit (n -&amp;gt; infinity) of ((n - 1) * n) / 2 = infinity&lt;/P&gt;&lt;P&gt;this term diverges..&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;S: serial part&lt;BR /&gt;P: parallel part&lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;BR /&gt;&lt;BR /&gt;And my equation clearly shows that under contention, and when&lt;BR /&gt;there are many threads, we can have retrograde throughput as in&lt;BR /&gt;the lock-free Ringbuffer and lock-free flqueue...&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 16:16:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829661#M1494</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T16:16:40Z</dc:date>
    </item>
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829662#M1495</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have cleaned and corrected my previous post; &lt;BR /&gt;here it is again, and tell me what you think...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;If you look at the benchmarks of my lock-free ParallelQueue:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm" target="_blank"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will notice that under contention the lock-free Ringbuffer has &lt;BR /&gt;retrograde throughput and the lock-free flqueue also has retrograde &lt;BR /&gt;throughput on the pop() side... but my lock-free ParallelQueue&lt;BR /&gt;(which also uses lock striping) has good scalability.&lt;/P&gt;&lt;P&gt;But if we draw the graph of Amdahl's law, it doesn't show any &lt;BR /&gt;retrograde throughput...&lt;BR /&gt;&lt;BR /&gt;So I was thinking more about this, so please follow &lt;BR /&gt;along with me..&lt;/P&gt;&lt;P&gt;In lock-free algorithms we reduce the duration for which &lt;BR /&gt;locks are held to the minimum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*; this &lt;BR /&gt;is good for scalability. &lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested &lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt;permit greater concurrency. &lt;/P&gt;&lt;P&gt;and since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for which locks are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention... 
&lt;/P&gt;&lt;P&gt;So, under contention and in the *worst* case, threads will &lt;BR /&gt;be waiting for the locked section to be released:&lt;/P&gt;&lt;P&gt;If T is the time spent in the locked section, &lt;BR /&gt;let us have a scenario of four threads; so, &lt;BR /&gt;under heavy contention and in the worst case, &lt;BR /&gt;if the threads are all waiting for the locked section &lt;BR /&gt;(locks, false sharing etc.), the first thread will not &lt;BR /&gt;wait, the second thread will wait T, the third thread &lt;BR /&gt;will wait 2 * T and the fourth thread will wait 3 * T... &lt;/P&gt;&lt;P&gt;And 1 + 2 + 3 + 4 ... + n is an arithmetic series and is equal to:&lt;/P&gt;&lt;P&gt;(n * (n + 1)) / 2 &lt;/P&gt;&lt;P&gt;and &lt;/P&gt;&lt;P&gt;Limit (n -&amp;gt; infinity) of (n * (n + 1) / 2) = infinity &lt;/P&gt;&lt;P&gt;So, this mathematical term diverges when n -&amp;gt; infinity&lt;/P&gt;&lt;P&gt;We have to replace n by (n - 1), since with n threads the longest wait is &lt;/P&gt;&lt;P&gt;T * (n - 1) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So my scalability equation becomes:&lt;BR /&gt;&lt;BR /&gt;Speed = 1 / ((S * ((n - 1) * n) / 2) + (P / N)) &lt;BR /&gt;&lt;BR /&gt;and the Limit (n -&amp;gt; infinity) of ((n - 1) * n) / 2 = infinity&lt;/P&gt;&lt;P&gt;this term diverges..&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;S: serial part&lt;BR /&gt;P: parallel part&lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;BR /&gt;&lt;BR /&gt;And my equation clearly shows that under contention, and when&lt;BR /&gt;there are many threads, we can have retrograde throughput as in&lt;BR /&gt;the lock-free Ringbuffer and lock-free flqueue...&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/" target="_blank"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 16:26:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829662#M1495</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T16:26:01Z</dc:date>
    </item>
  </channel>
</rss>

