<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic A question about capacity planning and scalability [2]...  in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829667#M1500</link>
    <description>Hello Amine,&lt;BR /&gt;&lt;BR /&gt;We really appreciate your contributions, but we ask that you try to combine your posts into a single forum thread as much as possible. It just makes it easier for users to navigate the front page of the forum. I've merged your two part thread on capacity planning and scalability, and we're looking at merging some of your other threads.&lt;BR /&gt;&lt;BR /&gt;But once again, thanks for your contributions!&lt;BR /&gt;&lt;BR /&gt;==&lt;BR /&gt;Aubrey W.&lt;BR /&gt;Intel Software Network Support&lt;BR /&gt;</description>
    <pubDate>Thu, 16 Sep 2010 20:55:17 GMT</pubDate>
    <dc:creator>Aubrey_W_</dc:creator>
    <dc:date>2010-09-16T20:55:17Z</dc:date>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829643#M1476</link>
      <description>&lt;DIV id="inbdy"&gt;&lt;A name="msg_a9b4f3a9bfd2b629"&gt;&lt;/A&gt;&lt;SPAN class="fixed_width" style="font-family: Courier, Monospaced;"&gt;&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;I didn't know where to ask this question, so i decided to ask here.. &lt;/P&gt;&lt;P&gt;I just read yesterday the following page, look at the the USL &lt;BR /&gt;(Universal Law of Computational Scalability) of Dr. Gunther, &lt;BR /&gt;he wrote this: ( see &lt;A href="http://en.wikipedia.org/wiki/Neil_J._Gunther"&gt;http://en.wikipedia.org/wiki/Neil_J._Gunther&lt;/A&gt;)&lt;BR /&gt;&lt;BR /&gt;-------------------------- &lt;/P&gt;&lt;P&gt;The relative capacity C(N) of a computational platform is given by: &lt;/P&gt;&lt;P&gt;C(N) =       N &lt;BR /&gt;         ------------------- &lt;BR /&gt;     1 +  (N - 1) +  N (N - 1) &lt;/P&gt;&lt;P&gt;where N represents either the number of physical processors &lt;BR /&gt;in the hardware configuration or the number of users driving the &lt;BR /&gt;software application. The parameters  and  represent respectively &lt;BR /&gt;the levels of contention (e.g., queueing for shared resources) and &lt;BR /&gt;coherency delay (i.e., latency for data to become consistent) in the &lt;BR /&gt;system. The  parameter also quantifies the retrograde throughput seen &lt;BR /&gt;in many stress tests but not accounted for in either Amdahl's law or &lt;BR /&gt;event-based simulations. &lt;/P&gt;&lt;P&gt;--------------- &lt;/P&gt;&lt;P&gt;His website: &lt;A target="_blank" rel="nofollow" href="http://www.google.com/url?sa=D&amp;amp;q=http://www.perfdynamics.com/&amp;amp;usg=AFQjCNF5RDoikW8IWPWAa1tLCunOOpkp8Q"&gt;http://www.perfdynamics.com/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;If you read carefully , you will see that Dr. Gunther is using this &lt;BR /&gt;model to predict scalability after he simulates a relatively small &lt;BR /&gt;number &lt;BR /&gt;of vusers in LoadRunner ( because of licensing costs, it's cost- &lt;BR /&gt;effectiveness) &lt;BR /&gt;and after that he finds the coefficients of the 2nd-degree polynomial &lt;BR /&gt;(quadratic equation) and then transform those coefficients back to the &lt;BR /&gt;USL parameters using the  = b - a and  = a. &lt;/P&gt;&lt;P&gt;And than he is extrapolating with the USL model to higher loads &lt;BR /&gt;to predict scalability. &lt;/P&gt;&lt;P&gt;He is also applying the model to webservers with heterogeneous &lt;BR /&gt;workloads. like in the following page: &lt;/P&gt;&lt;P&gt;&lt;A target="_blank" rel="nofollow" href="http://www.google.com/url?sa=D&amp;amp;q=http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html&amp;amp;usg=AFQjCNHBl5e0W05i5t3jhXzpL31ip9mwSQ"&gt;http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-wi...&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Now my question follows: &lt;/P&gt;&lt;P&gt;Suppose we have obtained a small number of measured load-points &lt;BR /&gt;with Loadrunner or others tools, and we calculated the USL equation to &lt;BR /&gt;predict scalability of a webserver , how the USL model can predict if &lt;BR /&gt;the scalability/performance is limited by the network bandwidth and &lt;BR /&gt;not the server ? &lt;/P&gt;&lt;P&gt;Can it give this information ? &lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A target="_blank" rel="nofollow" href="http://www.google.com/url?sa=D&amp;amp;q=http://pages.videotron.com/aminer/&amp;amp;usg=AFQjCNEc4XUK97itZ-yH6hly_HwgwpbUjg"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 07 Sep 2010 19:37:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829643#M1476</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-07T19:37:41Z</dc:date>
    </item>
    <item>
      <title>Why no locks in, um, lock-free algorithms? Options</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829644#M1477</link>
      <description>&lt;P&gt;&lt;BR /&gt;An important answer that i wrote (from comp.programming.threads newsgroup)...&lt;BR /&gt;&lt;BR /&gt;On Sep 6, 4:12 pm, Scott Meyers &amp;lt;&lt;A href="mailto:NeverR...@aristeia.com"&gt;NeverR...@aristeia.com&lt;/A&gt;&amp;gt; wrote:&lt;BR /&gt;&amp;gt; As the subject line suggests (I hope), I'm missing something &lt;BR /&gt;&amp;gt;really basic about lock-free algorithms. In particular, why &lt;BR /&gt;&amp;gt;they can't use locks.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;In lock-free algorithms we reduce the duration for wich &lt;BR /&gt;locks are held to the miniumum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*, this &lt;BR /&gt;is good for scalability.&lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested&lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;and since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for wich lock are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention...&lt;/P&gt;&lt;P&gt;When we have lock contention, a factor called mean &lt;BR /&gt;waiting time will be added to the Amdhal equation &lt;BR /&gt;like this: Speed = 1/(S+mean waiting time+(P/N)) &lt;BR /&gt;it's the mean waiting time of the threads that are&lt;BR /&gt;waiting on the lock , and when we reduce the duration &lt;BR /&gt;for wich locks are held this will lower the service time: &lt;BR /&gt;hence the waiting time will be smaller and this &lt;BR /&gt;will higher the speed in the Amdhal equation.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Look at how my Lock-free ParallelQueue is scaling:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you loOK at this page, i wrote this:&lt;/P&gt;&lt;P&gt;[1] If there is LESS contention THEN the algorithm &lt;BR /&gt;will scale better. Due to the fact that S (the serial part) &lt;BR /&gt;become smaller with less contention , and as N becomes bigger,&lt;BR /&gt;the result - the speed of the program/algorithm... - &lt;BR /&gt;of the Amdahl's equation 1/(S+(P/N)) become bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[2] If there is false sharing THEN the algorithm &lt;BR /&gt;will not scale well. Due to the fact that S (the serial part) &lt;BR /&gt;of the Amdahl's equation 1/(S+(P/N)) will becomes bigger &lt;BR /&gt;with false sharing, so, this is not good for scalability.&lt;/P&gt;&lt;P&gt;As you have notice, i am including the mean waiting time &lt;BR /&gt;factor as if it were a serial part, cause in Amdhal equation &lt;BR /&gt;there is only two parts S(serial part) and P(parallel part),&lt;BR /&gt;but the correct equation is: Speed = 1/(S+mean waiting time+(P/N)).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;On Sep 6, 4:12 pm, Scott Meyers &amp;lt;&lt;A href="mailto:NeverR...@aristeia.com"&gt;NeverR...@aristeia.com&lt;/A&gt;&amp;gt; wrote:&lt;BR /&gt;&amp;gt; As the subject line suggests (I hope), I'm missing something really basic about&lt;BR /&gt;&amp;gt; lock-free algorithms. In particular, why they can't use locks.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Wikipedia and others define lock-free algorithms as those where at least one&lt;BR /&gt;&amp;gt; thread in a process is guaranteed to make progress, regardless of what the other&lt;BR /&gt;&amp;gt; threads are doing. So suppose I bundle an array with a mutex to make a&lt;BR /&gt;&amp;gt; "synchronized array," where each operation on the array grabs the mutex, does&lt;BR /&gt;&amp;gt; some work on the array (but does not call into code outside the array, i.e.,&lt;BR /&gt;&amp;gt; doesn't invoke any code that could itself try to grab the mutex), then releases&lt;BR /&gt;&amp;gt; the mutex. This uses a lock, but inside the "synchronized array" critical&lt;BR /&gt;&amp;gt; section, even if all the other threads in the process are stalled, the thread in&lt;BR /&gt;&amp;gt; the critical section is making progress.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; As I said, I'm missing something fundamental. I'd be grateful if somebody would&lt;BR /&gt;&amp;gt; point out what it is.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Thanks,&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Scott&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; --&lt;BR /&gt;&amp;gt; * C++ and Beyond: Meyers, Sutter, &amp;amp; Alexandrescu, Oct. 24-27 near Seattle&lt;BR /&gt;&amp;gt; (&lt;A href="http://cppandbeyond.com/"&gt;http://cppandbeyond.com/&lt;/A&gt;)&lt;BR /&gt;&amp;gt; * License my training materials for commercial (&lt;A href="http://tinyurl.com/yfzvkp9"&gt;http://tinyurl.com/yfzvkp9&lt;/A&gt;) or&lt;BR /&gt;&amp;gt; personal use (&lt;A href="http://tinyurl.com/yl5ka5p"&gt;http://tinyurl.com/yl5ka5p&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Sep 2010 20:53:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829644#M1477</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T20:53:58Z</dc:date>
    </item>
    <item>
      <title>Why no locks in, um, lock-free algorithms? Options</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829645#M1478</link>
      <description>&lt;BR /&gt;&lt;BR /&gt;Hello again,&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I will add that The N. J. Gunther Universal Scalability Law states&lt;BR /&gt;&lt;BR /&gt;The relative capacity C(N) is a normalized throughput given by: &lt;A&gt;&lt;/A&gt;&lt;BR clear="all" /&gt;&lt;TABLE width="100%" border="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;TABLE align="center" cellpadding="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD align="center" nowrap="nowrap"&gt;C(N) = &lt;/TD&gt;&lt;TD align="center" nowrap="nowrap"&gt;N &lt;DIV class="hrcomp"&gt;&lt;HR noshade="noshade" size="1" /&gt;&lt;/DIV&gt;1 + (N  1) + N (N  1)&lt;BR /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;BR /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;And as you will notice in the following graphs in:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;there is anegative return on investment when  &amp;gt; 0 andthe &lt;BR /&gt;coherency paramater &amp;gt; 0, thiscauses the throughput &lt;BR /&gt;to go retrograde faster in the lock-free RingBuffer(in both the &lt;BR /&gt;push and the pop) and in the lock-free flqueue (in the pop side),&lt;BR /&gt;but in my lock-free ParallelQueue itkeep scaling very well...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 06 Sep 2010 21:19:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829645#M1478</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T21:19:21Z</dc:date>
    </item>
    <item>
      <title>Why no locks in, um, lock-free algorithms? Options</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829646#M1479</link>
      <description>&lt;SPAN class="fixed_width" style="font-family: Courier, Monospaced;"&gt; &lt;P&gt;&lt;/P&gt;&lt;DIV class="qt" id="qhide_91242"&gt;Scott Meyers wrote: &lt;BR /&gt;&amp;gt; I agree, but my question is about the *definition* &lt;BR /&gt;&amp;gt; of being lock-free and how that definition precludes &lt;BR /&gt;&amp;gt; the use of locks. I already know why lock-free is &lt;BR /&gt;&amp;gt; good for scalability. What I don't know is how the &lt;BR /&gt;&amp;gt; definition of lock-free prohibits using locks. &lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;BR /&gt;I don't understand ? in lock-free algorithms under x86 &lt;BR /&gt;we use for example a lock instruction in assembler, &lt;BR /&gt;and it's in fact a lock that we are using. &lt;BR /&gt;&lt;P&gt;&lt;BR /&gt;It's the lock-free algorithm that is different , &lt;BR /&gt;you retry when the transaction( the lock..) fails, &lt;BR /&gt;like in TM. &lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;</description>
      <pubDate>Mon, 06 Sep 2010 21:32:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829646#M1479</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T21:32:32Z</dc:date>
    </item>
    <item>
      <title>Why no locks in, um, lock-free algorithms? Options</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829647#M1480</link>
      <description>&lt;P&gt;On Sep 6, 5:36 pm, Scott Meyers &amp;lt;&lt;A href="mailto:NeverR...@aristeia.com"&gt;NeverR...@aristeia.com&lt;/A&gt;&amp;gt; wrote:&lt;BR /&gt;&amp;gt; On 9/6/2010 2:27 PM, aminer wrote:&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; &amp;gt; I don't understand ? in lock-free algorithms under x86&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; "Lock-free algorithm" is a technical term. Its definition is independent of the&lt;BR /&gt;&amp;gt; language you are programming in and the hardware you are running on.&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Note that the definition of "lock-free" makes no mention of locks at all, so&lt;BR /&gt;&amp;gt; it's not something as simple as "lock-free is defined to use no locks". This,&lt;BR /&gt;&amp;gt; for example, is Wikipedia's definition&lt;BR /&gt;&amp;gt; (&lt;A href="http://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom"&gt;http://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom&lt;/A&gt;):&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; An algorithm is lock-free if it satisfies that when the program threads are&lt;BR /&gt;&amp;gt; run sufficiently long at least one of the threads make progress (for some&lt;BR /&gt;&amp;gt; sensible definition of progress).&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; How does this definition preclude the use of e.g. a mutex in the algorithm?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;You are confusing/mixing mutex with lock, you were speaking &lt;BR /&gt;about lock, and as i said before lock-free algorithms does &lt;BR /&gt;in fact use locks , like in x86 lock-free algorithms are &lt;BR /&gt;using a lock instruction in assembler , in SMR also they &lt;BR /&gt;use locks etc. the Mutex also does use locks in its code internally...&lt;/P&gt;&lt;P&gt;But as i said before , it's the lock-free algorithm &lt;BR /&gt;that is different , you retry when the transaction( the lock..) &lt;BR /&gt;fails, like in TM, and we call that Optimistic locking.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Mouly Ramdane.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Sep 2010 21:49:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829647#M1480</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-06T21:49:05Z</dc:date>
    </item>
    <item>
      <title>Why no locks in, um, lock-free algorithms? Options</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829648#M1481</link>
      <description>In"Lock Free" coding you are only permitted to hold an uninterruptable atomic lock. Example:&lt;BR /&gt;&lt;BR /&gt; LOCK; add dword ptr [eax], 1&lt;BR /&gt;&lt;BR /&gt;In the above sequence (when eax is a legitimate, aligned,address in R/W memory)the LOCK prefex is used to permit Read/Modify/Write instruction to execute in an atomic manner. More important to "Lock Free" - the above instruction cannot be interrupted. Therefore, "Lock Free" means: no lock can be pre-empted.&lt;BR /&gt;&lt;BR /&gt;While a "Lock" is a software lock (typically using LOCK prefix to obtain the lock) and once the lock is obtained, interruptable code sequence follows. The problem with { lock, code, unlock} comes when the system pre-empts the code while holding the lock - all threads waiting for the lock wait for the duration of the preemption (plus remaining code time by thread holding lock).&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Tue, 07 Sep 2010 12:15:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829648#M1481</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-09-07T12:15:50Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829649#M1482</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello again,&lt;/P&gt;&lt;P&gt;Look at the following page, Dr Gunther is applying &lt;BR /&gt;the USL model to webservers:&lt;/P&gt;&lt;P&gt;&lt;A href="http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html"&gt;http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;When we are modeling webservers , we have to include &lt;BR /&gt;the network&amp;amp;tcp/ip in our newtwork queuig model &lt;BR /&gt;(that comprises the queue of of the computer server side) , &lt;BR /&gt;and since the service in the computer server is comprised of &lt;BR /&gt;multiple services (when we are using htmls , databases etc.)&lt;BR /&gt;the network&amp;amp;tcp/ip queue will not be markovian in the service &lt;BR /&gt;side, and we have to model the network&amp;amp;tcpip queue as an M/G/1 &lt;BR /&gt;and this will complicate the mathematical analytic modeling...&lt;/P&gt;&lt;P&gt;Now as i said before:&lt;/P&gt;&lt;P&gt;Suppose we have obtained a small number of measured load-points &lt;BR /&gt;with Loadrunner or others tools, and we calculated the USL equation &lt;BR /&gt;to predict scalability of a webserver , how the USL model can predict if &lt;BR /&gt;the scalability/performance is limited by the network bandwidth and &lt;BR /&gt;not the server ? I think USL can not predict this.&lt;/P&gt;&lt;P&gt;So, i think the best way is to use a webserver stress tool &lt;BR /&gt;like &lt;A href="http://fwptt.sourceforge.net/"&gt;http://fwptt.sourceforge.net/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can even test the webserver with an heterogeneous &lt;BR /&gt;workloads by starting multiple fwtpp processes, and &lt;BR /&gt;you should increase the number of threads to 5 and after &lt;BR /&gt;that to 10 threads, 15 ... and so on until the webserver&lt;BR /&gt;applications stops responding propperly(and this will inform &lt;BR /&gt;on the maximum number of users that you can have in the same time...) &lt;BR /&gt;and as you are stress testing you can even see (by testing/measuring &lt;BR /&gt;it) if the network bandwidth is not the bottleneck... and this can &lt;BR /&gt;not be done by the USL model.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Sep 2010 21:03:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829649#M1482</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-07T21:03:29Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829650#M1483</link>
      <description>&lt;P&gt;I wrote:&lt;BR /&gt;&amp;gt; When we are modeling webservers , we have to include&lt;BR /&gt;&amp;gt; the network&amp;amp;tcp/ip in our network queuig model&lt;BR /&gt;&amp;gt; (that comprises the queue of the computer server side) ,&lt;BR /&gt;&amp;gt; and since the service in the computer server is comprised of&lt;BR /&gt;&amp;gt; multiple services (when we are using htmls , databases etc.)&lt;BR /&gt;&amp;gt; the network&amp;amp;tcp/ip queue will not be markovian in the service&lt;BR /&gt;&amp;gt; side, and we have to model the network&amp;amp;tcpip queue as an M/G/1&lt;BR /&gt;&amp;gt; and this will complicate the mathematical analytic modeling...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We already know that to satisfy a Poisson process we must &lt;BR /&gt;have that N(t1)- N(t0), N(t2)- N(t1) etc. that means the &lt;BR /&gt;counting increments must be independant.&lt;/P&gt;&lt;P&gt;We have the following relation between &lt;BR /&gt;the Poisson law and Exponential law: &lt;/P&gt;&lt;P&gt;the expected value E(X exponential) = 1 / E(X poisson)&lt;/P&gt;&lt;P&gt;and if the arrival is poissonian than the interarricals are exponential.. &lt;/P&gt;&lt;P&gt;Now what about a webserver ? &lt;/P&gt;&lt;P&gt;I have read the following paper:&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQ"&gt;http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQit&lt;/A&gt; &lt;BR /&gt;It says that a simple model with M/M/1/k with FCFS discipline&lt;BR /&gt;can predict webserver performance quite well.. &lt;/P&gt;&lt;P&gt;Hence, i think we can model our webserver over internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this&lt;/P&gt;&lt;P&gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue&lt;/P&gt;&lt;P&gt;and we have the following:&lt;/P&gt;&lt;P&gt;Ni: number of jobs in each queue &lt;BR /&gt;Ui: utilization of each queue&lt;/P&gt;&lt;P&gt;Ni = Ui / (1-Ui) &lt;/P&gt;&lt;P&gt;Adding all the Ni in each individual queue will give the &lt;BR /&gt;average number of jobs in the entire queuing network.&lt;/P&gt;&lt;P&gt;After that we apply the Little formula:&lt;/P&gt;&lt;P&gt;A: network arrival rate&lt;BR /&gt;T: average response time&lt;/P&gt;&lt;P&gt;N = A*T =&amp;gt; T = N / A &lt;/P&gt;&lt;P&gt;And after that from the mathematical analytical equation &lt;BR /&gt;we can simmulate the jackson queuing network that model our webservers...&lt;/P&gt;&lt;P&gt;Now there is still an important question that i have:&lt;/P&gt;&lt;P&gt;Our analytical jackson network model can give us insight &lt;BR /&gt;on the webservers behavivior and validate our simulation&lt;BR /&gt;with fwptt.. but the difficulty that i find in practice &lt;BR /&gt;is that: suppose we have found the right parameters&lt;BR /&gt;that we want to choose, like for example the service rate of &lt;BR /&gt;the M/M/1 Server Queue , how , from this service rate, can &lt;BR /&gt;we buy the right computer ?&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 01:41:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829650#M1483</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T01:41:29Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829651#M1484</link>
      <description>&lt;BR /&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt;We already know that to satisfy a Poisson process we must &lt;BR /&gt;&amp;gt;have that N(t1)- N(t0), N(t2)- N(t1) etc. &lt;BR /&gt;&lt;BR /&gt;must be independant.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;that means the &lt;BR /&gt;&amp;gt;counting increments must be independant</description>
      <pubDate>Wed, 08 Sep 2010 01:58:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829651#M1484</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T01:58:33Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829652#M1485</link>
      <description>&lt;BR /&gt;Hello again,&lt;BR /&gt;&lt;P&gt;&lt;BR /&gt;We say in OR that: &lt;/P&gt;&lt;P&gt;"Understanding the behavior of a system is what Queueing Theory &lt;BR /&gt;and Littles Law is all about. But, managing a Queue requires not &lt;BR /&gt;just understanding the behavior of a system, but also in-depth &lt;BR /&gt;knowledge of how to improve a system - improving both aspects &lt;BR /&gt;of Queueing will mean a better, more efficient and cost-effective &lt;BR /&gt;system and, more importantly, a much better customer experience."&lt;/P&gt;&lt;P&gt;I wrote before that:&lt;/P&gt;&amp;gt;It says that a simple model with M/M/1/k with FCFS discipline&lt;BR /&gt;&amp;gt;can predict webserver performance quite well.. &lt;P&gt;&amp;gt;Hence, i think we can model our webserver over internet &lt;BR /&gt;&amp;gt;with 3 queues connected as a Jackson Network like this&lt;/P&gt;&lt;P&gt;&amp;gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue&lt;/P&gt;&lt;P&gt;&amp;gt;and we have the following:&lt;/P&gt;&lt;P&gt;&amp;gt;Ni: number of jobs in each queue &lt;BR /&gt;&amp;gt;Ui: utilization of each queue&lt;/P&gt;&lt;P&gt;&amp;gt;Ni = Ui / (1-Ui) &lt;/P&gt;&lt;P&gt;&amp;gt;Adding all the Ni in each individual queue will give the &lt;BR /&gt;&amp;gt;average number of jobs in the entire queuing network.&lt;/P&gt;&lt;P&gt;&amp;gt;After that we apply the Little formula:&lt;/P&gt;&lt;P&gt;&amp;gt;A: network arrival rate&lt;BR /&gt;&amp;gt;T: average response time&lt;/P&gt;&lt;P&gt;&amp;gt;N = A*T =&amp;gt; T = N / A &lt;/P&gt;&lt;P&gt;&amp;gt;And after that from the mathematical analytical equation &lt;BR /&gt;&amp;gt;we can simulate the jackson queuing&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As you have noticed , this mathematical model of this jackson network &lt;BR /&gt;does in fact take into account theM/M/1 Network queue node ,the &lt;BR /&gt;USL model can not do this... and with this performance data &lt;BR /&gt;from the mathematical analytical model simulationwecanfor example &lt;BR /&gt;validate the performance data of the fwptt stress webserver simulation..&lt;BR /&gt;&lt;BR /&gt;But you have to take into account worst cases and the peak trafficloads...&lt;/P&gt;&lt;P&gt;Let for examplewe have a a webserver hostinghtml pages &lt;BR /&gt;and it is receiving 1000000 HTTP operations &lt;BR /&gt;per day with an average file size of 10 KB.&lt;/P&gt;&lt;P&gt;What would be the network bandwidth required for this website&lt;BR /&gt;considering peak traffic if the peak traffic load from past observations&lt;BR /&gt;was four times greater than average loads?&lt;/P&gt;&lt;P&gt;Required bandwidth is solved by the following equation:&lt;/P&gt;&lt;P&gt;HTTP op/sec x average file size or&lt;/P&gt;&lt;P&gt;1000000 HTTP ops per day =1000000/24 = 41,667 op/hour = &lt;BR /&gt;41,667/3600= 11.6 HTTP ops/sec&lt;/P&gt;&lt;P&gt;The needed bandwidth is&lt;/P&gt;&lt;P&gt;11.6 HTTP ops/sec X 10 KB/HTTP op = 116 KB/sec = 928 Kbps.&lt;/P&gt;&lt;P&gt;If we assume a protocol overhead of 20% then the actual throughput &lt;BR /&gt;required is 928 Kbps X 1.2 = 1,114 Kbps. &lt;BR /&gt;&lt;BR /&gt;However if peak loads as we say before is as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes&lt;/P&gt;&lt;P&gt;it would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;BR /&gt;&lt;BR /&gt;So you have to think also about the cost of this line...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 09:08:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829652#M1485</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T09:08:40Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829653#M1486</link>
      <description>&lt;P&gt;&lt;BR /&gt;I will add the following:&lt;/P&gt;&lt;P&gt;We have to be smarter than that in OR (operational research)...&lt;/P&gt;&lt;P&gt;As you have noticed i said that:&lt;/P&gt;&lt;P&gt;"As you have noticed , this mathematical model of &lt;BR /&gt;this jackson network does in fact take into account &lt;BR /&gt;the M/M/1 Network queue node , the USL model can not &lt;BR /&gt;do this... and with this performance data from the mathematical &lt;BR /&gt;analytical model simulation we can for example validate &lt;BR /&gt;the performance data of the fwptt stress webserver simulation.."&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and i said that:&lt;/P&gt;&lt;P&gt;"Hence, i think we can model our webserver over internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this &lt;BR /&gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue" &lt;/P&gt;&lt;P&gt;And of course on Capacity Planning for Enterprise Datacenters &lt;BR /&gt;and Websites , you can mirror many computer servers and load &lt;BR /&gt;balance between them with a software... to make the system much &lt;BR /&gt;FASTER, and this will be modeled as a jackson network like this:&lt;/P&gt;&lt;P&gt;An M/M/n Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue" &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But there is still an important thing , as i have showed before &lt;BR /&gt;on my calculations:&lt;/P&gt;&lt;P&gt;"However if peak loads as we say before is as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes &lt;BR /&gt;it would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;/P&gt;&lt;P&gt;So you have to think also about the cost of this line..."&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I think that you have to take into account the knee utilisation &lt;BR /&gt;of your M/M/n Servers Queues, if for example the number of computer &lt;BR /&gt;servers is 8 and the Knee is 74% that means that in our previous &lt;BR /&gt;example the bandwidth must equal to: 174% X 4.456 Mbps = 7.753 Mbps&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And we have to take into account the cost of the line ...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So be smarter !&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Amine Moulay Ramadne.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 10:33:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829653#M1486</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T10:33:09Z</dc:date>
    </item>
    <item>
      <title>A question about scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829654#M1487</link>
      <description>&lt;P&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt; I think that you have to take into account the knee utilisation&lt;BR /&gt;&amp;gt; of your M/M/n Servers Queues, if for example the number of computer&lt;BR /&gt;&amp;gt; servers is 8 and the Knee is 74% that means that in our previous&lt;BR /&gt;&amp;gt; example the bandwidth must equal to: 174% X 4.456 Mbps = 7.753 Mbps&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I mean 126% X 4.456 Mbps = 5.614 Mbps.&lt;/P&gt;&lt;P&gt;Cause as you know over this Knee utilisation of 74% &lt;BR /&gt;for 8 servers, the curve of the waiting time does grow quickly ..&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 10:57:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829654#M1487</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T10:57:07Z</dc:date>
    </item>
    <item>
      <title>A question about capacity planning and scalability [2]...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829655#M1488</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have cleaned all my previous posts , please read again... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I didn't know where to ask this question, so i decided to ask here.. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I just read yesterday the following page, look at the the USL &lt;BR /&gt;(Universal Law of Computational Scalability) of Dr. Gunther, &lt;BR /&gt;he wrote this: ( see &lt;A href="http://en.wikipedia.org/wiki/Neil_J._Gunther"&gt;http://en.wikipedia.org/wiki/Neil_J._Gunther&lt;/A&gt; ) &lt;/P&gt;&lt;P&gt;-------------------------- &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The relative capacity C(N) of a computational platform is given by: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;C(N) = N &lt;BR /&gt; ------------------- &lt;BR /&gt; 1 +  (N - 1) +  N (N - 1) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;where N represents either the number of physical processors &lt;BR /&gt;in the hardware configuration or the number of users driving the &lt;BR /&gt;software application. The parameters  and  represent respectively &lt;BR /&gt;the levels of contention (e.g., queueing for shared resources) and &lt;BR /&gt;coherency delay (i.e., latency for data to become consistent) in the &lt;BR /&gt;system. The  parameter also quantifies the retrograde throughput &lt;BR /&gt;seen &lt;BR /&gt;in many stress tests but not accounted for in either Amdahl's law or &lt;BR /&gt;event-based simulations. &lt;/P&gt;&lt;P&gt;---------- &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;His website: &lt;A href="http://www.perfdynamics.com/"&gt;http://www.perfdynamics.com/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you read carefully , you will see that Dr. Gunther is using this &lt;BR /&gt;model to predict scalability after he simulates a relatively small &lt;BR /&gt;number of vusers in LoadRunner ( because of licensing costs, it's &lt;BR /&gt;cost-effective) and after that he finds the coefficients of the &lt;BR /&gt;2nd-degree polynomial (quadratic equation) and then transform &lt;BR /&gt;those coefficients back to the USL parameters using the  = b - a &lt;BR /&gt;and  = a. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And then he is extrapolating with the USL model to higher loads &lt;BR /&gt;to predict scalability. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;He is also applying the model to webservers with heterogeneous &lt;BR /&gt;workloads. like in the following page: &lt;/P&gt;&lt;P&gt;&lt;A href="http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html"&gt;http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-with-mixed.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now my question follows: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Suppose we have obtained a small number of measured load-points &lt;BR /&gt;with Loadrunner or others tools, and we calculated the USL equation &lt;BR /&gt;to predict scalability of a webserver , how the USL model can predict if &lt;BR /&gt;the scalability/performance is limited by the network bandwidth and &lt;BR /&gt;not the server ? I think USL can not predict this. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;When we are modeling webservers , we have to include &lt;BR /&gt;the network&amp;amp;tcp/ip in our network queuig model &lt;BR /&gt;(that comprises the queue of the computer server side) , &lt;BR /&gt;and since the service in the computer server is comprised of &lt;BR /&gt;multiple services (when we are using htmls , databases etc.) &lt;BR /&gt;the network&amp;amp;tcp/ip queue will not be markovian in the service &lt;BR /&gt;side, and we have to model the network&amp;amp;tcpip queue as an M/G/1 &lt;BR /&gt;and this will complicate the mathematical analytic modeling... &lt;/P&gt;&lt;P&gt;So, i think the best way is to use a webserver stress tool &lt;BR /&gt;like &lt;A href="http://fwptt.sourceforge.net/"&gt;http://fwptt.sourceforge.net/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;You can even test the webserver with an heterogeneous &lt;BR /&gt;workloads by starting multiple fwtpp processes, and &lt;BR /&gt;you should increase the number of threads to 5 and after &lt;BR /&gt;that to 10 threads, 15 ... and so on until the webserver &lt;BR /&gt;applications stops responding propperly(and this will inform &lt;BR /&gt;on the maximum number of users that you can have in the same time...) &lt;BR /&gt;and as you are stress testing you can even see (by testing/measuring &lt;BR /&gt;it) if the network bandwidth is not the bottleneck... and this can &lt;BR /&gt;not be done by the USL model. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We already know that to satisfy a Poisson process we must &lt;BR /&gt;have that N(t1)- N(t0), N(t2)- N(t1) etc. must be independant &lt;BR /&gt;that means the counting increments must be independant. &lt;/P&gt;&lt;P&gt;We have the following relation between the Poisson law &lt;BR /&gt;and Exponential law: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;the expected value E(X exponential) = 1 / E(X poisson) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and if the arrival is poissonian then the interarrivals are &lt;BR /&gt;exponential.. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now what about a webserver ? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have read the following paper: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQ"&gt;http://docs.google.com/viewer?a=v&amp;amp;q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.4.5913%26rep%3Drep1%26type%3Dpdf+webserver+and+transactions+and+markov&amp;amp;hl=en&amp;amp;gl=ca&amp;amp;pid=bl&amp;amp;srcid=ADGEEShXQhVx6_yoawIXNk8Lor_1oCBQJhRlKbepMYpxUhrarWNJocn_xFDIhxgPl9IG54Wg1XfzZ4FZTL3r2WlW_DLMfRwjM66RBpYat2r5CHSmf69jG06F8J_s9T61acjOB7t3ZOuw&amp;amp;sig=AHIEtbTT_w21mLSSEsauMUOQ6iVpRttBYQ&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And it says that a simple model with M/M/1/k with FCFS discipline &lt;BR /&gt;can predict webserver performance quite well.. &lt;/P&gt;&lt;P&gt;Hence, i think we can model our webserver over internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;A -&amp;gt; M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue -&amp;gt; A&lt;/P&gt;&lt;P&gt;A: is the arrival rate&lt;/P&gt;&lt;P&gt;and we have the following: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Ni: number of jobs in each queue &lt;BR /&gt;Ui: utilization of each queue &lt;/P&gt;&lt;P&gt;Ni = Ui / (1-Ui) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Adding all the Ni in each individual queue will give the &lt;BR /&gt;average number of jobs in the entire queuing network. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;After that we apply the Little formula: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;A: network arrival rate &lt;BR /&gt;T: average response time &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;N = A*T =&amp;gt; T = N / A &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And after that from the mathematical analytical equation &lt;BR /&gt;we can simulate the jackson queuing network that model our &lt;BR /&gt;webservers... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now there is still an important question that i have: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Our analytical jackson network model can give us insight &lt;BR /&gt;on the webservers behavivior.. but the difficulty that i find in &lt;BR /&gt;practice is that: suppose we have found the right parametters &lt;BR /&gt;that we want to choose, like for example the service rate of &lt;BR /&gt;the M/M/1 Server Queue , how , from this service rate, can &lt;BR /&gt;we buy the right computer that satisfies the service rate &lt;BR /&gt;that we want ? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We say in OR that: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"Understanding the behavior of a system is what Queueing Theory &lt;BR /&gt;and Littles Law is all about. But, managing a Queue requires not &lt;BR /&gt;just understanding the behavior of a system, but also in-depth &lt;BR /&gt;knowledge of how to improve a system - improving both aspects &lt;BR /&gt;of Queueing will mean a better, more efficient and cost-effective &lt;BR /&gt;system and, more importantly, a much better customer experience." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I wrote before that: &lt;/P&gt;&lt;P&gt;---&lt;/P&gt;&lt;P&gt;"It says that a simple model with M/M/1/k with FCFS discipline &lt;BR /&gt;can predict webserver performance quite well.. &lt;BR /&gt;Hence, i think we can model our webserver over internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this &lt;BR /&gt;A -&amp;gt; M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue -&amp;gt; A&lt;BR /&gt;A: the arrival rate&lt;BR /&gt;and we have the following: &lt;BR /&gt;Ni: number of jobs in each queue &lt;BR /&gt;Ui: utilization of each queue &lt;BR /&gt;Ni = Ui / (1-Ui) &lt;BR /&gt;Adding all the Ni in each individual queue will give the &lt;BR /&gt;average number of jobs in the entire queuing network. &lt;BR /&gt;After that we apply the Little formula: &lt;BR /&gt;A: network arrival rate &lt;BR /&gt;T: average response time &lt;BR /&gt;N = A*T =&amp;gt; T = N / A &lt;BR /&gt;And after that from the mathematical analytical equation &lt;BR /&gt;we can simulate the jackson queuing"&lt;BR /&gt;&lt;BR /&gt;--&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;As you have noticed , this mathematical model of &lt;BR /&gt;this jackson network does in fact take into account &lt;BR /&gt;the M/M/1 Network queue node , the USL model can not &lt;BR /&gt;do this... and with this performance data from the mathematical &lt;BR /&gt;analytical model simulation we can for example validate &lt;BR /&gt;the performance data of the fwptt stress webserver simulation.. &lt;/P&gt;&lt;P&gt;But you have to take into account worst cases and the &lt;BR /&gt;peak traffic loads... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Let for example we have a a webserver hosting html pages &lt;BR /&gt;and it is receiving 1000000 HTTP operations &lt;BR /&gt;per day with an average file size of 10 KB. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;What would be the network bandwidth required for this website &lt;BR /&gt;considering peak traffic if the peak traffic load from past &lt;BR /&gt;observations was four times greater than average loads? &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Required bandwidth is solved by the following equation: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;HTTP op/sec x average file size or &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1000000 HTTP ops per day =1000000/24 = 41,667 op/hour = &lt;BR /&gt;41,667/3600= 11.6 HTTP ops/sec &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The needed bandwidth is &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;11.6 HTTP ops/sec X 10 KB/HTTP op = 116 KB/sec = 928 Kbps. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If we assume a protocol overhead of 20% then the actual throughput &lt;BR /&gt;required is 928 Kbps X 1.2 = 1,114 Kbps. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;However if peak loads as we say before is as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes &lt;BR /&gt;would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So you have to think also about the cost of this line... &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will add the following: &lt;/P&gt;&lt;P&gt;As you have noticed i said that: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"As you have noticed , this mathematical model of &lt;BR /&gt;this jackson network does in fact take into account &lt;BR /&gt;the M/M/1 Network queue node , the USL model can not &lt;BR /&gt;do this... and with this performance data from the mathematical &lt;BR /&gt;analytical model simulation we can for example validate &lt;BR /&gt;the performance data of the fwptt stress webserver simulation.." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and i said that: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"Hence, i think we can model our webserver over internet &lt;BR /&gt;with 3 queues connected as a Jackson Network like this &lt;BR /&gt;An M/M/1 Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue" &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And of course on Capacity Planning for Enterprise Datacenters &lt;BR /&gt;and Websites , you can mirror many computer servers and load &lt;BR /&gt;balance between them with a software... to make the system much &lt;BR /&gt;FASTER, and this will be modeled as a jackson network like this: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;A -&amp;gt; M/M/n Server Queue -&amp;gt; M/M/1 Network queue -&amp;gt; M/M/1 Client queue -&amp;gt; A&lt;/P&gt;&lt;P&gt;A: the arrival rate to the system"&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But there is still an important thing , as i have showed before &lt;BR /&gt;on my calculations: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"However if peak loads as we say before is as much as &lt;BR /&gt;4 times greater, the bandwidth required to handle spikes &lt;BR /&gt;it would be 4 X 1,114 Kbps = 4.456 Mbps. &lt;/P&gt;&lt;P&gt;So you have to think also about the cost of this line..." &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I think that you have also to take into account the knee utilisation &lt;BR /&gt;of your M/M/n Servers Queues, if for example the number of computer &lt;BR /&gt;servers is 8 and the Knee is 74% that means that in our previous &lt;BR /&gt;example the bandwidth must equal to: &lt;/P&gt;&lt;P&gt;126% X 4.456 Mbps = 5.614 Mbps. &lt;/P&gt;&lt;P&gt;Cause as you know, over this Knee of 74% for 8 servers &lt;BR /&gt;the curve of the waiting time does grow quickly .. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And we have to take into account the cost of the line ... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So be smarter ! &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2010 11:50:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829655#M1488</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T11:50:25Z</dc:date>
    </item>
    <item>
      <title>A question about capacity planning and scalability [2]...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829656#M1489</link>
      <description>&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt;Cause as you know, over this Knee of 74% for 8 servers&lt;BR /&gt;&lt;BR /&gt;i mean: above this Knee of 74%...&lt;BR /&gt;&lt;BR /&gt;&amp;gt;the curve of the waiting time does grow quickly&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.</description>
      <pubDate>Wed, 08 Sep 2010 13:14:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829656#M1489</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-08T13:14:12Z</dc:date>
    </item>
    <item>
      <title>About capacity planning and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829657#M1490</link>
      <description>&lt;A href="http://groups.google.com/group/sci.op-research/browse_thread/thread/7158718ce30e65db"&gt;&lt;BR /&gt;&lt;SPAN class="fixed_width" style="font-family: Courier, Monospaced;"&gt; &lt;P&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;Finally, i have cleaned all my previous posts and added to them, &lt;BR /&gt;please read again: &lt;/P&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;/A&gt;&lt;A href="http://groups.google.com/group/sci.op-research/browse_thread/thread/7158718ce30e65db"&gt;http://groups.google.com/group/sci.op-research/browse_thread/thread/7158718ce30e65db&lt;/A&gt;#&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;Amine Moulay Ramdane&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;</description>
      <pubDate>Thu, 09 Sep 2010 17:13:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829657#M1490</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-09T17:13:30Z</dc:date>
    </item>
    <item>
      <title>My article about availibility and efficiency ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829658#M1491</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;My article about availibility and efficiency is on my website now,&lt;BR /&gt;(I have corrected some typos etc.)&lt;/P&gt;&lt;P&gt;Welcome: &lt;A href="http://pages.videotron.com/aminer/efficiency_availibility.htm"&gt;http://pages.videotron.com/aminer/efficiency_availibility.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Sep 2010 14:48:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829658#M1491</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-15T14:48:13Z</dc:date>
    </item>
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829659#M1492</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If you look at benchmarks of my lock-free ParallelQueue :&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will notice that under contention lock-free Ringbuffer have a &lt;BR /&gt;retrograde throuput and lockfree flqueue also have a retrograde &lt;BR /&gt;throuput in the pop() side ...but my lock-free ParallelQueue&lt;BR /&gt;(that is using also lock striping) has a good scalability&lt;/P&gt;&lt;P&gt;But if we draw the graph of Amadahl's law it doesn't show any &lt;BR /&gt;retrograde throuput...&lt;BR /&gt;&lt;BR /&gt;So , i was thinking more about this, so please follow &lt;BR /&gt;with me..&lt;/P&gt;&lt;P&gt;In lock-free algorithms we reduce the duration for wich &lt;BR /&gt;locks are held to the miniumum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*, this &lt;BR /&gt;is good for scalability. &lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested &lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for wich lock are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention... &lt;/P&gt;&lt;P&gt;So, under contention and in the *worst* case, threads will &lt;BR /&gt;be waiting for the locked section to be released , hence, if &lt;BR /&gt;we have n threads this imply that the time the threads will &lt;BR /&gt;be waiting for the locked section to be released is: &lt;/P&gt;&lt;P&gt;Average time spend inside the locked section * (1 + 2 +3 + 4 ... + n) &lt;/P&gt;&lt;P&gt;And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:&lt;/P&gt;&lt;P&gt;(n * ( n + 1)) / 2 &lt;/P&gt;&lt;P&gt;and &lt;/P&gt;&lt;P&gt;Limit (n -&amp;gt; infinite) of n * (n + 1) /2 = infinite &lt;/P&gt;&lt;P&gt;So, this mathematical term diverge when n -&amp;gt; infinite&lt;/P&gt;&lt;P&gt;So, i think i can rewrite the scalability equation to something like this,&lt;/P&gt;&lt;P&gt;Amine''s equation is: &lt;/P&gt;&lt;P&gt;Speed = 1 / (S * ( n * ( n + 1)) / 2 ) + (P / N)) (2)&lt;/P&gt;&lt;P&gt;S: serial part&lt;BR /&gt;P: parallel part &lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;/P&gt;&lt;P&gt;And my equation clearly show that under contention and when &lt;BR /&gt;there is many threads we can have a retrogade throuput as in &lt;BR /&gt;lock-free Ringbuffer and lock-free flqueue ... &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 15:48:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829659#M1492</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T15:48:33Z</dc:date>
    </item>
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829660#M1493</link>
      <description>&lt;BR /&gt;I correct some typos. ..&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Hello, &lt;P&gt;&lt;BR /&gt;If you look at benchmarks of my lock-free ParallelQueue :&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will notice that under contention lock-free Ringbuffer have a &lt;BR /&gt;retrograde throuput and lockfree flqueue also have a retrograde &lt;BR /&gt;throuput in the pop() side ...but my lock-free ParallelQueue&lt;BR /&gt;(that is using also lock striping) has a good scalability&lt;/P&gt;&lt;P&gt;But if we draw the graph of Amadahl's law it doesn't show any &lt;BR /&gt;retrograde throuput...&lt;BR /&gt;&lt;BR /&gt;So , i was thinking more about this, so please follow &lt;BR /&gt;with me..&lt;/P&gt;&lt;P&gt;In lock-free algorithms we reduce the duration for wich &lt;BR /&gt;locks are held to the miniumum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*, this &lt;BR /&gt;is good for scalability. &lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested &lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;and since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for wich lock are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention... &lt;/P&gt;&lt;P&gt;So, under contention and in the *worst* case, threads will &lt;BR /&gt;be waitingfor the locked section to be released , hence, if we &lt;BR /&gt;have n threads this imply that the time the threads will be &lt;BR /&gt;waiting for the locked section to be released is: &lt;/P&gt;&lt;P&gt;Time spend inside the locked section * (1 + 2 +3 + 4 ... + n) &lt;/P&gt;&lt;P&gt;And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:&lt;/P&gt;&lt;P&gt;(n * ( n + 1)) / 2 &lt;/P&gt;&lt;P&gt;and &lt;/P&gt;&lt;P&gt;Limit (n -&amp;gt; infinite) of (n * (n + 1) /2) = infinite &lt;/P&gt;&lt;P&gt;So, this mathematical term diverge when n -&amp;gt; infinite&lt;/P&gt;&lt;P&gt;So, i think i can rewrite the scalability equation to something like this,&lt;/P&gt;&lt;P&gt;Amine''s equation is: &lt;/P&gt;&lt;P&gt;Speed = 1 / (S * ( n * ( n + 1)) / 2 ) + (P / N)) (2)&lt;/P&gt;&lt;P&gt;S: serial part&lt;BR /&gt;P: parallel part &lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;/P&gt;&lt;P&gt;And my equation clearly show that under contention and when &lt;BR /&gt;there is many threads we can have a retrogade throuput as in &lt;BR /&gt;lock-free Ringbuffer and lock-free flqueue ... &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards, &lt;BR /&gt;Amine Moulay Ramdane. &lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; &lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 15:56:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829660#M1493</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T15:56:56Z</dc:date>
    </item>
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829661#M1494</link>
      <description>&lt;P&gt;I correct:&lt;/P&gt;&lt;P&gt;I wrote:&lt;BR /&gt;&amp;gt; Time spend inside the locked section * (1 + 2 +3 + 4 ... + n)&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; (n * ( n + 1)) / 2&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; and&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Limit (n -&amp;gt; infinite) of (n * (n + 1) /2) = infinite&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; So, this mathematical term diverge when n -&amp;gt; infinite&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; So, i think i can rewrite the scalability equation to something like&lt;BR /&gt;&amp;gt; this,&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Amine''s equation is:&lt;BR /&gt;&amp;gt; &lt;BR /&gt;&amp;gt; Speed = 1 / (S * ( n * ( n + 1)) / 2 ) + (P / N)) (2)&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If T is the time spend in the locked section,&lt;BR /&gt;and let us have a scenario of four threads, so &lt;BR /&gt;under heavy contention and in the worst case,&lt;BR /&gt;if the threads are all waiting for the locked-section &lt;BR /&gt;(locks,false sharing etc.) the first thread will not &lt;BR /&gt;wait, second thread will wait T and third thread &lt;BR /&gt;will wait 2 * T and fourth thread will ait 3 * T... &lt;/P&gt;&lt;P&gt;Hence, in my equation we have to replace n by n -1&lt;BR /&gt;so the Amahal equation, this will give us:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, my scalability equation become:&lt;BR /&gt;&lt;BR /&gt;Speed = 1 / (S * ( (n-1) * (n)) / 2 ) + (P / N)) &lt;BR /&gt;&lt;BR /&gt;and the Limit (n-&amp;gt; inite) of ((n-1) * (n)) / 2 = infinite&lt;/P&gt;&lt;P&gt;this term diverge..&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;S: serial part&lt;BR /&gt;P: parallel part&lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;BR /&gt;&lt;BR /&gt;And my equation clearly show that under contention and when&lt;BR /&gt;there is many threads we can have a retrogade throuput as in&lt;BR /&gt;lock-free Ringbuffer and lock-free flqueue ...&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.http://pages.videotron.com/aminer/&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 16:16:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829661#M1494</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T16:16:40Z</dc:date>
    </item>
    <item>
      <title>About scalability...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829662#M1495</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have cleaned and corrected my previous post, &lt;BR /&gt;here it is againand tell me what do you think...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;If you look at benchmarks of my lock-free ParallelQueue :&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm" target="_blank"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will notice that under contention lock-free Ringbuffer have a &lt;BR /&gt;retrograde throuput and lockfree flqueue also have a retrograde &lt;BR /&gt;throuput in the pop() side ...but my lock-free ParallelQueue&lt;BR /&gt;(that is using also lock striping) has a good scalability&lt;/P&gt;&lt;P&gt;But if we draw the graph of Amadahl's law it doesn't show any &lt;BR /&gt;retrograde throuput...&lt;BR /&gt;&lt;BR /&gt;So , i was thinking more about this, so please follow &lt;BR /&gt;with me..&lt;/P&gt;&lt;P&gt;In lock-free algorithms we reduce the duration for wich &lt;BR /&gt;locks are held to the miniumum, and we try to keep the &lt;BR /&gt;serial part in the Amdahl equation to the *MINIMUM*, this &lt;BR /&gt;is good for scalability. &lt;/P&gt;&lt;P&gt;And as we know, there are three ways to reduce lock contention: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;1- Reduce the duration for which locks are held &lt;BR /&gt;2- Reduce the frequency with which locks are requested &lt;BR /&gt;or &lt;BR /&gt;3- Replace exclusive locks with coordination mechanisms that &lt;BR /&gt;permit greater concurrency. &lt;/P&gt;&lt;P&gt;and since in lock-free algorithms we are reducing the &lt;BR /&gt;duration for wich lock are held to the minimum, this &lt;BR /&gt;will reduce the chance of lock contention... &lt;/P&gt;&lt;P&gt;So, under contention and in the *worst* case, threads will &lt;BR /&gt;be waiting for the locked section to be released:&lt;/P&gt;&lt;P&gt;If T is the time spend in the locked section,&lt;BR /&gt;and let us have a scenario of four threads, so &lt;BR /&gt;under heavy contention and in the worst case,&lt;BR /&gt;if the threads are all waiting for the locked-section &lt;BR /&gt;(locks,false sharing etc.) the first thread will not &lt;BR /&gt;wait, second thread will wait T and third thread &lt;BR /&gt;will wait 2 * T and fourth thread will wait 3 * T... &lt;/P&gt;&lt;P&gt;And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:&lt;/P&gt;&lt;P&gt;(n * ( n + 1)) / 2 &lt;/P&gt;&lt;P&gt;and &lt;/P&gt;&lt;P&gt;Limit (n -&amp;gt; infinite) of (n * (n + 1) /2) = infinite &lt;/P&gt;&lt;P&gt;So, this mathematical term diverge when n -&amp;gt; infinite&lt;/P&gt;&lt;P&gt;We have to replace n by (n -1) since n threads will be waiting &lt;/P&gt;&lt;P&gt;T * (n -1) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So the my scalabilityequation will become:&lt;BR /&gt;&lt;BR /&gt;Speed = 1 / (S * ( (n-1) * (n)) / 2 ) + (P / N)) &lt;BR /&gt;&lt;BR /&gt;and the Limit (n-&amp;gt; inite) of ((n-1) * (n)) / 2 = infinite&lt;/P&gt;&lt;P&gt;this term diverge..&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;S: serial part&lt;BR /&gt;P: parallel part&lt;BR /&gt;n: number of threads&lt;BR /&gt;N: number of cores&lt;BR /&gt;&lt;BR /&gt;And my equation clearly show that under contention and when&lt;BR /&gt;there is many threads we can have a retrogade throuput as in&lt;BR /&gt;lock-free Ringbuffer and lock-free flqueue ...&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/" target="_blank"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2010 16:26:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/A-question-about-scalability/m-p/829662#M1495</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-09-16T16:26:01Z</dc:date>
    </item>
  </channel>
</rss>

