Hello,
I didn't know where to ask this question, so i decided to ask here..
I just read yesterday the following page, look at the the USL
(Universal Law of Computational Scalability) of Dr. Gunther,
he wrote this: ( see http://en.wikipedia.org/wiki/Neil_J._Gunther)

The relative capacity C(N) of a computational platform is given by:
C(N) = N

1 + (N  1) + N (N  1)
where N represents either the number of physical processors
in the hardware configuration or the number of users driving the
software application. The parameters and represent respectively
the levels of contention (e.g., queueing for shared resources) and
coherency delay (i.e., latency for data to become consistent) in the
system. The parameter also quantifies the retrograde throughput seen
in many stress tests but not accounted for in either Amdahl's law or
eventbased simulations.

His website: http://www.perfdynamics.com/
If you read carefully , you will see that Dr. Gunther is using this
model to predict scalability after he simulates a relatively small
number
of vusers in LoadRunner ( because of licensing costs, it's cost
effectiveness)
and after that he finds the coefficients of the 2nddegree polynomial
(quadratic equation) and then transform those coefficients back to the
USL parameters using the = b  a and = a.
And than he is extrapolating with the USL model to higher loads
to predict scalability.
He is also applying the model to webservers with heterogeneous
workloads. like in the following page:
http://perfdynamics.blogspot.com/2009/04/assessinguslscalabilitywi...
Now my question follows:
Suppose we have obtained a small number of measured loadpoints
with Loadrunner or others tools, and we calculated the USL equation to
predict scalability of a webserver , how the USL model can predict if
the scalability/performance is limited by the network bandwidth and
not the server ?
Can it give this information ?
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
Link Copied
An important answer that i wrote (from comp.programming.threads newsgroup)...
On Sep 6, 4:12 pm, Scott Meyers <NeverR...@aristeia.com> wrote:
> As the subject line suggests (I hope), I'm missing something
>really basic about lockfree algorithms. In particular, why
>they can't use locks.
In lockfree algorithms we reduce the duration for wich
locks are held to the miniumum, and we try to keep the
serial part in the Amdahl equation to the *MINIMUM*, this
is good for scalability.
And as we know, there are three ways to reduce lock contention:
1 Reduce the duration for which locks are held
2 Reduce the frequency with which locks are requested
or
3 Replace exclusive locks with coordination mechanisms that
permit greater concurrency.
and since in lockfree algorithms we are reducing the
duration for wich lock are held to the minimum, this
will reduce the chance of lock contention...
When we have lock contention, a factor called mean
waiting time will be added to the Amdhal equation
like this: Speed = 1/(S+mean waiting time+(P/N))
it's the mean waiting time of the threads that are
waiting on the lock , and when we reduce the duration
for wich locks are held this will lower the service time:
hence the waiting time will be smaller and this
will higher the speed in the Amdhal equation.
Look at how my Lockfree ParallelQueue is scaling:
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
If you loOK at this page, i wrote this:
[1] If there is LESS contention THEN the algorithm
will scale better. Due to the fact that S (the serial part)
become smaller with less contention , and as N becomes bigger,
the result  the speed of the program/algorithm... 
of the Amdahl's equation 1/(S+(P/N)) become bigger.
[2] If there is false sharing THEN the algorithm
will not scale well. Due to the fact that S (the serial part)
of the Amdahl's equation 1/(S+(P/N)) will becomes bigger
with false sharing, so, this is not good for scalability.
As you have notice, i am including the mean waiting time
factor as if it were a serial part, cause in Amdhal equation
there is only two parts S(serial part) and P(parallel part),
but the correct equation is: Speed = 1/(S+mean waiting time+(P/N)).
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
On Sep 6, 4:12 pm, Scott Meyers <NeverR...@aristeia.com> wrote:
> As the subject line suggests (I hope), I'm missing something really basic about
> lockfree algorithms. In particular, why they can't use locks.
>
> Wikipedia and others define lockfree algorithms as those where at least one
> thread in a process is guaranteed to make progress, regardless of what the other
> threads are doing. So suppose I bundle an array with a mutex to make a
> "synchronized array," where each operation on the array grabs the mutex, does
> some work on the array (but does not call into code outside the array, i.e.,
> doesn't invoke any code that could itself try to grab the mutex), then releases
> the mutex. This uses a lock, but inside the "synchronized array" critical
> section, even if all the other threads in the process are stalled, the thread in
> the critical section is making progress.
>
> As I said, I'm missing something fundamental. I'd be grateful if somebody would
> point out what it is.
>
> Thanks,
>
> Scott
>
> 
> * C++ and Beyond: Meyers, Sutter, & Alexandrescu, Oct. 2427 near Seattle
> (http://cppandbeyond.com/)
> * License my training materials for commercial (http://tinyurl.com/yfzvkp9) or
> personal use (http://tinyurl.com/yl5ka5p).

It's the lockfree algorithm that is different ,
you retry when the transaction( the lock..) fails,
like in TM.
Regards,
Amine Moulay Ramdane.
On Sep 6, 5:36 pm, Scott Meyers <NeverR...@aristeia.com> wrote:
> On 9/6/2010 2:27 PM, aminer wrote:
>
> > I don't understand ? in lockfree algorithms under x86
>
> "Lockfree algorithm" is a technical term. Its definition is independent of the
> language you are programming in and the hardware you are running on.
>
> Note that the definition of "lockfree" makes no mention of locks at all, so
> it's not something as simple as "lockfree is defined to use no locks". This,
> for example, is Wikipedia's definition
> (http://en.wikipedia.org/wiki/Nonblocking_algorithm#Lockfreedom):
>
> An algorithm is lockfree if it satisfies that when the program threads are
> run sufficiently long at least one of the threads make progress (for some
> sensible definition of progress).
>
> How does this definition preclude the use of e.g. a mutex in the algorithm?
You are confusing/mixing mutex with lock, you were speaking
about lock, and as i said before lockfree algorithms does
in fact use locks , like in x86 lockfree algorithms are
using a lock instruction in assembler , in SMR also they
use locks etc. the Mutex also does use locks in its code internally...
But as i said before , it's the lockfree algorithm
that is different , you retry when the transaction( the lock..)
fails, like in TM, and we call that Optimistic locking.
Regards,
Amine Mouly Ramdane.
Hello again,
Look at the following page, Dr Gunther is applying
the USL model to webservers:
http://perfdynamics.blogspot.com/2009/04/assessinguslscalabilitywithmixed.html
When we are modeling webservers , we have to include
the network&tcp/ip in our newtwork queuig model
(that comprises the queue of of the computer server side) ,
and since the service in the computer server is comprised of
multiple services (when we are using htmls , databases etc.)
the network&tcp/ip queue will not be markovian in the service
side, and we have to model the network&tcpip queue as an M/G/1
and this will complicate the mathematical analytic modeling...
Now as i said before:
Suppose we have obtained a small number of measured loadpoints
with Loadrunner or others tools, and we calculated the USL equation
to predict scalability of a webserver , how the USL model can predict if
the scalability/performance is limited by the network bandwidth and
not the server ? I think USL can not predict this.
So, i think the best way is to use a webserver stress tool
like http://fwptt.sourceforge.net/
You can even test the webserver with an heterogeneous
workloads by starting multiple fwtpp processes, and
you should increase the number of threads to 5 and after
that to 10 threads, 15 ... and so on until the webserver
applications stops responding propperly(and this will inform
on the maximum number of users that you can have in the same time...)
and as you are stress testing you can even see (by testing/measuring
it) if the network bandwidth is not the bottleneck... and this can
not be done by the USL model.
Regards,
Amine Moulay Ramdane
I wrote:
> When we are modeling webservers , we have to include
> the network&tcp/ip in our network queuig model
> (that comprises the queue of the computer server side) ,
> and since the service in the computer server is comprised of
> multiple services (when we are using htmls , databases etc.)
> the network&tcp/ip queue will not be markovian in the service
> side, and we have to model the network&tcpip queue as an M/G/1
> and this will complicate the mathematical analytic modeling...
We already know that to satisfy a Poisson process we must
have that N(t1) N(t0), N(t2) N(t1) etc. that means the
counting increments must be independant.
We have the following relation between
the Poisson law and Exponential law:
the expected value E(X exponential) = 1 / E(X poisson)
and if the arrival is poissonian than the interarricals are exponential..
Now what about a webserver ?
I have read the following paper:
http://docs.google.com/viewer?a=v&q=cache:JFYCp_dSPP4J:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%...
It says that a simple model with M/M/1/k with FCFS discipline
can predict webserver performance quite well..
Hence, i think we can model our webserver over internet
with 3 queues connected as a Jackson Network like this
An M/M/1 Server Queue > M/M/1 Network queue > M/M/1 Client queue
and we have the following:
Ni: number of jobs in each queue
Ui: utilization of each queue
Ni = Ui / (1Ui)
Adding all the Ni in each individual queue will give the
average number of jobs in the entire queuing network.
After that we apply the Little formula:
A: network arrival rate
T: average response time
N = A*T => T = N / A
And after that from the mathematical analytical equation
we can simmulate the jackson queuing network that model our webservers...
Now there is still an important question that i have:
Our analytical jackson network model can give us insight
on the webservers behavivior and validate our simulation
with fwptt.. but the difficulty that i find in practice
is that: suppose we have found the right parameters
that we want to choose, like for example the service rate of
the M/M/1 Server Queue , how , from this service rate, can
we buy the right computer ?
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
We say in OR that:
"Understanding the behavior of a system is what Queueing Theory
and Littles Law is all about. But, managing a Queue requires not
just understanding the behavior of a system, but also indepth
knowledge of how to improve a system  improving both aspects
of Queueing will mean a better, more efficient and costeffective
system and, more importantly, a much better customer experience."
I wrote before that:
>It says that a simple model with M/M/1/k with FCFS discipline>Hence, i think we can model our webserver over internet
>with 3 queues connected as a Jackson Network like this
>An M/M/1 Server Queue > M/M/1 Network queue > M/M/1 Client queue
>and we have the following:
>Ni: number of jobs in each queue
>Ui: utilization of each queue
>Ni = Ui / (1Ui)
>Adding all the Ni in each individual queue will give the
>average number of jobs in the entire queuing network.
>After that we apply the Little formula:
>A: network arrival rate
>T: average response time
>N = A*T => T = N / A
>And after that from the mathematical analytical equation
>we can simulate the jackson queuing
As you have noticed , this mathematical model of this jackson network
does in fact take into account theM/M/1 Network queue node ,the
USL model can not do this... and with this performance data
from the mathematical analytical model simulationwecanfor example
validate the performance data of the fwptt stress webserver simulation..
But you have to take into account worst cases and the peak trafficloads...
Let for examplewe have a a webserver hostinghtml pages
and it is receiving 1000000 HTTP operations
per day with an average file size of 10 KB.
What would be the network bandwidth required for this website
considering peak traffic if the peak traffic load from past observations
was four times greater than average loads?
Required bandwidth is solved by the following equation:
HTTP op/sec x average file size or
1000000 HTTP ops per day =1000000/24 = 41,667 op/hour =
41,667/3600= 11.6 HTTP ops/sec
The needed bandwidth is
11.6 HTTP ops/sec X 10 KB/HTTP op = 116 KB/sec = 928 Kbps.
If we assume a protocol overhead of 20% then the actual throughput
required is 928 Kbps X 1.2 = 1,114 Kbps.
However if peak loads as we say before is as much as
4 times greater, the bandwidth required to handle spikes
it would be 4 X 1,114 Kbps = 4.456 Mbps.
So you have to think also about the cost of this line...
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
I will add the following:
We have to be smarter than that in OR (operational research)...
As you have noticed i said that:
"As you have noticed , this mathematical model of
this jackson network does in fact take into account
the M/M/1 Network queue node , the USL model can not
do this... and with this performance data from the mathematical
analytical model simulation we can for example validate
the performance data of the fwptt stress webserver simulation.."
and i said that:
"Hence, i think we can model our webserver over internet
with 3 queues connected as a Jackson Network like this
An M/M/1 Server Queue > M/M/1 Network queue > M/M/1 Client queue"
And of course on Capacity Planning for Enterprise Datacenters
and Websites , you can mirror many computer servers and load
balance between them with a software... to make the system much
FASTER, and this will be modeled as a jackson network like this:
An M/M/n Server Queue > M/M/1 Network queue > M/M/1 Client queue"
But there is still an important thing , as i have showed before
on my calculations:
"However if peak loads as we say before is as much as
4 times greater, the bandwidth required to handle spikes
it would be 4 X 1,114 Kbps = 4.456 Mbps.
So you have to think also about the cost of this line..."
I think that you have to take into account the knee utilisation
of your M/M/n Servers Queues, if for example the number of computer
servers is 8 and the Knee is 74% that means that in our previous
example the bandwidth must equal to: 174% X 4.456 Mbps = 7.753 Mbps
And we have to take into account the cost of the line ...
So be smarter !
Regards,
Amine Moulay Ramadne.
http://pages.videotron.com/aminer/
I wrote:
> I think that you have to take into account the knee utilisation
> of your M/M/n Servers Queues, if for example the number of computer
> servers is 8 and the Knee is 74% that means that in our previous
> example the bandwidth must equal to: 174% X 4.456 Mbps = 7.753 Mbps
I mean 126% X 4.456 Mbps = 5.614 Mbps.
Cause as you know over this Knee utilisation of 74%
for 8 servers, the curve of the waiting time does grow quickly ..
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
Hello,
I have cleaned all my previous posts , please read again...
I didn't know where to ask this question, so i decided to ask here..
I just read yesterday the following page, look at the the USL
(Universal Law of Computational Scalability) of Dr. Gunther,
he wrote this: ( see http://en.wikipedia.org/wiki/Neil_J._Gunther )

The relative capacity C(N) of a computational platform is given by:
C(N) = N

1 + (N  1) + N (N  1)
where N represents either the number of physical processors
in the hardware configuration or the number of users driving the
software application. The parameters and represent respectively
the levels of contention (e.g., queueing for shared resources) and
coherency delay (i.e., latency for data to become consistent) in the
system. The parameter also quantifies the retrograde throughput
seen
in many stress tests but not accounted for in either Amdahl's law or
eventbased simulations.

His website: http://www.perfdynamics.com/
If you read carefully , you will see that Dr. Gunther is using this
model to predict scalability after he simulates a relatively small
number of vusers in LoadRunner ( because of licensing costs, it's
costeffective) and after that he finds the coefficients of the
2nddegree polynomial (quadratic equation) and then transform
those coefficients back to the USL parameters using the = b  a
and = a.
And then he is extrapolating with the USL model to higher loads
to predict scalability.
He is also applying the model to webservers with heterogeneous
workloads. like in the following page:
http://perfdynamics.blogspot.com/2009/04/assessinguslscalabilitywithmixed.html
Now my question follows:
Suppose we have obtained a small number of measured loadpoints
with Loadrunner or others tools, and we calculated the USL equation
to predict scalability of a webserver , how the USL model can predict if
the scalability/performance is limited by the network bandwidth and
not the server ? I think USL can not predict this.
When we are modeling webservers , we have to include
the network&tcp/ip in our network queuig model
(that comprises the queue of the computer server side) ,
and since the service in the computer server is comprised of
multiple services (when we are using htmls , databases etc.)
the network&tcp/ip queue will not be markovian in the service
side, and we have to model the network&tcpip queue as an M/G/1
and this will complicate the mathematical analytic modeling...
So, i think the best way is to use a webserver stress tool
like http://fwptt.sourceforge.net/
You can even test the webserver with an heterogeneous
workloads by starting multiple fwtpp processes, and
you should increase the number of threads to 5 and after
that to 10 threads, 15 ... and so on until the webserver
applications stops responding propperly(and this will inform
on the maximum number of users that you can have in the same time...)
and as you are stress testing you can even see (by testing/measuring
it) if the network bandwidth is not the bottleneck... and this can
not be done by the USL model.
We already know that to satisfy a Poisson process we must
have that N(t1) N(t0), N(t2) N(t1) etc. must be independant
that means the counting increments must be independant.
We have the following relation between the Poisson law
and Exponential law:
the expected value E(X exponential) = 1 / E(X poisson)
and if the arrival is poissonian then the interarrivals are
exponential..
Now what about a webserver ?
I have read the following paper:
And it says that a simple model with M/M/1/k with FCFS discipline
can predict webserver performance quite well..
Hence, i think we can model our webserver over internet
with 3 queues connected as a Jackson Network like this
A > M/M/1 Server Queue > M/M/1 Network queue > M/M/1 Client queue > A
A: is the arrival rate
and we have the following:
Ni: number of jobs in each queue
Ui: utilization of each queue
Ni = Ui / (1Ui)
Adding all the Ni in each individual queue will give the
average number of jobs in the entire queuing network.
After that we apply the Little formula:
A: network arrival rate
T: average response time
N = A*T => T = N / A
And after that from the mathematical analytical equation
we can simulate the jackson queuing network that model our
webservers...
Now there is still an important question that i have:
Our analytical jackson network model can give us insight
on the webservers behavivior.. but the difficulty that i find in
practice is that: suppose we have found the right parametters
that we want to choose, like for example the service rate of
the M/M/1 Server Queue , how , from this service rate, can
we buy the right computer that satisfies the service rate
that we want ?
We say in OR that:
"Understanding the behavior of a system is what Queueing Theory
and Littles Law is all about. But, managing a Queue requires not
just understanding the behavior of a system, but also indepth
knowledge of how to improve a system  improving both aspects
of Queueing will mean a better, more efficient and costeffective
system and, more importantly, a much better customer experience."
I wrote before that:

"It says that a simple model with M/M/1/k with FCFS discipline
can predict webserver performance quite well..
Hence, i think we can model our webserver over internet
with 3 queues connected as a Jackson Network like this
A > M/M/1 Server Queue > M/M/1 Network queue > M/M/1 Client queue > A
A: the arrival rate
and we have the following:
Ni: number of jobs in each queue
Ui: utilization of each queue
Ni = Ui / (1Ui)
Adding all the Ni in each individual queue will give the
average number of jobs in the entire queuing network.
After that we apply the Little formula:
A: network arrival rate
T: average response time
N = A*T => T = N / A
And after that from the mathematical analytical equation
we can simulate the jackson queuing"

As you have noticed , this mathematical model of
this jackson network does in fact take into account
the M/M/1 Network queue node , the USL model can not
do this... and with this performance data from the mathematical
analytical model simulation we can for example validate
the performance data of the fwptt stress webserver simulation..
But you have to take into account worst cases and the
peak traffic loads...
Let for example we have a a webserver hosting html pages
and it is receiving 1000000 HTTP operations
per day with an average file size of 10 KB.
What would be the network bandwidth required for this website
considering peak traffic if the peak traffic load from past
observations was four times greater than average loads?
Required bandwidth is solved by the following equation:
HTTP op/sec x average file size or
1000000 HTTP ops per day =1000000/24 = 41,667 op/hour =
41,667/3600= 11.6 HTTP ops/sec
The needed bandwidth is
11.6 HTTP ops/sec X 10 KB/HTTP op = 116 KB/sec = 928 Kbps.
If we assume a protocol overhead of 20% then the actual throughput
required is 928 Kbps X 1.2 = 1,114 Kbps.
However if peak loads as we say before is as much as
4 times greater, the bandwidth required to handle spikes
would be 4 X 1,114 Kbps = 4.456 Mbps.
So you have to think also about the cost of this line...
I will add the following:
As you have noticed i said that:
"As you have noticed , this mathematical model of
this jackson network does in fact take into account
the M/M/1 Network queue node , the USL model can not
do this... and with this performance data from the mathematical
analytical model simulation we can for example validate
the performance data of the fwptt stress webserver simulation.."
and i said that:
"Hence, i think we can model our webserver over internet
with 3 queues connected as a Jackson Network like this
An M/M/1 Server Queue > M/M/1 Network queue > M/M/1 Client queue"
And of course on Capacity Planning for Enterprise Datacenters
and Websites , you can mirror many computer servers and load
balance between them with a software... to make the system much
FASTER, and this will be modeled as a jackson network like this:
A > M/M/n Server Queue > M/M/1 Network queue > M/M/1 Client queue > A
A: the arrival rate to the system"
But there is still an important thing , as i have showed before
on my calculations:
"However if peak loads as we say before is as much as
4 times greater, the bandwidth required to handle spikes
it would be 4 X 1,114 Kbps = 4.456 Mbps.
So you have to think also about the cost of this line..."
I think that you have also to take into account the knee utilisation
of your M/M/n Servers Queues, if for example the number of computer
servers is 8 and the Knee is 74% that means that in our previous
example the bandwidth must equal to:
126% X 4.456 Mbps = 5.614 Mbps.
Cause as you know, over this Knee of 74% for 8 servers
the curve of the waiting time does grow quickly ..
And we have to take into account the cost of the line ...
So be smarter !
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
Hello,
My article about availibility and efficiency is on my website now,
(I have corrected some typos etc.)
Welcome: http://pages.videotron.com/aminer/efficiency_availibility.htm
Regards,
Amine Moulay Ramdane
http://pages.videotron.com/aminer/
Hello,
If you look at benchmarks of my lockfree ParallelQueue :
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
You will notice that under contention lockfree Ringbuffer have a
retrograde throuput and lockfree flqueue also have a retrograde
throuput in the pop() side ...but my lockfree ParallelQueue
(that is using also lock striping) has a good scalability
But if we draw the graph of Amadahl's law it doesn't show any
retrograde throuput...
So , i was thinking more about this, so please follow
with me..
In lockfree algorithms we reduce the duration for wich
locks are held to the miniumum, and we try to keep the
serial part in the Amdahl equation to the *MINIMUM*, this
is good for scalability.
And as we know, there are three ways to reduce lock contention:
1 Reduce the duration for which locks are held
2 Reduce the frequency with which locks are requested
or
3 Replace exclusive locks with coordination mechanisms that
permit greater concurrency.
and since in lockfree algorithms we are reducing the
duration for wich lock are held to the minimum, this
will reduce the chance of lock contention...
So, under contention and in the *worst* case, threads will
be waiting for the locked section to be released , hence, if
we have n threads this imply that the time the threads will
be waiting for the locked section to be released is:
Average time spend inside the locked section * (1 + 2 +3 + 4 ... + n)
And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:
(n * ( n + 1)) / 2
and
Limit (n > infinite) of n * (n + 1) /2 = infinite
So, this mathematical term diverge when n > infinite
So, i think i can rewrite the scalability equation to something like this,
Amine''s equation is:
Speed = 1 / (S * ( n * ( n + 1)) / 2 ) + (P / N)) (2)
S: serial part
P: parallel part
n: number of threads
N: number of cores
And my equation clearly show that under contention and when
there is many threads we can have a retrogade throuput as in
lockfree Ringbuffer and lockfree flqueue ...
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
If you look at benchmarks of my lockfree ParallelQueue :
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
You will notice that under contention lockfree Ringbuffer have a
retrograde throuput and lockfree flqueue also have a retrograde
throuput in the pop() side ...but my lockfree ParallelQueue
(that is using also lock striping) has a good scalability
But if we draw the graph of Amadahl's law it doesn't show any
retrograde throuput...
So , i was thinking more about this, so please follow
with me..
In lockfree algorithms we reduce the duration for wich
locks are held to the miniumum, and we try to keep the
serial part in the Amdahl equation to the *MINIMUM*, this
is good for scalability.
And as we know, there are three ways to reduce lock contention:
1 Reduce the duration for which locks are held
2 Reduce the frequency with which locks are requested
or
3 Replace exclusive locks with coordination mechanisms that
permit greater concurrency.
and since in lockfree algorithms we are reducing the
duration for wich lock are held to the minimum, this
will reduce the chance of lock contention...
So, under contention and in the *worst* case, threads will
be waitingfor the locked section to be released , hence, if we
have n threads this imply that the time the threads will be
waiting for the locked section to be released is:
Time spend inside the locked section * (1 + 2 +3 + 4 ... + n)
And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:
(n * ( n + 1)) / 2
and
Limit (n > infinite) of (n * (n + 1) /2) = infinite
So, this mathematical term diverge when n > infinite
So, i think i can rewrite the scalability equation to something like this,
Amine''s equation is:
Speed = 1 / (S * ( n * ( n + 1)) / 2 ) + (P / N)) (2)
S: serial part
P: parallel part
n: number of threads
N: number of cores
And my equation clearly show that under contention and when
there is many threads we can have a retrogade throuput as in
lockfree Ringbuffer and lockfree flqueue ...
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
I correct:
I wrote:
> Time spend inside the locked section * (1 + 2 +3 + 4 ... + n)
>
> And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:
>
> (n * ( n + 1)) / 2
>
> and
>
> Limit (n > infinite) of (n * (n + 1) /2) = infinite
>
> So, this mathematical term diverge when n > infinite
>
> So, i think i can rewrite the scalability equation to something like
> this,
>
> Amine''s equation is:
>
> Speed = 1 / (S * ( n * ( n + 1)) / 2 ) + (P / N)) (2)
If T is the time spend in the locked section,
and let us have a scenario of four threads, so
under heavy contention and in the worst case,
if the threads are all waiting for the lockedsection
(locks,false sharing etc.) the first thread will not
wait, second thread will wait T and third thread
will wait 2 * T and fourth thread will ait 3 * T...
Hence, in my equation we have to replace n by n 1
so the Amahal equation, this will give us:
So, my scalability equation become:
Speed = 1 / (S * ( (n1) * (n)) / 2 ) + (P / N))
and the Limit (n> inite) of ((n1) * (n)) / 2 = infinite
this term diverge..
S: serial part
P: parallel part
n: number of threads
N: number of cores
And my equation clearly show that under contention and when
there is many threads we can have a retrogade throuput as in
lockfree Ringbuffer and lockfree flqueue ...
Regards,
Amine Moulay Ramdane.http://pages.videotron.com/aminer/
Hello,
I have cleaned and corrected my previous post,
here it is againand tell me what do you think...
If you look at benchmarks of my lockfree ParallelQueue :
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
You will notice that under contention lockfree Ringbuffer have a
retrograde throuput and lockfree flqueue also have a retrograde
throuput in the pop() side ...but my lockfree ParallelQueue
(that is using also lock striping) has a good scalability
But if we draw the graph of Amadahl's law it doesn't show any
retrograde throuput...
So , i was thinking more about this, so please follow
with me..
In lockfree algorithms we reduce the duration for wich
locks are held to the miniumum, and we try to keep the
serial part in the Amdahl equation to the *MINIMUM*, this
is good for scalability.
And as we know, there are three ways to reduce lock contention:
1 Reduce the duration for which locks are held
2 Reduce the frequency with which locks are requested
or
3 Replace exclusive locks with coordination mechanisms that
permit greater concurrency.
and since in lockfree algorithms we are reducing the
duration for wich lock are held to the minimum, this
will reduce the chance of lock contention...
So, under contention and in the *worst* case, threads will
be waiting for the locked section to be released:
If T is the time spend in the locked section,
and let us have a scenario of four threads, so
under heavy contention and in the worst case,
if the threads are all waiting for the lockedsection
(locks,false sharing etc.) the first thread will not
wait, second thread will wait T and third thread
will wait 2 * T and fourth thread will wait 3 * T...
And 1+ 2 +3 +4 ... n is an arithmetic serie and is equal to:
(n * ( n + 1)) / 2
and
Limit (n > infinite) of (n * (n + 1) /2) = infinite
So, this mathematical term diverge when n > infinite
We have to replace n by (n 1) since n threads will be waiting
T * (n 1)
So the my scalabilityequation will become:
Speed = 1 / (S * ( (n1) * (n)) / 2 ) + (P / N))
and the Limit (n> inite) of ((n1) * (n)) / 2 = infinite
this term diverge..
S: serial part
P: parallel part
n: number of threads
N: number of cores
And my equation clearly show that under contention and when
there is many threads we can have a retrogade throuput as in
lockfree Ringbuffer and lockfree flqueue ...
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
For more complete information about compiler optimizations, see our Optimization Notice.