Hello,
I didn't know where to ask this question, so I decided to ask here.
I just read yesterday the following page; look at the USL
(Universal Law of Computational Scalability) of Dr. Gunther,
where he wrote this (see http://en.wikipedia.org/wiki/Neil_J._Gunther):
--------------------------
The relative capacity C(N) of a computational platform is given by:

C(N) = N / (1 + α(N - 1) + βN(N - 1))

where N represents either the number of physical processors
in the hardware configuration or the number of users driving the
software application. The parameters α and β represent respectively
the levels of contention (e.g., queueing for shared resources) and
coherency delay (i.e., latency for data to become consistent) in the
system. The β parameter also quantifies the retrograde throughput seen
in many stress tests but not accounted for in either Amdahl's law or
event-based simulations.
---------------
His website: http://www.perfdynamics.com/
If you read carefully, you will see that Dr. Gunther is using this
model to predict scalability: he simulates a relatively small number
of vusers in LoadRunner (because of licensing costs, it's
cost-effective), then he finds the coefficients of the 2nd-degree
polynomial (quadratic equation) fitted to the measurements, transforms
those coefficients back to the USL parameters using α = b - a and
β = a, and then extrapolates with the USL model to higher loads
to predict scalability.
He is also applying the model to webservers with heterogeneous
workloads, like in the following page:
http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-wi...
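That fit-and-transform procedure can be sketched in a few lines of Python (my own illustrative sketch, not Dr. Gunther's tooling; the function names and the synthetic data points are invented for the example):

```python
def usl(N, alpha, beta):
    """Gunther's USL: relative capacity C(N)."""
    return N / (1.0 + alpha * (N - 1) + beta * N * (N - 1))

def fit_usl(Ns, Cs):
    """Fit alpha and beta from measured (N, C(N)) load points.
    With y = N/C - 1 and x = N - 1 we have y = b*x + a*x^2, where
    beta = a and alpha = b - a, as in the transform described above.
    Solved here by least squares (normal equations, no intercept)."""
    xs = [n - 1.0 for n in Ns]
    ys = [n / c - 1.0 for n, c in zip(Ns, Cs)]
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (sx2y * s2 - s3 * sxy) / det   # quadratic coefficient
    b = (s4 * sxy - s3 * sx2y) / det   # linear coefficient
    return b - a, a                    # alpha, beta

# fit on a few "measured" points, then extrapolate to a higher load
Ns = [2, 4, 8, 16, 32]
Cs = [usl(n, 0.05, 0.002) for n in Ns]   # synthetic measurements
alpha, beta = fit_usl(Ns, Cs)
print(alpha, beta, usl(128, alpha, beta))
```

On exact synthetic data the fit recovers the parameters; on real load-test data the recovered α and β carry the measurement noise.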
Now my question follows:
Suppose we have obtained a small number of measured load points
with LoadRunner or other tools, and we have calculated the USL equation
to predict the scalability of a webserver. How can the USL model predict
whether the scalability/performance is limited by the network bandwidth
and not by the server?
Can it give this information?
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
Hello again,
We have to calculate the AVERAGE waiting time on
the locked section under contention...
So I correct:
If T is the time spent in the locked section, and we have
a scenario of four threads, then under heavy contention,
in the worst case where the threads are all waiting for the
locked section (locks, false sharing, etc.), the first thread will not
wait, the second thread will wait T, the third thread will wait 2 * T
and the fourth thread will wait 3 * T...
And 1 + 2 + 3 + ... + n is an arithmetic series and is equal to:
(n * (n + 1)) / 2
and
Limit (n -> infinity) of (n * (n + 1) / 2) = infinity
So this mathematical term diverges when n -> infinity.
We have to replace n by (n - 1) in the arithmetic series,
since waiting thread number n on the locked section will
be waiting T * (n - 1).
The average waiting time will be (T * ((n - 1) * n / 2)) / n
So my scalability equation is:
Speed = 1 / ((S * ((n - 1) * n / 2) / n) + (P / N))
S: serial part
P: parallel part
n: number of threads (or processes)
N: number of cores
and the Limit (n -> infinity) of ((n - 1) * n) / 2 = infinity,
so this term diverges.
And my equation clearly shows that under contention, when there
are many threads, we can have a retrograde throughput, as with the
lock-free RingBuffer and lock-free flqueue...
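The worst-case derivation above can be checked numerically (a small Python sketch; the function names and sample values are mine):

```python
def avg_wait(T, n):
    """Worst case: the waits are 0, T, 2T, ..., (n-1)*T, whose sum is
    T * (n-1)*n/2, so the average over the n threads is that sum / n."""
    waits = [i * T for i in range(n)]           # 0, T, 2T, ..., (n-1)T
    assert sum(waits) == T * (n - 1) * n / 2    # arithmetic series
    return sum(waits) / n

def speed(S, P, n, N):
    """The scalability equation above: the serial part S is weighted by
    the average-wait term, the parallel part P is divided over N cores."""
    return 1.0 / (S * ((n - 1) * n / 2) / n + P / N)

print(avg_wait(1.0, 4))        # average wait of 4 threads, T = 1
print(speed(0.1, 0.9, 4, 8))   # few threads
print(speed(0.1, 0.9, 64, 8))  # many threads: lower speed (retrograde)
```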
Please look at: 
http://pages.videotron.com/aminer/scalability.htm
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
I wrote:
So my scalability equation is now:
Speed = 1 / (S * (((n - 1) * n / 2) / n) + (P / N))
S: serial part
P: parallel part
n: number of threads (or processes)
N: number of cores
and the Limit (n -> infinity) of ((n - 1) * n) / 2 = infinity:
this term diverges, and of course as n grows we will have
a retrograde throughput...
As you have noticed, I am using the average waiting time:
T * (((n - 1) * n / 2) / n)
It's the average time during which the application
is not running in parallel and there is contention
on the locked region.
And of course n (the number of threads or processes)
MUST not equal 0.
Welcome: 
http://pages.videotron.com/aminer/scalability.htm 
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
I wrote:
>And of course n (the number of threads or processes)
>MUST not equal 0
I mean n must not equal 1.
n must be >= 2
Regards,
Amine Moulay Ramdane.
We really appreciate your contributions, but we ask that you try to combine your posts into a single forum thread as much as possible. It just makes it easier for users to navigate the front page of the forum. I've merged your two part thread on capacity planning and scalability, and we're looking at merging some of your other threads.
But once again, thanks for your contributions!
==
Aubrey W.
Intel Software Network Support
And the one below. Sorry, that one is out of sequence because I didn't notice it at first.
Best regards,
==
Aubrey W.
Intel Software Network Support
Hello,
I wrote:
>So my scalability equation in a worst case contention is:
>Speed = 1 / (S * (((n - 1) * n / 2) / n) + (P / N))
>S: serial part
>P: parallel part
>n: number of threads (or processes)
>N: number of cores
Now, if we want a general equation that is more realistic, we have
to add a contention factor to my equation. We will call it C, in
percentage (the percentage of contention that we have),
and finally we will add a factor k = C * n,
so that my scalability equation becomes:
Speed = 1 / (S * (((k - 1) * k / 2) / k) + (P / N))
S: serial part
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage,
C = average number of contended threads / n
and k = C * n
And of course n (number of threads or processes) must be >= 2.
And my equation clearly shows that under contention, when there
are many threads, we can have a retrograde throughput such as the one
that we have in the lock-free RingBuffer and lock-free flqueue...
Welcome:
http://pages.videotron.com/aminer/scalability.htm 
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
I wrote:
> Now, if we want a general equation that is more realistic, we have
> to add a contention factor to my equation. We will call it C, in
> percentage (the percentage of contention that we have),
> and finally we will add a factor k = C * n,
> so that my scalability equation becomes:
>
> Speed = 1 / (S * (((k - 1) * k / 2) / k) + (P / N))
>
> S: serial part
> P: parallel part
> n: number of threads (or processes)
> N: number of cores
> C: contention factor in percentage,
>
> C = average number of contended threads / n
>
> and k = C * n
>
> And of course n (number of threads or processes) must be >= 2
Now you can apply my mathematical model..
I wrote:
> C = average number of contended threads / n
Software companies should provide a mechanism/functions inside
the lock primitives (mutexes, critical sections, etc., or the Intel
lock primitives...) that extract and calculate this average
number of contended threads while our parallel application is
running; from this, you just plug those numbers into my
mathematical model and calculate the scalability.
Welcome: http://pages.videotron.com/aminer/scalability.htm
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/ 
Hello,
My new scalability mathematical model:
Speed = 1 / (S * (((k - 1) * k / 2) / k) + (P / N))
S: serial part
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage,
C = average number of contended threads / n
and k = C * n
You can of course calculate S and P and
the average number of contended threads (or processes).
And you can also take C equal, for example, to 0.50 (50%; or you can
simulate it as you wish) and simulate and predict the scalability with
my new mathematical model by changing the following variables:
N (number of cores) and n (number of threads or processes)...
and of course you can change the variable "average number of contended
threads" as you wish in your mathematical simulation using my new
mathematical model.
Welcome: http://pages.videotron.com/aminer/scalability.htm
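For instance, a minimal Python simulation of such a model (the values of S, P and C below are arbitrary illustrations, not measurements):

```python
def speed(S, P, C, n, N):
    """Scalability model with contention factor C: k = C * n threads
    contend on average, so k replaces n in the average-wait term.
    Note that ((k - 1) * k / 2) / k simplifies to (k - 1) / 2."""
    k = C * n
    return 1.0 / (S * ((k - 1) * k / 2) / k + P / N)

# sweep the number of threads at a fixed core count
for n in (2, 8, 32, 128):
    print(n, speed(S=0.05, P=0.95, C=0.5, n=n, N=16))
```

With these sample values the predicted speed falls as n grows, which is the retrograde throughput the model is meant to capture.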
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
Hello,
If you take a look at my Parallel Compression Library
http://pages.videotron.com/aminer/
I have benchmarked Parallel Gzip (which works on TMemoryStreams and
TFileStreams), and it's giving good performance, almost the same
performance as Pigz.exe...
Now, I want to optimize my Parallel Compression Library more...
If you take a look at my Parallel Compression Library in Object Pascal,
look inside the zipfile: http://pages.videotron.com/aminer/
(look inside parallelcompression.pas)
It's using a method like this:
-----
procedure TCallbacks.GzipCompress(obj: Pointer);
var
  local_count: integer;
  local_stream: TMemoryStream;
begin
  try
    local_stream := TMemoryStream.create;
    GZCompressStream(Tjob(obj).stream, local_stream, Tjob(obj).compressionlevel);
    local_stream.position := 0;
    repeat
      if count_compress = Tjob(obj).index then
      begin
        Tjob(obj).fstream.CopyFrom(local_stream, local_stream.size);
        local_count := count_compress + 1;
        count_compress := local_count;
        break;
      end;
    until false;
  finally
    Tjob(obj).stream.free;
    local_stream.free;
    // Tjob(obj).free;
  end;
  if Tjob(obj).r > 0 then
  begin
    if local_count = Tjob(obj).number + 1 then
    begin
      Tjob(obj).event.setevent;
      Tjob(obj).free;
    end;
  end
  else
  begin
    if local_count = Tjob(obj).number then
    begin
      Tjob(obj).event.setevent;
      Tjob(obj).free;
    end;
  end;
end;
---
As you have noticed, I am calling:
Tjob(obj).fstream.CopyFrom(local_stream, local_stream.size);
So the worker threads of my Threadpool engine are also using the IO.
But we know that for threads that may wait for I/O to complete, we have
to increase the Threadpool size beyond the number of available cores,
because not all threads of my Parallel Gzip compression will be working
at all times (since they are using the IO and waiting for its completion).
So my question follows:
Must I use a profiling tool, estimate the ratio of waiting time (WT)
to service time (ST) for a typical request, and from the ratio WT/ST
estimate the number of Threadpool workers that I must have, by using
the following equation:
Number of threads to keep the cores fully utilized = N * (1 + WT/ST)
Or must I use, for example, my TAsyncFileStream (asynchronous
file input/output) so as not to wait at all? You can find it inside my
LockfreeUtils; look inside the zip of LockfreeUtils:
http://pages.videotron.com/aminer/
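For the first option, the sizing formula above is straightforward to apply (a Python sketch; the WT and ST numbers are placeholders for measured values):

```python
import math

def pool_size(cores, wt, st):
    """Threads needed to keep the cores fully utilized when each task
    spends WT waiting on I/O per ST of service time: N * (1 + WT/ST),
    rounded up to a whole thread."""
    return math.ceil(cores * (1 + wt / st))

print(pool_size(8, 20.0, 10.0))   # I/O-heavy: WT/ST = 2
print(pool_size(8, 0.0, 10.0))    # pure CPU: one thread per core
```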
Regards,
Amine Moulay Ramdane.
Hello,
I will clarify something about my scalability mathematical equation..
Now, if we want a general equation that is more realistic, we have
to add a contention factor to my equation. We will call it C, in
percentage (the percentage of contention that we have),
and finally we will add a factor k = C * n,
so that my scalability equation becomes:
Speed = 1 / (Soc + (Sic * (((k - 1) * k / 2) / k)) + (P / N))
Sic: serial part inside the critical regions (locked regions)
Soc: serial part outside the critical regions
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage,
C = average number of contended threads / n
and k = C * n
And of course n (number of threads or processes) must be >= 2.
But to be more realistic we sometimes have to neglect the Soc part,
as is the case when we run the main thread 'serially' - and there are
no serial parts other than the critical regions (locked regions) -
and after that we start our multiple threads.. I think the
serial part of the main thread can, in this case, be neglected,
so that my equation becomes:
Speed = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
Sic: serial part inside the critical regions (locked regions)
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage
I wrote before:
> C = average number of contended threads / n
Software companies should provide a mechanism/functions inside
the lock primitives (mutexes, critical sections, etc., or the Intel
lock primitives...) that extract and calculate this average
number of contended threads while our parallel application is
running; from this, you just plug those numbers into my
mathematical model and calculate the scalability.
And you can also take C equal, for example, to 0.50 (50%; or you can
simulate it as you wish) and simulate and predict the scalability with
my new mathematical model by changing the following variables:
N (number of cores) and n (number of threads or processes)...
and of course you can change the variable "average number of contended
threads" as you wish in your mathematical simulation using my new
mathematical model.
And as you have noticed, I said explicitly that you can simulate my
mathematical model by varying N (number of cores), n (number of
threads) and the "average number of contended threads"; but implicitly,
since Soc, Sic and P are not varying, that means that you are keeping
the other configuration 'constant': for example the cores' frequency,
the same hard disk configuration, etc.
And my equation clearly shows that under contention, when there
are many threads, we can have a retrograde throughput such as the one
that we have in the lock-free RingBuffer and lock-free flqueue...
Please read here my article:
http://pages.videotron.com/aminer/scalability.htm
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
I wrote:
But to be more realistic, we have to neglect in many cases the Soc (serial part
outside the critical regions), as is the case when we run the main thread 'serially'
- and there are no serial parts other than the critical regions (locked regions) -
and after that we start our multiple threads.. I think the serial part of
the main thread can, in this case, be neglected, so that my equation
becomes: Speed = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
Sic: serial part inside the critical regions (locked regions)
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage
http://pages.videotron.com/aminer/scalability.htm
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
Hello again,
I will clarify more...
But to be more realistic, we have to neglect in many cases the Soc part,
as is the case when we run the main thread 'serially' - and there are
no serial parts other than the critical regions (locked regions) -
and after that we start our multiple threads.. I think the serial part of
the main thread can, in this case, be neglected, so that my scalability equation
becomes: Speed = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
Sic: serial part inside the critical regions (locked regions)
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage
To clarify this more:
My scalability equation can be viewed as a function of multiple
variables (even the time), like this:
Speed(t, N, k, Sic, Soc) = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
But if the Soc part (of the main thread) is big (let's say 50%),
and you run your program over a short time interval, let's say a running
time interval of t = 1 minute, the Soc part will have a big effect on my
scalability equation and cannot be neglected... But if you run the
same program over a long period of time, as is the case when we run
our server applications, the Soc part (of the main thread) will have
a small effect and can be neglected, so that my scalability equation
becomes:
Speed = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
http://pages.videotron.com/aminer/scalability.htm
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/
Hello again,
My scalability equation can be viewed as a function of multiple
variables (even the time), like this:
Speed(t, N, k, Sic, Soc) = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
But if the Soc part (of the main thread) is big (let's say 50%),
and you run your program over a very short time interval, let's say
a running time interval of t = 10 seconds, the Soc part will have
a big effect on my scalability equation and cannot be neglected...
But if you run the same program over a long period of time, as is
the case when we run our server applications, the Soc part (of
the main thread) will have a small effect and can be neglected,
so that my scalability equation becomes:
Speed = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
The example above will look like this as a function of the time t:
|-- only the main thread is running (Soc part) -->
--> the multiple threads start -->
--> the multiple threads are running and the main thread is waiting -->
My article: 
http://pages.videotron.com/aminer/scalability.htm
That's all.
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
Hello,
We have to be smart here, so please follow with me...
My new scalability mathematical model says that:
http://pages.videotron.com/aminer/scalability.htm
[1]
Speed(t, N, k, Sic, Soc) = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
Sic: serial part inside the critical regions (locked regions)
Soc: serial part outside the critical regions
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage,
C = average number of contended threads / n
and k = C * n
And of course n (number of threads or processes) must be >= 2.
So if you take a look at the (benchmark) graphs:
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
you will notice that my lock-free ParallelQueue is scaling better
than the lock-free RingBuffer and flqueue, because my ParallelQueue
is lock-free and keeps the locked region very small - just a
CAS(longword(tail[0]),longword(lasttail),longword(newtemp))
on the push() side, and a CAS(head[0],lasthead,lasthead+1)
on the pop() side - so it is reducing the Sic part in my
scalability equation...
Also, I am using lock striping in my lock-free algorithm, which
means that I am REDUCING the Sic part in my scalability equation
even more. But since the Sic part, although very small, still exists,
my scalability mathematical model predicts that when n (the number
of threads) grows, k = C * n will grow also; and since in equation [1]
above the factor (((k - 1) * k) / 2) / k = ((k - 1) * k) / (2 * k)
grows much faster in its numerator ((k - 1) * k) than in its
denominator (2 * k), there will be a retrograde throughput when n
(the number of threads) becomes much bigger, and this retrograde
throughput is reached more quickly with the lock-free flqueue and
lock-free RingBuffer than with my lock-free ParallelQueue.
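The growth argument can be checked directly (a small Python sketch; the sample values of k are mine):

```python
# the contention term from equation [1]: (((k - 1) * k) / 2) / k.
# Algebraically this is (k - 1) / 2, so it grows linearly with k = C * n,
# while the P / N term stays fixed: Speed must eventually fall.
for k in (2, 4, 8, 16, 32, 64):
    term = ((k - 1) * k / 2) / k
    assert term == (k - 1) / 2
    print(k, term)
```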
Welcome to my webpages: 
http://pages.videotron.com/aminer/scalability.htm 
and 
http://pages.videotron.com/aminer/ 
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
Hello,
As you have noticed, I spoke in my last post about my new
scalability mathematical model:
Read "A new scalability mathematical model better than Amdahl's law"
http://pages.videotron.com/aminer/scalability.htm
My scalability equation can be viewed as a function of multiple
variables (even the time), like this:
[1]
Speed(t, N, k, Sic, Soc) = 1 / (Sic * (((k - 1) * k / 2) / k) + (P / N))
Sic: serial part inside the critical regions (locked regions)
Soc: serial part outside the critical regions
P: parallel part
n: number of threads (or processes)
N: number of cores
C: contention factor in percentage,
C = average number of contended threads / n
and k = C * n
And of course n (number of threads or processes) must be >= 2.
But how in practice can we reduce the contention further?
There is logical contention: contention for access to data structures
that must be controlled to maintain consistency...
and physical contention: contention for physical memory, buses, hard disks...
As you have noticed, I wrote in:
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
"Because my lock-free ParallelQueue algorithm is using a hash
based method - and lock striping - to reduce the contention further,
and is using just a LockedInc(), I am REDUCING the duration
for which locks are held AND REDUCING the frequency with which
locks are requested, hence I am REDUCING A LOT the contention,
so it's very efficient."
Hence, I am reducing the 'logical' contention in my lock-free ParallelQueue...
But there is still the physical contention that can occur when we are
using the memory, buses, hard disks...
Now how can we reduce this physical contention?
To reduce the Sic part in my scalability equation [1], you can
for example maximize locality, use more (and bigger) caches,
use hardware with a big L3 cache, and use for example
RAID 10 or ... with more cache memory and mirroring.
If this is not enough, you can for example scale out your website
by using other computer servers (and use mirroring) and load balance
between them..
I wrote in:
http://pages.videotron.com/aminer/efficiency_availibility.htm
"So, as you have noticed, you can simulate our enterprise
webserver with this mathematical analytical equation and
validate the data collected by the fwptt webserver stress tool,
and try to keep T (the average response time of your HTTP requests)
below or equal to 5 seconds, and see also if your webserver
is scaling; we also call this Quality control using OR
(operational research)."
So, if you want to keep the response time T 'low' in enterprise websites,
you have to reduce the logical and physical contention as I explained
before...
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer/ 
Hello all,
I have thought more about my mathematical model:
http://pages.videotron.com/aminer/scalability.htm
I have not taken a factor into account, and there is still a problem
with my mathematical model.
I explain:
When, in the worst case, all threads are waiting for the lock,
my equation works.
For physical contention, my mathematical model works, and
it does predict the retrograde throughput.
To model, for example, my lock-free ParallelQueue or the lock-free
RingBuffer or lock-free flqueue under contention, it works, and it
does predict the retrograde throughput.
But there is still a problem with my mathematical model:
if there is contention, and the scheduler schedules threads
other than the ones that are waiting on the lock, my mathematical
model doesn't work correctly.
But it was a good attempt on my part to try to find a general
scalability equation that also predicts the retrograde throughput.
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/
Hello,
I was thinking more about those lock-free queues...
If you take a look at the lock-free flqueue:
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
look at the code of the pop():
---
function TLockfree_MPMC.pop(var obj: tNodeQueue): boolean;
var lastHead: longword;
begin
  repeat
    // (1)
    lastHead := head[0];
    if tail[0] <> head[0] then
    begin
      obj := getObject(integer(lastHead));
      // (2)
      if CAS(head[0], lasthead, lasthead + 1) then
      begin
        result := true;
        exit;
      end;
    end
    else
    begin
      result := false;
      exit;
    end;
  until false;
end;
---
So, if head has changed, the entire loop and the code
between (1) and (2) have to be re-executed, so the P (parallel) part
becomes larger, and this can become more expensive than
the scenario with only one thread; this is why there is a retrograde
throughput in the lock-free flqueue.
But if you take a look at the push() inside the lock-free flqueue
source code, it looks like this:
---
function TLockfree_MPMC.push(tm: tNodeQueue): boolean; //stdcall;
var lasttail, newtemp: integer;
begin
  if getlength >= (fsize - margin) then
  begin
    result := false;
    exit;
  end;
  result := true;
  //newTemp := windows.interlockedincrement(temp);
  newTemp := LockedInc(temp);
  lastTail := newTemp - 1;
  setObject(lastTail, tm);
  repeat
    if CAS(longword(tail[0]), longword(lasttail), longword(newtemp)) then
    begin
      //{$IFDEF MUTIPLE_PRODUCER}
      // sleep(0);
      //{$ENDIF}
      exit;
    end;
    sleep(0);
  until false;
end;
---
So, as you have noticed, it doesn't look like the pop(): if lasttail
has changed, that doesn't mean that the thread will fail, retry
and repeat the transaction.. and also, the repeat-until loop here
is not as expensive as the loop on the pop() side.
This is why the lock-free flqueue experiences a retrograde throughput
on the pop() side, but not on the push() side (look at the graphs).
But with the lock-free RingBuffer there is a retrograde throughput in
both the pop() and the push().
With my lock-free ParallelQueue there is no retrograde throughput
in the graphs, because I am using lock striping and avoiding all those
problems; also, my ParallelQueue is avoiding contention, and I am
using it inside my Threadpool engine.
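To make the asymmetry concrete, here is a small Python sketch of the two loops (my own simplified model of the flqueue pattern, not the Object Pascal original; the CAS is simulated with a lock because CPython has no hardware CAS primitive):

```python
import threading

class AtomicInt:
    """Simulated atomic integer: load / CAS / fetch-increment.
    (A stand-in for the hardware CAS and LockedInc of the Pascal code.)"""
    def __init__(self, v=0):
        self._v = v
        self._lock = threading.Lock()
    def load(self):
        return self._v
    def cas(self, expected, new):
        with self._lock:
            if self._v != expected:
                return False
            self._v = new
            return True
    def inc(self):
        with self._lock:
            self._v += 1
            return self._v

class FLQueueSketch:
    """Sketch of the flqueue pattern: push reserves a slot with an atomic
    increment and only retries a cheap CAS to publish the tail; pop must
    redo the read-check-fetch work between (1) and (2) on every failed CAS."""
    def __init__(self, size=1024):
        self.slots = [None] * size
        self.head = AtomicInt(0)   # next slot to pop
        self.tail = AtomicInt(0)   # last published slot + 1
        self.temp = AtomicInt(0)   # push reservation counter

    def push(self, obj):
        new_temp = self.temp.inc()            # reserve (like LockedInc)
        last_tail = new_temp - 1
        self.slots[last_tail % len(self.slots)] = obj
        while not self.tail.cas(last_tail, new_temp):
            pass                              # only the cheap CAS is retried

    def pop(self):
        while True:
            last_head = self.head.load()                   # (1)
            if self.tail.load() == last_head:
                return None                                # empty
            obj = self.slots[last_head % len(self.slots)]  # redone on retry
            if self.head.cas(last_head, last_head + 1):    # (2)
                return obj
            # CAS failed: everything from (1) is re-executed
```

In a single-threaded run the items come back in FIFO order; under many consumers, each failed CAS at (2) repeats the fetch at (1), which is the extra cost discussed above.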
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer
What is your question?
This is a forum and you seem to be wondering about an equation...
I think that you should write a nice article about your new equations and your ideas. However, writing the same equation 20 times in this forum doesn't make sense. This behavior is usually known as SPAM... :(
I wrote:
>So, as you have noticed it doesn't look like the pop(): if lasttail
>has changed, that doesn't mean that the thread will fail, retry
>and repeat the transaction..
I mean: if tail has changed, that doesn't mean that the thread
will re-execute the loop...
If we have threads A, B, C, D running in this order, and the first
thread A executes the CAS, then B, C and D after that may
succeed and will not re-execute the loop ... and if they fail and
re-execute the loop, this loop is less expensive than the one on
the pop() side.
But in the pop(), if we have threads A, B, C, D executing
in this order between (1) and (2), and the first thread A enters the CAS:
// (1)
lastHead := head[0];
if tail[0] <> head[0] then
begin
  obj := getObject(integer(lastHead));
// (2)
  if CAS(head[0], lasthead, lasthead + 1)
then the whole loop has to be re-executed for B, C and D, and this
loop is more expensive than the repeat-until on the push() side;
this is why the lock-free flqueue is experiencing a retrograde
throughput.
But with the lock-free RingBuffer there is a retrograde throughput in
both the pop() and the push().
With my lock-free ParallelQueue there is no retrograde throughput
in the graphs, because I am using lock striping and avoiding all those
problems; also, my ParallelQueue is avoiding contention, and I am
using it inside my Threadpool engine.
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer
Hello,
We have to reason and be smart when we are dealing with
these kinds of lock-free algorithms.
Finally, and to be more clear, I will say this:
In the following graph:
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
the retrograde throughput that you will notice in the lock-free
flqueue is not caused by the factor 'contention'; it's
caused by the fact that when threads are executing
between (1) and (2) (on the pop() side):
// (1)
lastHead := head[0];
if tail[0] <> head[0] then
begin
  obj := getObject(integer(lastHead));
// (2)
  if CAS(head[0], lasthead, lasthead + 1)
if head has changed, all the threads have to re-execute
the loop, and this loop is more expensive than the
repeat-until on the push() side; this is why the lock-free
flqueue is experiencing a retrograde throughput.
But on the push() side:
repeat
  if CAS(longword(tail[0]), longword(lasttail), longword(newtemp)) then
  begin
    //{$IFDEF MUTIPLE_PRODUCER}
    // sleep(0);
    //{$ENDIF}
    exit;
  end;
  sleep(0);
until false;
The loop on the push() side is not so expensive, and when
the threads are executing before the CAS and tail has
changed, that doesn't mean that the threads have
to re-execute the loop; they may succeed, as I have
explained before.
Also, you have to take into account the factor that we call
CONTENTION: under medium and high contention both the lock-free
flqueue and the lock-free RingBuffer will behave badly, but
since my lock-free ParallelQueue is using lock striping,
it is well suited for medium to high contention.
Regards, 
Amine Moulay Ramdane. 
http://pages.videotron.com/aminer 