<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic: Parallel Compression 1.0 in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855382#M2068</link>
    <description>&lt;BR /&gt;&lt;BR /&gt;Hello again,&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Now, as I have stated before:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;[3] If there is LESS contention THEN the algorithm will &lt;/P&gt;&lt;P&gt;scale better. This is because S (the serial part) &lt;/P&gt;&lt;P&gt;becomes smaller with less contention, and as N becomes bigger, &lt;/P&gt;&lt;P&gt;the result - the speed of the program/algorithm - of &lt;/P&gt;&lt;P&gt;Amdahl's equation 1/(S+(P/N)) becomes bigger.&lt;/P&gt;&lt;BR /&gt;And, as you have noticed, I have followed this theorem [3] when &lt;BR /&gt;I constructed my Thread Pool Engine etc.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Now there is another theorem that I can state like this:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;[4] You have latency and bandwidth, so IF you use &lt;BR /&gt;one or both of them - latency and bandwidth - efficiently, &lt;BR /&gt;your algorithm will be more efficient.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;It is why you must not start too many threads in my &lt;BR /&gt;Thread Pool Engine, so that you will not context switch a lot, &lt;BR /&gt;because, when you context switch a lot, the latency will grow and &lt;BR /&gt;this is not good for efficiency.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;You have to be smart.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;And as I have stated before: &lt;/P&gt;&lt;P&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;- or laws or true propositions or good patterns, like theorems [1], [2], [3], [4]... - &lt;BR /&gt;THEN you will construct a model that will be much more CORRECT &lt;BR /&gt;and EFFICIENT. &lt;/P&gt;&lt;BR /&gt;&lt;BR /&gt;Take care...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Sat, 03 Apr 2010 21:26:58 GMT</pubDate>
    <dc:creator>aminer10</dc:creator>
    <dc:date>2010-04-03T21:26:58Z</dc:date>
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855375#M2061</link>
      <description>&lt;P&gt;&lt;BR /&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt; What if people wanna roll there own versions ? ;) &lt;BR /&gt;&amp;gt; They would much better be "served" by algorithms/pseudo &lt;BR /&gt;&amp;gt; code than real code which could be system/language specific ;)&lt;/P&gt;&lt;P&gt;It's easy to EXTRACT algorithms from Object Pascal code...&lt;/P&gt;&lt;P&gt;Look for example inside pbzip.pas: I am using this in the &lt;BR /&gt;main body of my program: &lt;/P&gt;&lt;P&gt;name:='msvcr100.dll'; &lt;/P&gt;&lt;P&gt;It's the 'test' file that I am using - it's also inside the &lt;BR /&gt;zip file. Once you compile and execute pbzip.pas it &lt;BR /&gt;will generate a file msvcr100.dll.bz. And as you have &lt;BR /&gt;noticed, I am using a - portable - compound filesystem; &lt;BR /&gt;look at ParallelStructuredStorage.pas inside the zip file.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;After that I am opening it with:&lt;/P&gt;&lt;P&gt;fstream1:=TFileStream.create(name, fmOpenReadWrite);&lt;/P&gt;&lt;P&gt;and I am reading chunks of streams and 'distributing' them &lt;BR /&gt;to my Thread Pool Engine to be compressed - in parallel - &lt;BR /&gt;by the myobj.BZipcompress method; look at:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;for i:=0 to e&lt;BR /&gt;do &lt;BR /&gt; begin&lt;BR /&gt; &lt;BR /&gt; if (i=e) and (r=0) then break; &lt;BR /&gt; stream1:=TMemoryStream.create;&lt;BR /&gt; if (r &amp;gt; 0) and (i=e) &lt;BR /&gt; then stream1.copyfrom(fstream1,r)&lt;BR /&gt; else stream1.copyfrom(fstream1,d);&lt;BR /&gt; stream1.position:=0;&lt;BR /&gt; obj:=TJob.create;&lt;BR /&gt; obj.stream:=stream1;&lt;BR /&gt; obj.directory:=directory;&lt;BR /&gt; obj.compressionlevel:=9;&lt;BR /&gt; obj.streamindex:=inttostr(i);&lt;BR /&gt; obj.r:=r;&lt;BR /&gt; obj.number:=e; &lt;BR /&gt; &lt;BR /&gt; TP.execute(myobj.BZipcompress,pointer(obj));&lt;/P&gt;&lt;P&gt; end;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am doing the same thing in pzlib.pas...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Apr 2010 07:17:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855375#M2061</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-02T07:17:44Z</dc:date>
    </item>
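The pbzip.pas loop above reads fixed-size chunks of the input file and hands each one to the Thread Pool Engine for parallel bzip compression. The same chunk-and-compress idea can be sketched in Python with the standard bz2 and concurrent.futures modules; this is only a minimal stand-in (the chunk size and function names are illustrative, not from the original code):

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

def parallel_bzip_compress(data: bytes, chunk_size: int = 65536) -> list:
    """Split data into chunks and compress each chunk as an independent job.

    Mirrors the pbzip.pas loop: each chunk is one job for the pool,
    and map() preserves chunk order, like obj.streamindex does there.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: bz2.compress(c, 9), chunks))

def parallel_bzip_decompress(blocks: list) -> bytes:
    """Decompress the per-chunk blocks in parallel and reassemble in order."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(bz2.decompress, blocks))
```

Because each chunk is compressed independently, the last (possibly short) chunk needs no special case here, unlike the `r`/`d` handling in the Pascal loop.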
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855376#M2062</link>
      <description>&lt;SPAN style="font-size: x-small;"&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;Hello again,&lt;BR /&gt;&lt;BR /&gt;And after that I am reading those compressed files&lt;BR /&gt;from the compound filesystem - look inside pzlib.pas - &lt;BR /&gt;and I am 'distributing' those compressed files, as streams,&lt;BR /&gt;to my Thread Pool Engine to be decompressed &lt;BR /&gt;by the myobj.Zlibdecompress method; look at:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;--------------------------------------------&lt;BR /&gt;&lt;BR /&gt;names:=TStringList.create;&lt;/P&gt;&lt;P&gt;storage.foldernames('/',names);&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;len:=strtoint(names[0]);&lt;/P&gt;&lt;P&gt;if r=0 then len:=len+1;&lt;/P&gt;&lt;P&gt;for i:=0 to len&lt;/P&gt;&lt;P&gt;do &lt;/P&gt;&lt;P&gt;begin&lt;/P&gt;&lt;P&gt;if (i=len) and (r=0) then break; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;obj:=TJob.create;&lt;/P&gt;&lt;P&gt;obj.directory:=directory;&lt;/P&gt;&lt;P&gt;obj.streamindex:=inttostr(i);&lt;/P&gt;&lt;P&gt;obj.index:=i;&lt;/P&gt;&lt;P&gt;obj.number:=e; &lt;/P&gt;&lt;P&gt;obj.r:=r;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;TP.execute(myobj.Zlibdecompress,pointer(obj));&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;end;&lt;BR /&gt;&lt;BR /&gt;--------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;/SPAN&gt;</description>
      <pubDate>Fri, 02 Apr 2010 07:28:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855376#M2062</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-02T07:28:28Z</dc:date>
    </item>
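The pzlib.pas loop above walks the stored stream indices and submits each compressed stream to the pool for decompression. A Python sketch of the same pattern, assuming a simple index-to-blob mapping in place of the compound filesystem folders that storage.foldernames('/', ...) enumerates:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def decompress_store(store: dict) -> bytes:
    """Decompress indexed zlib streams in parallel, reassemble in order.

    `store` maps a stream index (int) to one zlib-compressed chunk -
    a stand-in for the folders of the compound filesystem that
    pzlib.pas enumerates before dispatching Zlibdecompress jobs.
    """
    with ThreadPoolExecutor() as pool:
        futures = {i: pool.submit(zlib.decompress, blob)
                   for i, blob in store.items()}
    # Join by index so chunk order matches the original file,
    # like obj.index / obj.streamindex in the Pascal loop.
    return b"".join(futures[i].result() for i in sorted(futures))
```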
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855377#M2063</link>
      <description>&lt;P&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt; And as you have noticed i am using a portable &lt;BR /&gt;&amp;gt; compound filesystem, look at ParallelStructuredStorage.pas &lt;BR /&gt;&amp;gt; inside the zip file. &lt;/P&gt;&lt;P&gt;Why ?&lt;/P&gt;&lt;P&gt;Because you can parallel compress your files, store &lt;BR /&gt;the resulting .zlb (zlib) or .bz (bzip)&lt;BR /&gt;compressed files in a portable compound filesystem,&lt;BR /&gt;and after that distribute your compound filesystem...&lt;/P&gt;&lt;P&gt;And of course you can uncompress files - or all the &lt;BR /&gt;content of your compound file system - from your compound &lt;BR /&gt;file system. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And of course that's easy with Parallel Compression 1.0 :)&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Apr 2010 08:43:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855377#M2063</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-02T08:43:44Z</dc:date>
    </item>
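The post describes packing the per-chunk .bz/.zlb files into one portable compound filesystem that can then be distributed as a single file. The actual format of ParallelStructuredStorage.pas is not shown in this thread; as a rough stand-in, a ZIP container with stored (uncompressed) members plays the same role, since the payloads are already compressed:

```python
import io
import zipfile

def pack_compound(entries: dict) -> bytes:
    """Pack already-compressed members into one distributable blob.

    ZIP_STORED is used because the .bz/.zlb payloads are already
    compressed; recompressing them would waste time for no gain.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, payload in entries.items():
            zf.writestr(name, payload)
    return buf.getvalue()

def unpack_compound(blob: bytes) -> dict:
    """Read every member back out of the compound blob."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```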
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855378#M2064</link>
      <description>&lt;P&gt;&lt;BR /&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt;[...] an algorithm really ;) &lt;BR /&gt;&amp;gt;What's so special about it ? &lt;/P&gt;&lt;P&gt;Parallel bzip and zlib are not just pbzip.pas and pzlib.pas; &lt;BR /&gt;the parallel bzip and zlib algorithm includes my Thread Pool Engine &lt;BR /&gt;algorithm + Parallel Queue algorithm ... &lt;/P&gt;&lt;P&gt;I am calling it an algorithm because it uses a finite number of &lt;BR /&gt;instructions and rules to resolve a problem - parallel compression &lt;BR /&gt;and decompression. &lt;/P&gt;&lt;P&gt;Do you understand ?&lt;/P&gt;&lt;P&gt;And as I said, you can parallel compress your files, store &lt;BR /&gt;the resulting .zlb (zlib) or .bz (bzip) &lt;BR /&gt;compressed files in a portable compound filesystem, &lt;BR /&gt;and after that distribute your compound filesystem... &lt;BR /&gt;And of course you can uncompress files - or all the &lt;BR /&gt;content of your compound file system - from your compound &lt;BR /&gt;file system. &lt;/P&gt;&lt;P&gt;&amp;gt; I see a whole bunch of pascal/delphi files thrown together, &lt;BR /&gt;&amp;gt;a whole bunch of dll's and god-forbid ms*.dll files... &lt;/P&gt;&lt;P&gt;Those dlls are mandatory for now...&lt;/P&gt;&lt;P&gt;and you can easily write a batch file etc. and reorganize ... &lt;/P&gt;&lt;P&gt;&amp;gt; I see some "test programs" which are described as "modules" which they &lt;BR /&gt;&amp;gt; simply are not... &lt;/P&gt;&lt;P&gt;It's VERY easy to convert pzlib.pas and pbzip.pas &lt;BR /&gt;to units, and that's what I will do in the next step... &lt;/P&gt;&lt;P&gt;Parallel Compression will still be enhanced in the future... &lt;/P&gt;&lt;P&gt;&amp;gt; It shouldn't be that hard... set your editor to "use tab character" (turn &lt;BR /&gt;&amp;gt; tabs to spaces off) &lt;/P&gt;&lt;P&gt;I am not using the Delphi editor, just notepad.exe or write.exe... &lt;BR /&gt;and I am compiling from the DOS prompt... 
&lt;/P&gt;&lt;P&gt;&amp;gt;So far it seems like you are inserting your &lt;BR /&gt;&amp;gt;threads/syncronizations &lt;BR /&gt;&amp;gt;everywhere in single-thread-design algorithms ? &lt;/P&gt;&lt;P&gt;No, it's not just inserting threads/synchronizations. &lt;/P&gt;&lt;P&gt;I have reasoned - and used logic. Look for example at &lt;BR /&gt;parallelhashlist.pas inside the zip file: I am using MREWs etc. &lt;BR /&gt;carefully in the right places, and I have also slightly &lt;BR /&gt;modified the serial code... It uses a hash-based method, &lt;BR /&gt;with an array of MREW locks...&lt;/P&gt;&lt;P&gt;The Thread Pool Engine I have constructed from zero&lt;BR /&gt;- and I have used my ParallelQueue - an efficient lock-free queue - etc. &lt;/P&gt;&lt;P&gt;The parallel bzip and zlib I have constructed by using &lt;BR /&gt;my Thread Pool Engine construction etc. &lt;/P&gt;&lt;P&gt;That's not just 'inserting' threads/synchronizations. &lt;/P&gt;&lt;P&gt;&amp;gt;But my estimate would be that for now on low core systems... the &lt;BR /&gt;&amp;gt;"compression" would take far more time... &lt;/P&gt;&lt;P&gt;No. pbzip.pas, for example, scales to 3.3x on 4 cores...&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/ParallelCompression/parallelbzip.htm"&gt;http://pages.videotron.com/aminer/ParallelCompression/parallelbzip.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Skybuck also wrote: &lt;BR /&gt;&amp;gt; [...] or anything extraordinary... &lt;/P&gt;&lt;P&gt;Don't be stupid, Skybuck. &lt;/P&gt;&lt;P&gt;It's in fact: &lt;/P&gt;&lt;P&gt;1 - Useful &lt;BR /&gt;2 - A good thing for educational purposes. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Amine Moulay Ramdane.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Apr 2010 15:11:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855378#M2064</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-02T15:11:18Z</dc:date>
    </item>
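The post mentions that parallelhashlist.pas uses a hash-based method with an array of MREW (multi-read exclusive-write) locks. That is the lock-striping pattern: one lock per stripe of buckets instead of one global lock. A minimal Python sketch of the idea - Python's stdlib has no reader-writer lock, so a plain mutex per stripe stands in for each MREW here:

```python
import threading

class StripedHashMap:
    """Hash-based lock striping: one lock per stripe of buckets.

    A toy model of the parallelhashlist.pas idea, not its code:
    two threads touching keys in different stripes never contend.
    """
    def __init__(self, stripes: int = 16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _stripe(self, key) -> int:
        # The key's hash picks the stripe, and therefore the lock.
        return hash(key) % len(self._locks)

    def put(self, key, value):
        i = self._stripe(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)
```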
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855379#M2065</link>
      <description>&lt;P&gt;&lt;BR /&gt;Skybuck wrote:&lt;BR /&gt;&amp;gt;The thread pool concept is retarded. &lt;BR /&gt;&amp;gt;Any good delphi programmer is capable of creating an array of threads. &lt;BR /&gt;&amp;gt;So my advice to you: &lt;BR /&gt;&amp;gt;1. Delete your thread pool, because it's junk. &lt;BR /&gt;&amp;gt;2. Write a serious/big application that uses many threads, &lt;BR /&gt;&amp;gt;and simply derive from TThread to see how easy it is. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;How can you be so stupid ?&lt;/P&gt;&lt;P&gt;My Thread Pool Engine is not just an array of threads:&lt;BR /&gt;it uses efficient lock-free queues - for example the lock-free ParallelQueue - &lt;BR /&gt;for each worker thread, and it uses work-stealing - for more&lt;BR /&gt;efficiency - etc.&lt;/P&gt;&lt;P&gt;And it eases the work for you - you can 'reuse' the TThreadPool class... - &lt;BR /&gt;and it is very useful...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Please read again:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/threadpool.htm"&gt;http://pages.videotron.com/aminer/threadpool.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Apr 2010 18:00:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855379#M2065</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-02T18:00:58Z</dc:date>
    </item>
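The design the post defends - a queue per worker thread plus work-stealing when a worker's own queue runs dry - can be sketched in Python. This is a toy model of that pattern only, not the author's Thread Pool Engine: the real engine uses lock-free queues, while this sketch guards the deques with a single mutex for simplicity:

```python
import threading
from collections import deque

class WorkStealingPool:
    """Each worker owns a deque of jobs; an idle worker steals from others."""

    def __init__(self, workers: int = 4):
        self._queues = [deque() for _ in range(workers)]
        self._lock = threading.Lock()
        self._next = 0
        self._threads = [threading.Thread(target=self._run, args=(i,))
                         for i in range(workers)]

    def execute(self, fn, *args):
        """Round-robin a job onto one worker's queue."""
        with self._lock:
            self._queues[self._next % len(self._queues)].append((fn, args))
            self._next += 1

    def _take(self, i):
        with self._lock:
            if self._queues[i]:
                return self._queues[i].pop()      # own queue: take newest
            for q in self._queues:                # else steal oldest job
                if q:
                    return q.popleft()
        return None

    def _run(self, i):
        while True:
            job = self._take(i)
            if job is None:
                return  # all queues drained (jobs are enqueued before start)
            fn, args = job
            fn(*args)

    def run_to_completion(self):
        """Start the workers and wait until every queued job has run."""
        for t in self._threads:
            t.start()
        for t in self._threads:
            t.join()
```

Taking from one's own queue at one end and stealing from the other end is the classic work-stealing discipline: it keeps each worker on its own recent work and makes thieves grab the jobs least likely to be needed soon by the owner.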
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855380#M2066</link>
      <description>&lt;P&gt;&lt;BR /&gt;Skybuck wrote in alt.comp.lang.borland-delphi:&lt;/P&gt;&lt;P&gt;&amp;gt; "My Thread Pool Engine is not just an array of threads"&lt;BR /&gt;&amp;gt;&lt;BR /&gt;&amp;gt; To me it is.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;You really don't know what you are talking about...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The principal threat to scalability in concurrent applications &lt;BR /&gt;is the exclusive resource lock.&lt;/P&gt;&lt;P&gt;And there are three ways to reduce lock contention:&lt;/P&gt;&lt;P&gt;1 - Reduce the duration for which locks are held&lt;/P&gt;&lt;P&gt;2 - Reduce the frequency with which locks are requested&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;3 - Replace exclusive locks with coordination mechanisms that &lt;BR /&gt; permit greater concurrency.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;With low, moderate AND high contention, my ParallelQueue&lt;BR /&gt;offers better scalability - and I am using it inside my &lt;BR /&gt;Thread Pool Engine.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Because my ParallelQueue is using a hash-based method &lt;BR /&gt;- and lock striping - and just a LockedInc(), &lt;BR /&gt;I am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;the frequency with which locks are requested; hence I am &lt;BR /&gt;REDUCING the contention A LOT, so it's very efficient.&lt;/P&gt;&lt;P&gt;And as I stated before, and this is a law or theorem to apply: &lt;/P&gt;&lt;P&gt;[3] If there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better. This is because S (the serial part)&lt;BR /&gt; becomes smaller with less contention, and as N becomes bigger,&lt;BR /&gt; the result - the speed of the program/algorithm - of &lt;BR /&gt; Amdahl's equation 1/(S+(P/N)) becomes bigger. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;It's why my ParallelQueue has scored 7 million pop() &lt;BR /&gt;transactions per second... 
better than flqueue and RingBuffer. &lt;/P&gt;&lt;P&gt;Look at: &lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Also, my Thread Pool Engine uses efficient lock-free queues -&lt;BR /&gt;for example the lock-free ParallelQueue - for each worker thread &lt;BR /&gt;- to reduce and minimize the contention - and it uses work-stealing, &lt;BR /&gt;so my Thread Pool Engine is very efficient...&lt;/P&gt;&lt;P&gt;And it eases the work for you - you can 'reuse' the TThreadPool &lt;BR /&gt;class... - and it is very useful... &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, don't be stupid, Skybuck...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Apr 2010 18:34:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855380#M2066</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-03T18:34:15Z</dc:date>
    </item>
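The post's recipe - a hash-based method with lock striping plus a single atomic LockedInc() to pick the stripe - can be illustrated with a small Python sketch. This is only a model of the striping idea: the real ParallelQueue is lock-free, whereas here each stripe has its own mutex, and itertools.count stands in for LockedInc() (its next() is effectively atomic under CPython's GIL):

```python
import itertools
import threading
from collections import deque

class StripedQueue:
    """FIFO spread over several sublists, each guarded by its own lock.

    A shared counter picks the stripe, so concurrent pushers and
    poppers usually hit different locks and contend far less than
    they would on a single global lock.
    """
    def __init__(self, stripes: int = 8):
        self._stripes = [deque() for _ in range(stripes)]
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._push_seq = itertools.count()   # LockedInc() stand-in
        self._pop_seq = itertools.count()

    def push(self, item):
        i = next(self._push_seq) % len(self._stripes)
        with self._locks[i]:
            self._stripes[i].append(item)

    def pop(self):
        """Return the next item from the chosen stripe, or None if empty."""
        i = next(self._pop_seq) % len(self._stripes)
        with self._locks[i]:
            if self._stripes[i]:
                return self._stripes[i].popleft()
        return None
```

Because push and pop walk the stripes with the same round-robin counter, a balanced push/pop workload distributes its lock acquisitions evenly, which is exactly how striping shortens lock hold times and request frequency per lock.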
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855381#M2067</link>
      <description>&lt;P&gt;&lt;BR /&gt;I wrote:&lt;BR /&gt;&amp;gt; Because my ParallelQueue is using an hash based method&lt;BR /&gt;&amp;gt; - and lock striping - and using just a LockedInc() , so,&lt;BR /&gt;&amp;gt; i am REDUCING the duration for which locks are held AND REDUCING&lt;BR /&gt;&amp;gt; the frequency with which locks are requested, hence i am&lt;BR /&gt;&amp;gt; REDUCING A LOT the contention, so it's very efficient.&lt;/P&gt;&lt;P&gt;With low, moderate AND high contention, my ParallelQueue &lt;BR /&gt;offers better scalability - and I am using it inside my &lt;BR /&gt;Thread Pool Engine. &lt;/P&gt;&lt;P&gt;And as you have noticed, I am using low to medium contention &lt;BR /&gt;in the following test:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm"&gt;http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;But I predict that under HIGH contention the push() and pop() will&lt;BR /&gt;score even better than that... &lt;/P&gt;&lt;P&gt;Why ?&lt;/P&gt;&lt;P&gt;Because my ParallelQueue is using a hash-based method &lt;BR /&gt;- and lock striping - and just a LockedInc(), &lt;BR /&gt;I am REDUCING the duration for which locks are held AND REDUCING &lt;BR /&gt;the frequency with which locks are requested; hence I am &lt;BR /&gt;REDUCING the contention A LOT, so it's very efficient. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;And as I stated before, and this is a law or theorem to apply: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[3] If there is LESS contention THEN the algorithm will &lt;BR /&gt; scale better. This is because S (the serial part) &lt;BR /&gt; becomes smaller with less contention, and as N becomes bigger, &lt;BR /&gt; the result - the speed of the program/algorithm - of &lt;BR /&gt; Amdahl's equation 1/(S+(P/N)) becomes bigger. 
&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Apr 2010 18:52:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855381#M2067</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-03T18:52:52Z</dc:date>
    </item>
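Theorem [3] above leans on the Amdahl expression 1/(S + P/N) with P = 1 - S. It can be checked numerically with a few lines of Python, which also makes the "less contention, smaller S, bigger speedup" claim concrete:

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Speedup 1/(S + P/N) with P = 1 - S, per Amdahl's law.

    serial_fraction is S, the part of the run that cannot be
    parallelized (here: time lost to lock contention etc.);
    n is the number of cores.
    """
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / n)

# Shrinking S (less contention) raises the speedup for the same N:
# amdahl_speedup(0.30, 4) gives about 2.1x,
# amdahl_speedup(0.05, 4) gives about 3.5x.
```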
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855382#M2068</link>
      <description>&lt;BR /&gt;&lt;BR /&gt;Hello again,&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Now, as I have stated before:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;[3] If there is LESS contention THEN the algorithm will &lt;/P&gt;&lt;P&gt;scale better. This is because S (the serial part) &lt;/P&gt;&lt;P&gt;becomes smaller with less contention, and as N becomes bigger, &lt;/P&gt;&lt;P&gt;the result - the speed of the program/algorithm - of &lt;/P&gt;&lt;P&gt;Amdahl's equation 1/(S+(P/N)) becomes bigger.&lt;/P&gt;&lt;BR /&gt;And, as you have noticed, I have followed this theorem [3] when &lt;BR /&gt;I constructed my Thread Pool Engine etc.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Now there is another theorem that I can state like this:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;[4] You have latency and bandwidth, so IF you use &lt;BR /&gt;one or both of them - latency and bandwidth - efficiently, &lt;BR /&gt;your algorithm will be more efficient.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;It is why you must not start too many threads in my &lt;BR /&gt;Thread Pool Engine, so that you will not context switch a lot, &lt;BR /&gt;because, when you context switch a lot, the latency will grow and &lt;BR /&gt;this is not good for efficiency.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;You have to be smart.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;And as I have stated before: &lt;/P&gt;&lt;P&gt;IF you follow and base your reasoning on those theorems &lt;BR /&gt;- or laws or true propositions or good patterns, like theorems [1], [2], [3], [4]... - &lt;BR /&gt;THEN you will construct a model that will be much more CORRECT &lt;BR /&gt;and EFFICIENT. &lt;/P&gt;&lt;BR /&gt;&lt;BR /&gt;Take care...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 03 Apr 2010 21:26:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855382#M2068</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-03T21:26:58Z</dc:date>
    </item>
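The practical upshot of theorem [4] above - don't start more runnable threads than the hardware can keep busy, or context switches inflate latency - is simply a pool sized to the machine. A small Python illustration of that policy (the cpu_count cap is an illustrative rule of thumb for CPU-bound jobs, not a rule taken from the original engine):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def make_pool() -> ThreadPoolExecutor:
    """Size the worker pool to the hardware, not to the number of jobs.

    With at most one runnable worker per core, CPU-bound jobs queue up
    instead of forcing the scheduler to time-slice among excess threads,
    which is where the extra context-switch latency would come from.
    """
    workers = os.cpu_count() or 1  # cpu_count() can return None
    return ThreadPoolExecutor(max_workers=workers)
```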
    <item>
      <title>Parallel Compression 1.0 .....</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855383#M2069</link>
      <description>&lt;P&gt;&lt;BR /&gt;Skybuck wrote:&lt;/P&gt;&lt;P&gt;&amp;gt;Some notes/pointers: &lt;/P&gt;&lt;P&gt;&amp;gt;1. The adventage of "blocking locks" on windows is that it doesn't consume &lt;BR /&gt;&amp;gt;so much cpu (?) when it's waiting... this leads to lower cpu temperatures. &lt;/P&gt;&lt;P&gt;&amp;gt;2. Nowadays I want to make my programs run as "cold" as possible to save the &lt;BR /&gt;&amp;gt;system from overheat death. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt;3. Spinlocks make the cpu run hot and there is bad ?! Unless very maybe the &lt;BR /&gt;&amp;gt;blocking scenerio would be worse, but that needs to be proven first. &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Read carefully:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;"The worker threads enters in a wait state when there is &lt;BR /&gt;no job in the lock-free queues - for more efficiency -"&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/threadpool.htm"&gt;http://pages.videotron.com/aminer/threadpool.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, my Thread Pool Engine doesn't consume any CPU when &lt;BR /&gt;there is no job in the queues , and this leads to lower &lt;BR /&gt;cpu temperatures.&lt;/P&gt;&lt;P&gt; :) &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Apr 2010 06:29:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Compression-1-0/m-p/855383#M2069</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-04-04T06:29:38Z</dc:date>
    </item>
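The behavior quoted above - workers enter a wait state when there is no job, instead of spinning - is what a blocking queue gives you. In Python, queue.Queue.get() blocks on a condition variable, so an idle worker consumes no CPU; this sketch models that idea only, it is not the engine's lock-free design:

```python
import queue
import threading

def worker(jobs: queue.Queue):
    """Run jobs from the queue; sleep inside get() while it is empty.

    get() parks the thread on a condition variable (no busy-waiting),
    unlike a spinlock that burns cycles - and heats the CPU - while
    it waits for work to arrive.
    """
    while True:
        job = jobs.get()      # blocks without spinning
        if job is None:       # sentinel: shut the worker down
            return
        job()

results = []
jobs = queue.Queue()
t = threading.Thread(target=worker, args=(jobs,))
t.start()
for n in (1, 2, 3):
    jobs.put(lambda n=n: results.append(n * n))
jobs.put(None)                # tell the worker to exit
t.join()
```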
  </channel>
</rss>

