Definitely. Batch enqueue is a useful feature in some situations.
AFAICT, the TBB queue does not support it currently. I think the TBB queue could be patched to support batch enqueue.
However, the real advantage of batch enqueue shows up in node-based queues, where a thread can prepare a chain of linked nodes and then pass pointers to the head and tail of the chain to the queue. I think it's quite easy to modify my node-based queue algorithm this way:
http://groups.google.com/group/lock-free/browse_frm/thread/c8e3201da4a6a300
Ah, OK, I can do it right now:
[cpp]
void enqueue(node_t* head, node_t* tail)
{
    // Terminate the chain, then atomically swap in the new tail.
    tail->next_ = 0;
    node_t** prev = (node_t**)
        _InterlockedExchange((long*)&tail_, (long)tail);
    // Link the previous tail to the head of the new chain
    // (next_ is assumed to be the first member of node_t).
    prev[0] = head;
}
[/cpp]
So, 100 or 1000 nodes enqueued with a single InterlockedExchange. Try to beat it! ;)
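For context, here is a minimal caller-side sketch of how a producer could use this batch enqueue: the chain is linked with plain stores while it is still thread-local, and only the final enqueue(head, tail) call is atomic. The enqueue_batch() helper and the payload field are illustrative, not part of the original code.
[cpp]
// Caller-side sketch (illustrative, not from the original post).
// Layout assumption: next_ is the FIRST member of node_t, which is what
// makes the prev[0] = head store in enqueue() above work.
struct node_t
{
    node_t* next_;
    int     value;   // example payload
};

void enqueue(node_t* head, node_t* tail);   // the batch enqueue shown above

void enqueue_batch(node_t** nodes, unsigned count)
{
    node_t* head = nodes[0];
    node_t* tail = nodes[0];
    for (unsigned i = 1; i != count; ++i)
    {
        tail->next_ = nodes[i];   // plain stores: the chain is still private to this thread
        tail = nodes[i];
    }
    enqueue(head, tail);          // one InterlockedExchange publishes all N nodes
}
[/cpp]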
Batch *enqueue* (not batch queue). Well, by batch enqueue I just mean a situation where you pass a batch of nodes to a queue for enqueue, and the queue somehow optimizes it as compared to a series of single-node enqueues.
The TBB queue is array based. This means that in order to insert a batch of N elements, you need to do N object copies and hold the mutex all that time. O(N). Not very cool for a batch insert.
My queue, on the other hand, is node based. This means that in order to insert a batch of N elements, you need to update 2 pointers using 1 atomic exchange and 1 plain store. O(1).
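For contrast, here is a minimal sketch of a batch insert into a mutex-protected, array-based queue; it only illustrates the O(N) critical section and is not tbb::concurrent_queue's actual implementation:
[cpp]
// Illustration only: a bounded ring buffer guarded by one mutex. Inserting a
// batch of N items copies N objects while the lock is held, i.e. an O(N)
// critical section. This is NOT tbb::concurrent_queue's actual code.
#include <cstddef>
#include <mutex>
#include <vector>

template<class T>
class array_queue
{
public:
    explicit array_queue(size_t capacity) : buf_(capacity) {}

    bool enqueue_batch(const T* items, size_t count)
    {
        std::lock_guard<std::mutex> lock(mtx_);   // held for the whole batch
        if (size_ + count > buf_.size())
            return false;                         // not enough room
        for (size_t i = 0; i != count; ++i)       // N copies under the lock
            buf_[(head_ + size_ + i) % buf_.size()] = items[i];
        size_ += count;
        return true;
    }

private:
    std::mutex     mtx_;
    std::vector<T> buf_;
    size_t         head_ = 0;   // index of the oldest element
    size_t         size_ = 0;   // number of stored elements
};
[/cpp]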
> My queue is node based. This means that in order to insert a batch of N elements, you need to
> update 2 pointers using 1 atomic exchange and 1 plain store. O(1).
But before inserting the batch, don't you need to organize it in a linked-list form?
And if you organize it in a linked-list form - and insert many nodes -
that will still take O(N)... no?
>[...] and hold the mutex all that time
That's true.
Amine Moulay Ramdane.
> The TBB queue is array based... [...]
> My queue is node based.
> and hold the mutex all that time
Not all the time; you can insert them one by one in my
lockfree ParallelQueue or lockfree_mpmc...
But in a node-based queue, you still have to access the FREELIST or the
MEMORY MANAGER first, and you have to use a CAS for that, no? And that will
add to the 'contention' also, so it's the same as the array based, think about it more!
Sincerely,
Amine Moulay Ramdane.
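For reference, the shared freelist being described is typically something like a Treiber stack, where every node acquisition is a CAS on one shared head pointer. Below is a deliberately simplified sketch (it ignores the ABA problem and safe memory reclamation) just to show why a shared freelist can become a contention point; whether that contention matters in practice depends on whether the freelist is actually shared.
[cpp]
// Deliberately simplified Treiber-stack freelist (it ignores the ABA problem
// and safe memory reclamation). With ONE freelist shared by all threads, every
// node acquisition is a CAS on the same head pointer, so all producers contend
// on that single location.
#include <atomic>

struct fl_node
{
    fl_node* next;
};

struct shared_freelist
{
    std::atomic<fl_node*> head{nullptr};

    fl_node* pop()
    {
        fl_node* n = head.load(std::memory_order_acquire);
        while (n && !head.compare_exchange_weak(n, n->next,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed))
        {
            // CAS failed: another thread changed the head; n now holds the new head, retry.
        }
        return n;   // nullptr if the freelist was empty
    }

    void push(fl_node* n)
    {
        fl_node* old = head.load(std::memory_order_relaxed);
        do {
            n->next = old;
        } while (!head.compare_exchange_weak(old, n,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }
};
[/cpp]
A per-thread node cache sidesteps this shared head on the common path, which is the usual counter-argument to the contention concern.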
Hello,
I know that you are smart Dmitry...
So follow with me please...
If you win $5 with the left hand and you lose $5 with the right hand...
You have to see the big picture to be able to judge, not just the left hand...
So read carefully.
But in a node-based queue, you still have to access the FREELIST or the
MEMORY MANAGER first, and you have to use a CAS for that, no? And that will
add to the 'CONTENTION' also, so it's the same as the array based, think about it more!
So the left hand here is the node-based queue, and the right hand is the FREELIST or
the MEMORY MANAGER.
So, to be able to judge, you have to judge from the BIG picture, not just the left hand...
> My queue is node based. This means that in order to insert a batch of N elements, you need to
> update 2 pointers using 1 atomic exchange and 1 plain store. O(1).
But before inserting the batch, don't you need to organize it in a linked-list form?
And if you organize it in a linked-list form - and insert many nodes -
that will still take O(N)... no?
Sincerely,
Amine Moulay Ramdane
Hello,
I know that you are smart Dmitry...
As I said, if you win $5 with the left hand and you lose $5 with the right hand,
you have to see the big picture to be able to judge, not just the left hand...
So if you take a look, for example, at my lock-free ParallelQueue...
You will notice that I have made the following affirmation:
"Because my lock-free ParallelQueue algorithm is using a hash-based method and lock striping - with multiple lock-free queues working as one lock-free queue, see the source code - to reduce the contention further, and is using just a LockedInc(), I am REDUCING the duration for which locks are held AND REDUCING the frequency with which locks are requested; hence I am REDUCING the contention A LOT, so it's very efficient.
So, my ParallelQueue algorithm is lock-free and is using lock striping
and also avoids false sharing; that's why it scored 17 million pop() transactions
per second (see the graphs below)... better than flqueue and RingBuffer [...]."
But there is still an important piece of information that I forgot to put on the page
(to be able to see the big picture, not just the left hand!):
If you take a look at lockfree_mpmc, it's using a
LockedIncLong(temp) + CAS(tail[0], lasttail, newtemp) on the push() side...
But if you take a look at my lock-free ParallelQueue, I am using only
a single LockedInc(),
etc.
Welcome:
http://pages.videotron.com/aminer/parallelqueue/parallelqueue.htm
Sincerely,
Amine Moulay Ramdane.
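A rough C++ sketch of the lock-striping idea described above (the actual ParallelQueue is written in Object Pascal; see the linked page). To keep it short, the stripes here are plain mutex-protected deques rather than lock-free sub-queues; the point is only that hashing producers onto different stripes spreads the contention:
[cpp]
// Hypothetical sketch of lock striping (not the real ParallelQueue code).
// N_STRIPES independent sub-queues act as one queue; a thread hashes its id
// to pick a stripe, so concurrent threads mostly hit different locks, which
// reduces both how long each lock is held and how often any one lock is requested.
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

template<class T, unsigned N_STRIPES = 16>
class striped_queue
{
public:
    void push(const T& v)
    {
        stripe& s = stripes_[stripe_for_this_thread()];
        std::lock_guard<std::mutex> lock(s.mtx);
        s.items.push_back(v);
    }

    bool try_pop(T& out)
    {
        unsigned start = stripe_for_this_thread();
        for (unsigned i = 0; i != N_STRIPES; ++i)   // scan all stripes, starting at "ours"
        {
            stripe& s = stripes_[(start + i) % N_STRIPES];
            std::lock_guard<std::mutex> lock(s.mtx);
            if (!s.items.empty())
            {
                out = s.items.front();
                s.items.pop_front();
                return true;
            }
        }
        return false;                               // every stripe was empty
    }

private:
    struct stripe
    {
        std::mutex    mtx;
        std::deque<T> items;
        // A production version would pad each stripe to a cache line
        // to avoid false sharing, as the post above mentions.
    };

    static unsigned stripe_for_this_thread()
    {
        return static_cast<unsigned>(
            std::hash<std::thread::id>()(std::this_thread::get_id()) % N_STRIPES);
    }

    stripe stripes_[N_STRIPES];
};
[/cpp]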
Hello,
As you have noticed Dmitry, my Thread Pool Engine is also using my lock-free
ParallelQueue to be able to SCALE better in the future...
http://pages.videotron.com/aminer/threadpool.htm
And as you have noticed, with my Object Pascal Thread Pool Engine
I have implemented my Parallel Compression Library, my Parallel Sort Library, etc.
Give me a small and efficient thing and I will design a better and bigger thing!
That was my idea!
It's amazing all the things that we can do with just this efficient Object Pascal
Thread Pool Engine that I have designed!
Not too bad! Right, Dmitry?
Welcome:
http://pages.videotron.com/aminer/
Sincerely,
Amine Moulay Ramdane.
>That's true.
Hello,
If you ask me a question like:
"Your Thread Pool Engine and lock-free ParallelQueue and lockfree_mpmc
are working with Windows, Linux, Solaris, and Mac OS X on x86; why
not also port them to a Sparc Solaris station?"
My answer:
It's very easy to port my Thread Pool Engine, lock-free ParallelQueue,
and lockfree_mpmc to Sparc Solaris multicore stations, etc., and I will do
it in the near future...
and FreePascal 2.4.2 is already here:
http://www.freepascal.org/download.var
Sincerely,
Amine Moulay Ramdane.
>Actually i am more concerned about this.
>- for a batch based insert Its actually O
>the batch insert get inserted one by one.
>we might get a fair picture.
You still didn't get the BIG picture, Gokul...
In the lock-free array-based queue, there is *NO* CONTENTION on the locks
inside a freelist or a memory manager.
In the lock-free node-based one, there is contention on the locks inside
the freelist or the memory manager...
Sincerely,
Amine Moulay Ramdane.
> But in a node-based queue, you still have to access the FREELIST or the
> MEMORY MANAGER first, and you have to use a CAS for that, no? And that will
> add to the 'CONTENTION' also, so it's the same as the array based, think about it more!
I'm not insane enough to use a centralized memory manager in a parallel program (most likely I wouldn't use a centralized multi-producer/multi-consumer queue either, but that's an aside for the moment). So of course the memory comes from a local cache. No contention here.
> And if you organize it in a linked-list form - and insert many nodes -
> that will still take O(N)... no?
It's a local operation conducted not under the mutex. Moreover, it should be combined with node filling, so it's basically zero overhead.
> Because my lock-free ParallelQueue algorithm is using a hash-based method and lock striping
Well, you can use it with my queue too... However, the better design would be to not use a centralized queue in the first place.
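A minimal sketch of the "local cache" point above: if each thread keeps its own private stack of recycled nodes, acquiring a node for an enqueue needs no CAS and no shared lock at all. Names and layout are hypothetical, not code from any of the queues being discussed.
[cpp]
// Hypothetical sketch of a per-thread node cache (not code from any of the
// queues under discussion). On the common path, acquiring a node touches only
// thread-local data: no CAS, no shared lock, no contention.
struct node_t
{
    node_t* next_;
    int     value;   // example payload
};

// Each thread keeps its own private stack of recycled nodes.
thread_local node_t* local_cache = nullptr;

node_t* acquire_node()
{
    if (node_t* n = local_cache)   // fast path: reuse a node from our own cache
    {
        local_cache = n->next_;
        return n;
    }
    return new node_t();           // slow path: fall back to the allocator
}

void release_node(node_t* n)       // the consumer returns the node to its own cache
{
    n->next_ = local_cache;
    local_cache = n;
}
[/cpp]
If producers and consumers are different threads, the caches can drift out of balance, so a real implementation would typically move nodes between the thread caches and a shared pool in batches on the slow path; the point is that the fast path stays contention-free.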
Dmitriy wrote:
>> Because my lock-free ParallelQueue algorithm is using a hash-based method and lock striping
> Well, you can use it with my queue too...
Right, and keep up the good work Dmitriy...
Sincerely,
Amine.
>>> Because my lock-free ParallelQueue algorithm is using a hash-based method and lock striping
>> Well, you can use it with my queue too...
> Right, and keep up the good work Dmitriy..
I will add this, Dmitriy...
Even if my lock-free ParallelQueue - which I am using inside my Object Pascal Thread Pool
Engine - doesn't scale beyond 100 cores or 200 cores... you can still run many process
applications or threads in parallel that will use my Thread Pool Engine(s), and every process
application or thread will scale to 100 cores. For example, a parallel thread that uses my
Parallel Sort Library will scale to 100 cores, and another thread or process that uses my
Parallel Compression Library will scale to 100 cores, etc., so that your thousand(s)-core
system will still be smartly UTILIZED; that's still fun! Right, Dmitriy?
Sincerely,
Amine Moulay Ramdane.
> you can still run many process applications or threads in parallel that will use my
> Thread Pool Engine(s), and every process application or thread will scale to 100 cores.
That's pretty cool!
Unfortunately, I can't say the same about my code; I don't have a 100-core machine for tests...