Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Clean up inside gate.h

AJ13
New Contributor I
821 Views

Well, I get to be the jerk to call out a piece of very confusing code inside TBB. This master-piece took a half hour to understand, and I'm still very confused as to what this function does. Inside function:

void try_update( intptr_t value, intptr_t comparand, bool flip=false )

There is this gem:

if( flip ? old!=comparand : old==comparand )

It took me a while to parse out how a ternary-operator was being used within an if statement. Once that was out of the way, I am still very confused as to why try_update takes a flip argument at all... the documentation says this:

//! Update state=value if state==comparand (flip==false) or state!=comparand (flip==true)

I am still very confused... so this function seems to be something that will flip around the logic... complement perhaps. Why can't this be split into two functions? Is there a really good performance reason for doing this? I don't want to clean up code inside TBB that is like that for a very good reason... so other than possibly minimizing the size of the instruction cache, why is the function this way?

Thanks,

AJ

0 Kudos
26 Replies
RafSchietekat
Valued Contributor III
677 Views

"Why can't this be split into two functions?" It took me a half hour to understand why flip might be so confusing... no, I still don't get it. :-)

0 Kudos
AJ13
New Contributor I
677 Views

Because it looks like try_update is trying to do too much... the functionality depends upon a last parameter, and I just don't see a real reason to do that. In addition, that last parameter has a default argument set. For someone reading it for the first time, it's just confusing to try to understand this strange function. The function isn't used in many places, so I think a better name and perhaps splitting the function can be done without any harm.

0 Kudos
Alexey-Kukanov
Employee
677 Views
Quoting - AJ
I am still very confused... so this function seems to be something that will flip around the logic... complement perhaps. Why can't this be split into two functions? Is there a really good performance reason for doing this? I don't want to clean up code inside TBB that is like that for a very good reason... so other than possibly minimizing the size of the instruction cache, why is the function this way?

It could be split of course; but if all code except one condition is exactly the same for two cases, why would we duplicate this code, andconduse ourselves with two names for essentially the same thing?

0 Kudos
AJ13
New Contributor I
677 Views

No reason other than to make it easier to trace through the code for people (like me) not familiar with it.

0 Kudos
robert-reed
Valued Contributor II
677 Views
Quoting - AJ

No reason other than to make it easier to trace through the code for people (like me) not familiar with it.

I guess you're familiar with it now? ;-)

0 Kudos
Dmitry_Vyukov
Valued Contributor I
677 Views
Quoting - AJ

There is this gem:

if( flip ? old!=comparand : old==comparand )

Is this better (note - no ternary-operator)?

if(flip^old==comparand)

0 Kudos
robert-reed
Valued Contributor II
677 Views
Quoting - Dmitriy Vyukov

Is this better (note - no ternary-operator)?

if(flip^old==comparand)

Good one, Dmitriy! No matter how obscure, there's always a more obscure way to do most things.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
677 Views

I think there are better ways to spend time in gate.h. For example:

- add fenced version of get_state() with preceding store-load fence

- move it to public interface

- reuse it in queue as wait logic

Also probably:

- make it fine-grained to prevent thundering herd in queue (which is currently there). Here is description and algo:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/c9c1a404ca626e4a

or

- make it portable general-purpose low-overhead gate/eventcount/condition_variable. Here is algo:

http://groups.google.com/group/comp.programming.threads/msg/16987ca1f7074394

0 Kudos
AJ13
New Contributor I
677 Views
Quoting - Dmitriy Vyukov

I think there are better ways to spend time in gate.h. For example:

- add fenced version of get_state() with preceding store-load fence

- move it to public interface

- reuse it in queue as wait logic

Also probably:

- make it fine-grained to prevent thundering herd in queue (which is currently there). Here is description and algo:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/c9c1a404ca626e4a

or

- make it portable general-purpose low-overhead gate/eventcount/condition_variable. Here is algo:

http://groups.google.com/group/comp.programming.threads/msg/16987ca1f7074394

I don't know how to work on any of those issues, but I do know how to clean up unclear code. I'll put this on my TODO list, to make a quick patch for this area of gate.h.

Code review is an important part of open source software, unless there is a good reason not to adjust the code and interface. Many open source projects have their commits posted to a mailing list for review, to ensure that what enters the repository is clear, and peer-reviewed.

0 Kudos
RafSchietekat
Valued Contributor III
677 Views

"I do know how to clean up unclear code" That's why yesterday, in another project, I introduced some instances of "if(A?B:C)" into existing code, because I dislike redundancy even more than ternary operators in predicates. :-)

0 Kudos
robert_jay_gould
Beginner
677 Views
Quoting - Raf Schietekat

"I do know how to clean up unclear code" That's why yesterday, in another project, I introduced some instances of "if(A?B:C)" into existing code, because I dislike redundancy even more than ternary operators in predicates. :-)

I agree that ternary operators are nice to keep code simple and clean, not obscure at all for myself. But then again ternary operators might be loosing popularity with younger programmers, who knows? One day ternary operators might get the GOTO treatment, and be considered stuff best left out, unless utterlynecessary.

0 Kudos
AJ13
New Contributor I
677 Views

It's not the ternary operator, it was the ternary operator within an if statement. It was the first time I have ever seen this usage. If it's an accepted pattern, that's fine, I just have never seen it and it seemed overly cryptic.

It also seemed to me that this function is trying to do too much... the behaviour of the function changes with an argument.

Now, it could be that I just don't understand exactly what this function is trying to do, and some better documentation would make it a lot clearer for me. However, separating this function into something like try_update_if_equal, and try_update_if_different or something like swap_if_equal(), swap_if_not_equal() would be easier to read and understand.

0 Kudos
RafSchietekat
Valued Contributor III
677 Views

"the behaviour of the function changes with an argument"

I generally like it when functions don't ignore what I tell them to do. :-)

0 Kudos
Alexey-Kukanov
Employee
677 Views
Remember, the stuff was initially developed as strictly internal, so it was enough a few TBB scheduler mavens understand it :). It's good that itsunclearnesswas caught by your eyes - a chance to improve. I agree with you that better documentation inside the code would be helpful for everyone who like you tries to understand what's going on there. I also agree that better naming is possible, both for the function and its arguments. Still, duplicating the code just for sake of function naming is redundant, and our internal peerreview would I believe say no-go to any such proposal.

0 Kudos
ARCH_R_Intel
Employee
677 Views
I'm with Dmitriy - time is best spent fixing the overall algorithm (blockingthreads until there is work to do). I will study Dmitriy's references in more detail, as I'm alreadytrying to redo the algorithm.

0 Kudos
AJ13
New Contributor I
677 Views
Remember, the stuff was initially developed as strictly internal, so it was enough a few TBB scheduler mavens understand it :). It's good that itsunclearnesswas caught by your eyes - a chance to improve. I agree with you that better documentation inside the code would be helpful for everyone who like you tries to understand what's going on there. I also agree that better naming is possible, both for the function and its arguments. Still, duplicating the code just for sake of function naming is redundant, and our internal peerreview would I believe say no-go to any such proposal.

Could somone spend a moment, and provide a clear explanation of the purpose of this function?

Also, I will be going through a lot more of the messy details of the scheduler. With everyone's help, I'd like to write a bit of a guide to help others understand its internals.

0 Kudos
Dmitry_Vyukov
Valued Contributor I
677 Views
Quoting - AJ

Could somone spend a moment, and provide a clear explanation of the purpose of this function?

It's really hard to explain the purpose of this single function, because it's a part of entire hardcore lock-free blocking/signaling algorithm.

Condition variables are usually used for blocking/signalling in lock-based algorithms. But condvars are not suitable for lock-free algorithms, because they require a mutex and thus eliminating all benefits of lock-freedom.
Eventcounts/gates are basically "condition variables for lock-free algorithms". The main point is that they make fast-path non-blocking (and preferably atomic-free) (slow-path is still blocking because it's irrationally to have non-blocking blocking algorithm).

I think you can start studying from this excellent Chris Thomasson's eventcount algorithm:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380
(see Joe Seigh's comments about memory fences and usage example)

I think Chris Thomasson can say more about eventcounts/gates and provide more links.

Btw, I was thinking about writing an article for ISN about eventcounts and their usage in conjunction with lock-free containers (queues, stacks, etc). Because currently my queue is only half-useful:
http://software.intel.com/en-us/articles/single-producer-single-consumer-queue

0 Kudos
Dmitry_Vyukov
Valued Contributor I
677 Views
I'm with Dmitriy - time is best spent fixing the overall algorithm (blockingthreads until there is work to do). I will study Dmitriy's references in more detail, as I'm alreadytrying to redo the algorithm.

I can describe what interface for eventcount I am currently using:

[cpp]class eventcount
{
public:
    typedef uint32_t state_t;

    eventcount();
    ~eventcount();

    state_t prepare_wait();
    void retire_wait();
    void wait(state_t cmp);

    void signal();
    void signal_relaxed();
};



//Here is a little helper class for blocking threads:

class eventcount_blocking
{
public:
    eventcount_blocking(eventcount& ec)
        : ec_(ec)
    {
        cmp_ = ec_.prepare_wait();
        wait_ = false;
    }

    void wait()
    {
        assert(false == wait_);
        wait_ = true;
        ec_.wait(cmp_);
    }

    ~eventcount_blocking()
    {
        if (false == wait_)
            ec_.retire_wait();
    }

private:
    eventcount& ec_;
    eventcount::state_t cmp_;
    bool wait_;

    eventcount_blocking(eventcount_blocking const&);
    eventcount_blocking& operator = (eventcount_blocking const&);
};[/cpp]

And here is usage example:

[cpp]queue q;
eventcount ec;

void produce(void* data)
{
    q.enqueue(data);
    ec.signal(); // or signal_relaxed() depending on queue implementation
    // overhead for signaling is: [MFENCE] + load + branching
}

void* consume()
{
    void* data = 0;
    if (data = q.dequeue())
        // fast-path (no overhead)
        return data;
    for (;;)
    {
        eventcount_blocking block (ec);
        if (data = q.dequeue())
            return data;
        block.wait();
        if (data = q.dequeue())
            return data;
    }
}[/cpp]


The main point is that (1) eventcount logic is completely non-intrusive wrt lock-free container and (2) it's possible to aggregate several containers with single eventcount:

[cpp]void* consume_impl()
{
    return high_prio_queue.dequeue()
      || normal_prio_queue.dequeue()
      || work_stealing_dequeue.pop()
      || low_prio_queue.dequeue()
      || global_root_task_queue.dequeue();
}

void* consume()
{
    void* data = 0;
    if (data = consume_impl())
        // fast-path (no overhead)
        return data;
    for (;;)
    {
        eventcount_blocking block (ec);
        if (data = consume_impl())
            return data;
        block.wait();
        if (data = consume_impl())
            return data;
    }
}[/cpp]

Impressive! IMHO, it's extremely flexible and powerful.

P.s. I'm not aware of legal details, but if you need I can submit some of my algorithms as official TBB contributions as hash map.

0 Kudos
Alexey-Kukanov
Employee
677 Views
Quoting - AJ
Could somone spend a moment, and provide a clear explanation of the purpose of this function?

Also, I will be going through a lot more of the messy details of the scheduler. With everyone's help, I'd like to write a bit of a guide to help others understand its internals.

Wanna become one of those scheduler mavens? :):)

The gate is basically a state machine. The states are described in task.cpp as three constants: SNAPSHOT_EMPTY means there is no tasks available for stealing, SNAPSHOT_FULL means there is at least one task (yes, you might find the name misleading :)), and SNAPSHOT_PERMANENTLY_OPEN means all masters leave, there will be no more work and workers should just exit. There is the fourth state, BUSY, which means some thread is checking all task pools to find if there are some tasks; this state does not have a permanent constant for it but instead is represented by a unique ID of the thread doing the exercise.

Between states, some transitions are possible and some are not. Transitions are performed by the function called try_update; as you might guess from its name, it does not guarantee a new state is set because some other thread could change the current state. The semantics of try_update is almost the same as for compare-and-swap: it changes the state of the gate to Value (the first argument) if it is currently equal to Comparand... or if it isnot equal, in some cases, depending on which transitions to the new state are possible. To distinguish inside try_update what logic is required for a particular operation, the third argument (flip)is used. If flip is false (by default), the regular test-and-set logic is used. If flip is true (which should be explicitly specified), the opposite logic is used, i.e. the current value is changed iff it is not equal to Comparand.

Let's see some examples.

[cpp]inline void GenericScheduler::mark_pool_full() {
    // ...
    Gate::state_t snapshot = arena->prefix().gate.get_state();
    if( snapshot!=SNAPSHOT_FULL ) 
        arena->prefix().gate.try_update( SNAPSHOT_FULL, SNAPSHOT_PERMANENTLY_OPEN, true );
}
[/cpp]

As you might guess from its name, mark_pool_full signals that some task pool is "full", i.e. there is some job. Transition to FULL is possible from EMPTY and BUSY states, but not from PERMANENTLY_OPEN (which can not change to anything else, by the way). Thus we set the third argument instructing try_update to only make the change if the state is not PERMANENTLY_OPEN, i.e. using the "flipped" logic.

Now let's look at the piece of code from wait_while_pool_is_empty:

[cpp]            case SNAPSHOT_FULL: {
                // Use unique id for "busy" in order to avoid ABA problems.
                const Gate::state_t busy = Gate::state_t(this);
                // Request permission to take snapshot
                arena->prefix().gate.try_update( busy, SNAPSHOT_FULL );
[/cpp]

Currently, the state is FULL (as seen in the very first line); but a worker could not steal a task for a while, and so wants to check whether tasks to steal really exist, and refresh the state. To do that, it first tries to set the state to BUSY; and the change is only allowed if the state is still FULL, because otherwise either some other thread already does/have done the global check, or all masters quit and the gate is permanently open. Therefore "normal" test-and-set logic is used.

Hope it will help you to understand this piece of the scheduler.

0 Kudos
ARCH_R_Intel
Employee
638 Views
If someone is planning to rewrite the entire snapshot/gate logic, here is what we want to accomplish in a scalable manner:
  1. A thread should sleep if there is no task in the system to work on.
  2. If a thread puts a task in its pool, and there are worker threads asleep, one of the workers should be woken up.

Your solution gets extra credit if when a task is mailed to another thread, (2) wakes up that thread. (1) may get more complicated if(1) takes into account the restrictions discussed for solving multiple pipeline deadlock.

You may gain or lose points for the number of ternary operators used, depending on who is grading.

0 Kudos
Reply