Intel® oneAPI Threading Building Blocks

Putting task_proxy to the head of mailboxes

irisshinra308
Beginner
867 Views
Hi~

I am trying to add a new method that puts a task_proxy at the head of a mailbox, so that the task is executed as soon as possible.
I have referred to the push() method in the class mail_outbox, and the following is my implementation:
[cpp]void push_top( task_proxy& t ){
  t.next_in_mailbox = my_first;
  __TBB_store_with_release(my_first, &t);
  task_proxy* l = acquire();
  if(l == NULL)
    __TBB_store_with_release(my_last, &t);
}[/cpp]

I am also wondering what the purpose of __TBB_store_with_release is.
It actually just executes __asm__ __volatile__("" : : : "memory" );.
I googled this statement, and it is said to force the CPU to access memory instead of a register in order to maintain the access order.
However, I do not really understand why this is necessary for accessing "my_first" and "my_last".
Could anyone give me some help with this?

Sorry for so many questions!

Thanks for reading.

Dennis
7 Replies
Dmitry_Vyukov
Valued Contributor I
Quoting - irisshinra308
I am trying to add a new method that puts a task_proxy at the head of a mailbox, so that the task is executed as soon as possible.


AFAIK, even if you put the task at the head of a mailbox, it will NOT be executed ASAP, because a thread checks its mailbox only if its main work-stealing deque is empty. So if the work-stealing deque contains plenty of work, the worker thread will not check the mailbox in the near future at all.

Tasks are processed by TBB in LIFO order, so all you have to do is spawn your important task last, or just return it from task::execute(); then it will be executed right after the current task.
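
To make the ordering argument concrete, here is a toy model of the dispatch order. It is not TBB code: ToyWorker, mail_to_head and demo_order are made-up names, and the model assumes only what is said above (local deque served in LIFO order, mailbox consulted only when the deque is empty).

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model only, not TBB's scheduler. Each worker has a private LIFO deque
// and a mailbox; the mailbox is consulted only when the deque is empty.
struct ToyWorker {
    std::vector<std::string> deque;    // back = most recently spawned task
    std::deque<std::string> mailbox;   // front = head of the mailbox

    void spawn(const std::string& task) { deque.push_back(task); }
    void mail(const std::string& task) { mailbox.push_back(task); }
    void mail_to_head(const std::string& task) { mailbox.push_front(task); }

    // Returns the order in which the toy worker would run everything.
    std::vector<std::string> run_all() {
        std::vector<std::string> order;
        while (!deque.empty() || !mailbox.empty()) {
            if (!deque.empty()) {          // local work always wins
                order.push_back(deque.back());
                deque.pop_back();
            } else {                       // mailbox only when the deque is empty
                order.push_back(mailbox.front());
                mailbox.pop_front();
            }
        }
        return order;
    }
};

// Hypothetical scenario: two spawned tasks, two mailed tasks, one late spawn.
std::vector<std::string> demo_order() {
    ToyWorker w;
    w.spawn("a");
    w.spawn("b");
    w.mail("m1");
    w.mail_to_head("urgent");   // head of the mailbox, still waits for the deque
    w.spawn("important");       // spawned last, so it runs first
    return w.run_all();
}
```

In this model the task pushed to the head of the mailbox overtakes only other mailed tasks, while the task spawned last overtakes everything.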

Dmitry_Vyukov
Quoting - irisshinra308
I am also wondering what the purpose of __TBB_store_with_release is.
It actually just executes __asm__ __volatile__("" : : : "memory" );.
I googled this statement, and it is said to force the CPU to access memory instead of a register in order to maintain the access order.
However, I do not really understand why this is necessary for accessing "my_first" and "my_last".
Could anyone give me some help with this?



The abstract, platform-independent purpose of __TBB_store_with_release() is four-fold:
1. Ensure actual memory access
Assume that a thread stores new values to my_first and my_last, but they stay cached in registers of the processor and never reach the memory subsystem. The mailbox owner will simply not see those new values, because they exist only in the registers of another processor.

2. Ensure coherence between processors
The newly stored values must propagate to all interested processors. Otherwise we get the same effect.

3. Ensure correct ordering of memory accesses on compiler level
4. Ensure correct ordering of memory accesses on hardware level
A thread has to store the new values to my_first and my_last only after it has stored all the data to the task itself. Otherwise the mailbox owner may see just garbage in the task object. Such undesirable reordering of memory accesses may be done by the compiler and/or the hardware.

However if we consider these aspects regarding x86 only, then:
1. Ensured by usage of 'volatile' keyword.
2. Ensured by x86 hardware automatically.
3. Ensured by usage of __asm__ __volatile__("" : : : "memory" ).
4. Ensured by x86 hardware automatically.
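
As a rough illustration of the x86 points, a store-with-release and load-with-acquire pair could look something like the sketch below. This is not TBB's actual source: the helper names, Message, publish and consume_or are invented here, and the sketch assumes x86 with a GCC-compatible compiler. On other hardware, points 2 and 4 would need real fence instructions.

```cpp
#include <cassert>

// Hypothetical helpers for x86 + GCC/Clang only; not TBB's implementation.
// The empty asm with a "memory" clobber stops the compiler from moving
// memory accesses across it. It emits no machine instruction.
inline void compiler_barrier() { __asm__ __volatile__("" : : : "memory"); }

template <typename T>
inline void store_with_release_sketch(volatile T& location, T value) {
    compiler_barrier();   // point 3: earlier accesses may not sink below the store
    location = value;     // point 1: volatile forces a real store to memory
}

template <typename T>
inline T load_with_acquire_sketch(const volatile T& location) {
    T value = location;   // point 1: volatile forces a real load from memory
    compiler_barrier();   // point 3: later accesses may not hoist above the load
    return value;
}

// Made-up usage: fill in an object first, then publish a pointer to it.
struct Message { int payload; };
static Message g_message = {0};
static Message* volatile g_published = nullptr;

void publish(int payload) {
    g_message.payload = payload;                                   // data first
    store_with_release_sketch<Message*>(g_published, &g_message);  // then the pointer
}

int consume_or(int fallback) {
    Message* m = load_with_acquire_sketch<Message*>(g_published);
    return m ? m->payload : fallback;   // payload is read only after the pointer
}
```

In portable C++0x code the same intent would normally be expressed with std::atomic and explicit memory orderings rather than volatile plus a compiler barrier.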


irisshinra308
Quoting - Dmitriy Vyukov

AFAIK, even if you put the task at the head of a mailbox, it will NOT be executed ASAP, because a thread checks its mailbox only if its main work-stealing deque is empty. So if the work-stealing deque contains plenty of work, the worker thread will not check the mailbox in the near future at all.

Tasks are processed by TBB in LIFO order, so all you have to do is spawn your important task last, or just return it from task::execute(); then it will be executed right after the current task.


Thanks for the reply, especially the clarification of the __TBB_store_with_release() method.
It REALLY helps me a lot!

I am working on task scheduling in TBB, trying to achieve better performance by altering the execution sequence of affinity tasks.
To deal with affinity continuation tasks, I want to put these tasks at the front of the mailbox so that they are executed ASAP.

However, the push_top() method I implemented does not work correctly.
It seems to lead to a deadlock among all threads when every thread tries to put its tasks at the front.

By the way, the two methods, __TBB_store_with_release and __TBB_load_with_acquire, use __asm__ __volatile__("" : : : "memory" ); to maintain the correctness of applications instead of using a mutex.

Although the method names contain "release" and "acquire", they can never lead to a deadlock the way a mutex can.
Am I right?

Thanks again!

Dennis

Dmitry_Vyukov
Quoting - irisshinra308
Although the method names contain "release" and "acquire", they can never lead to a deadlock the way a mutex can.
Am I right?

Well, it depends. The operations themselves can't lead to a deadlock. However, you can mimic two mutexes with these loads and stores, and so get the same deadlock. Or you can spin on a load and thus get a livelock.
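
A small sketch of what can go wrong, written with C++0x std::atomic instead of the TBB macros. The names spin_until_set and circular_wait_makes_progress are made up, and the spin is bounded so that the example terminates. For simplicity the two "parties" run one after the other in a single thread, which is enough to show the shape of the problem.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: spin on an acquire load, but give up after max_spins.
// The load itself never blocks. Without the bound, though, the loop would
// run forever if no thread ever sets the flag.
bool spin_until_set(const std::atomic<int>& flag, long max_spins) {
    for (long i = 0; i < max_spins; ++i) {
        if (flag.load(std::memory_order_acquire) != 0)
            return true;
    }
    return false;
}

// Circular wait built only from loads and stores: party 1 waits for flag_b
// before setting flag_a, and party 2 waits for flag_a before setting flag_b.
// Neither store is ever reached, so nobody makes progress.
bool circular_wait_makes_progress(long max_spins) {
    std::atomic<int> flag_a(0);
    std::atomic<int> flag_b(0);

    bool party1_done = spin_until_set(flag_b, max_spins);
    if (party1_done)
        flag_a.store(1, std::memory_order_release);

    bool party2_done = spin_until_set(flag_a, max_spins);
    if (party2_done)
        flag_b.store(1, std::memory_order_release);

    return party1_done || party2_done;
}
```

With unbounded spinning and two real threads, the same dependency cycle would hang both of them, even though no mutex is involved.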

irisshinra308
The following is my new implementation of the push_top method.
However, it still sometimes leads to a segmentation fault.
Could anyone tell me where the problem is?

By the way, I understand the purpose of __TBB_store_with_release and __TBB_load_with_acquire, but when should I use these functions?

Thanks
[cpp]void push_top( task_proxy& t ){
  task_proxy* l = acquire();
  t.next_in_mailbox = my_first;
  __TBB_store_with_release(my_first, &t);
  if(l == NULL)
    l = &t;
  __TBB_store_with_release(my_last, l);
}
[/cpp]
Dmitry_Vyukov
Quoting - irisshinra308
The following is my new implementation of the push_top method.
However, it still sometimes leads to a segmentation fault.
Could anyone tell me where the problem is?

If you are serious about it, you can use Relacy Race Detector to verify the algorithm. It will show you exactly what goes wrong, when, and how.
http://groups.google.com/group/relacy
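
While hunting the race, it may also help to have a deliberately simple reference version to compare against. The sketch below swaps the lock-free protocol for a plain std::mutex. ToyProxy and ToyMailbox are made-up stand-ins, not the real task_proxy and mail_outbox, and the layout of the real mailbox may well differ. It only shows the invariant that any push_top has to keep: on an empty list both the head and the tail must end up pointing at the new element.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in; the real task_proxy carries more state.
struct ToyProxy {
    int id;
    ToyProxy* next_in_mailbox;
    explicit ToyProxy(int i) : id(i), next_in_mailbox(nullptr) {}
};

// Mutex-protected intrusive list. A plain mutex replaces the lock-free protocol.
class ToyMailbox {
    ToyProxy* first_ = nullptr;
    ToyProxy* last_ = nullptr;
    std::mutex lock_;
public:
    // Append at the tail.
    void push(ToyProxy& t) {
        std::lock_guard<std::mutex> guard(lock_);
        t.next_in_mailbox = nullptr;
        if (last_)
            last_->next_in_mailbox = &t;
        else
            first_ = &t;              // empty list: t is also the head
        last_ = &t;
    }
    // Insert at the head.
    void push_top(ToyProxy& t) {
        std::lock_guard<std::mutex> guard(lock_);
        t.next_in_mailbox = first_;
        first_ = &t;
        if (!last_)
            last_ = &t;               // empty list: t is also the tail
    }
    // Remove from the head; returns nullptr when the mailbox is empty.
    ToyProxy* pop() {
        std::lock_guard<std::mutex> guard(lock_);
        ToyProxy* t = first_;
        if (!t)
            return nullptr;
        first_ = t->next_in_mailbox;
        if (!first_)
            last_ = nullptr;          // list became empty: reset the tail too
        t->next_in_mailbox = nullptr;
        return t;
    }
};
```

Every update of the head, the tail and the link happens under one lock here, so no other thread can observe a half-linked element. A lock-free version has to give the same guarantee by other means.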

Dmitry_Vyukov
Quoting - irisshinra308
By the way, I understand the purpose of __TBB_store_with_release and __TBB_load_with_acquire, but when should I use these functions?

You must use these functions when the correctness of your algorithm depends on a particular ordering of memory accesses. That is, if some reordering of memory accesses introduced by the compiler and/or the hardware may lead to errors in your algorithm, you must use these functions to prevent that reordering. Consider the following example:

int data = 0;
int flag = 0;

// thread1
data = 17;
flag = 1;

// thread2
if (flag)
    assert(data == 17);

It's possible that execution of thread1 will be perceived by thread2 as:
flag = 1;
data = 17;

and/or execution of thread2 will be perceived by thread1 as:
int tmp = data;
if (flag)
    assert(tmp == 17);

Either of these reorderings breaks the algorithm.
So, in order to ensure correctness, the algorithm must be modified as follows (C++0x syntax):
int data = 0;
std::atomic<int> flag(0);

// thread1
data = 17;
flag.store(1, std::memory_order_release);

// thread2
if (flag.load(std::memory_order_acquire))
    assert(data == 17);

Now the undesirable reorderings are suppressed, i.e. the store to 'data' necessarily happens before the store to 'flag', and the load of 'flag' necessarily happens before the load of 'data'.
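
For completeness, here is the same pattern as a self-contained function that really starts two threads. The name run_publish_example is made up. Unlike the fragment above, the consumer spins until the flag is set, only so that the result is deterministic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the data/flag example with two real threads and returns the value
// the consumer observed in 'data' after seeing the flag.
int run_publish_example() {
    int data = 0;
    std::atomic<int> flag(0);
    int observed = -1;

    std::thread producer([&] {
        data = 17;                                   // plain store to the payload
        flag.store(1, std::memory_order_release);    // publish: payload is visible first
    });

    std::thread consumer([&] {
        // Wait for the flag. The acquire load pairs with the release store.
        while (flag.load(std::memory_order_acquire) == 0)
            std::this_thread::yield();
        observed = data;                             // guaranteed to read 17
    });

    producer.join();
    consumer.join();
    return observed;
}
```

If both memory orderings were replaced by std::memory_order_relaxed, nothing would order the access to 'data' against the access to 'flag' any more, and the read of 'data' would become a data race.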

