What does "Consider increasing task granularity or the scope of data synchronization" mean?

mikeitexpert
New Contributor II
2,111 Views

I am familiarizing myself with VTune while trying to reduce the OpenMP overhead of my application, if possible.

In the documentation I found the following recommendation for reducing spin time and overhead time:

"A significant portion of CPU time is spent in synchronization or threading overhead. Consider increasing task granularity or the scope of data synchronization."

I just don't understand what "increasing task granularity or the scope of data synchronization" means here.

I would appreciate any clarification.

Thank you

8 Replies
jimdempseyatthecove
Honored Contributor III
2,096 Views

What that suggests is that you appear to have partitioned the work into code sections that are too small for task dispatch. In other words, consider increasing the amount of work each task performs.

Excessive spin time can be caused by: unbalanced work, too little work per parallel region or task, or high contention for a critical/atomic section.
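
To make the granularity point concrete, here is a minimal C sketch using OpenMP tasks. The names `sum_range` and `parallel_sum` and the `cutoff` parameter are hypothetical, chosen only for illustration. With `cutoff = 1` a task is created for nearly every element, so task dispatch is likely to cost more than the work itself; a larger `cutoff` gives each task enough work to amortize that cost.

```c
#include <assert.h>

/* Sum a[lo..hi). Ranges of at most `cutoff` elements are summed serially;
   larger ranges are split in two, with one half handed to a new task. */
static double sum_range(const double *a, long lo, long hi, long cutoff)
{
    if (hi - lo <= cutoff) {
        double s = 0.0;
        for (long i = lo; i < hi; ++i)
            s += a[i];
        return s;
    }

    long mid = lo + (hi - lo) / 2;
    double left = 0.0, right = 0.0;

    /* One half becomes a task; the current thread handles the other half. */
    #pragma omp task shared(left)
    left = sum_range(a, lo, mid, cutoff);

    right = sum_range(a, mid, hi, cutoff);

    /* Wait for the child task before combining the two halves. */
    #pragma omp taskwait
    return left + right;
}

/* Hypothetical entry point: one thread seeds the task tree, the team runs it. */
double parallel_sum(const double *a, long n, long cutoff)
{
    double total = 0.0;
    assert(cutoff >= 1); /* a cutoff below 1 would never stop splitting */
    #pragma omp parallel
    #pragma omp single
    total = sum_range(a, 0, n, cutoff);
    return total;
}
```

Calling `parallel_sum(a, n, 1)` and `parallel_sum(a, n, 4096)` on the same array and comparing the two runs in VTune should show how the overhead share changes with granularity. A good cutoff depends on the machine and the workload, so treat any specific value as a starting point.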

Jim Dempsey

Dmitry_P_Intel1
Employee
2,057 Views

Hello,

I would also interpret it as "consider increasing task granularity in the case of scheduler overhead, or decreasing the scope of data synchronization to avoid thread contention that leads to spinning (active waits)". Of course, that still sounds a bit general. I recommend looking at the VTune Cookbook article to see some of these hints in action.
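
As a minimal C sketch of narrowing the synchronized scope, the two hypothetical functions below count even values in an array. The first enters a critical section on every iteration, so threads contend for it constantly. The second keeps a private counter per thread and synchronizes only once per thread. The function names are made up for illustration.

```c
#include <assert.h>

/* Contended version: every iteration goes through the critical section. */
long count_even_contended(const int *v, long n)
{
    long count = 0;
    assert(n >= 0);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp critical
        {
            if (v[i] % 2 == 0)
                ++count;
        }
    }
    return count;
}

/* Reduced-scope version: each thread counts privately and
   enters the critical section once, after its share of the loop. */
long count_even_local(const int *v, long n)
{
    long count = 0;
    assert(n >= 0);
    #pragma omp parallel
    {
        long local = 0;
        #pragma omp for nowait
        for (long i = 0; i < n; ++i)
            if (v[i] % 2 == 0)
                ++local;
        #pragma omp critical
        count += local;
    }
    return count;
}
```

Both functions return the same result. For a simple sum like this, a `reduction(+:count)` clause on the loop expresses the same idea more compactly; the explicit version is shown only to make the change in synchronization scope visible.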

BTW, spinning in OpenMP worker threads can also be the result of waiting on serial code executing in the master thread. There is a cookbook article on this as well.

jimdempseyatthecove
Honored Contributor III
2,053 Views

>> spinning in OpenMP worker threads also can be a result of waiting on serial code execution in the master thread.

This spinning after the completion of a parallel region/loop is an optimization strategy. It avoids the additional overhead of an O/S call to suspend or terminate the non-master threads, only to re-establish them immediately afterwards. The duration of the inter-parallel-region spin-wait can be controlled with the environment variable KMP_BLOCKTIME.
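
KMP_BLOCKTIME is normally set in the shell before the program is launched, and that is the dependable route. As a rough C sketch, assuming the Intel OpenMP runtime reads its environment when it first initializes (an assumption worth verifying for your runtime version), the variable can also be set from the program before the first OpenMP construct runs. The helper name `request_blocktime` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: set KMP_BLOCKTIME for this process.
   `millis` is the spin-wait duration as text, e.g. "0" to let
   idle threads sleep right away. Returns 0 on success. */
int request_blocktime(const char *millis)
{
    assert(millis != NULL);
    /* Assumption: this must happen before the OpenMP runtime
       initializes, or the setting may be ignored. */
    return setenv("KMP_BLOCKTIME", millis, 1);
}
```

A call such as `request_blocktime("0")` at the very start of `main` trades spinning for the cost of waking threads at the next parallel region. Whether that helps depends on how much serial time separates the regions.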

Jim Dempsey

jimdempseyatthecove
Honored Contributor III
2,053 Views

A task, IMHO, is whatever runs between the scheduler starting the task and the task completing.

A task may contain synchronization primitives such as a mutex, where the mutex spin-waits instead of yielding (e.g., at an OpenMP critical section or atomic section). The end of an OpenMP loop or parallel region also contains an implicit synchronization primitive: as threads reach that point, they enter a spin-wait until all threads of the current thread team have reached it. Note that this spin-wait time (for each thread) runs in the context of the O/S-scheduled thread task.
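
The wait at the end of a loop can be illustrated with a small C sketch. The `work` function is a hypothetical workload whose cost grows with the index, so a static schedule that hands each thread one contiguous block leaves the thread holding the last block with the most work, while the others spin at the implicit barrier. A dynamic schedule with a modest chunk size (16 here, an arbitrary choice) lets threads that finish early pick up more chunks.

```c
#include <assert.h>

/* Hypothetical workload: the cost of item i grows linearly with i. */
static double work(long i)
{
    double s = 0.0;
    for (long k = 0; k <= i; ++k)
        s += (double)k;
    return s;
}

/* Static schedule: contiguous blocks, so the last block is the heaviest
   and the other threads wait for it at the end-of-loop barrier. */
double total_static(long n)
{
    double total = 0.0;
    assert(n >= 0);
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* Dynamic schedule: chunks of 16 iterations are handed out on demand,
   which evens out the time at which threads reach the barrier. */
double total_dynamic(long n)
{
    double total = 0.0;
    assert(n >= 0);
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += work(i);
    return total;
}
```

Both versions compute the same value. The dynamic schedule adds some dispatch cost per chunk, so a chunk size that is too small brings back the granularity problem described earlier in the thread.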

Jim Dempsey

mikeitexpert
New Contributor II
2,024 Views

>>> Note, this spin-wait time (by each thread) runs in the context of the O/S scheduled thread task.

Are you suggesting the spin-wait time is spent outside the thread task but inside OpenMP runtime?

 


Also, if you don't mind, I have a hard time following the sentence below:

"VTune Profiler classifies the stack layers into user, system, and overhead layers and attributes the CPU time spent in system functions called by overhead functions to the overhead functions."

In other words, what are overhead functions?

Can we say that overhead time is the amount of time taken by the OpenMP runtime to initialize a parallel region?

Thanks so much for the links to the balance tuning articles.
AthiraM_Intel
Moderator
1,912 Views

Hi,

We are extremely sorry for the delay.

Please find the answers to your questions below:

>> Are you suggesting the spin-wait time is spent outside the thread task but inside OpenMP runtime?

Answer: The spin-wait time is measured as overhead and is listed as part of the OpenMP overhead. It is not time spent doing meaningful work; it is simply overhead from the OpenMP runtime. Large spin time can also result from load imbalance between threads.

 

>> VTune Profiler classifies the stack layers into user, system, and overhead layers and attributes the CPU time spent in system functions called by overhead functions to the overhead functions.

I can't follow what overhead functions are?

 

 

Answer:  If the OpenMP runtime makes a system call, the time spent in the system call is also accounted as OpenMP overhead. 

 

>> Can we say overhead time is the amount of time take by OpenMP runtime to initialize a parallel region?

 

Answer: The overhead time includes everything from creating OpenMP threads, time taken to split the workload between threads , the synchronization between threads and the thread scheduling. Basically it includes initialize a parallel region and many other things as mentioned.
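
One way to see the region-startup part of that overhead is the following C sketch. The function names are hypothetical. The first version opens a new parallel region on every step, so the runtime has to set up and join the thread team `steps` times. The second opens one region and only repeats the worksharing loop inside it, which still has an implicit barrier per step but not the region startup.

```c
#include <assert.h>

/* One parallel region per step: region setup and join are paid `steps` times. */
void relax_region_per_step(double *x, long n, int steps)
{
    assert(n >= 0 && steps >= 0);
    for (int s = 0; s < steps; ++s) {
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            x[i] = 0.5 * x[i] + 1.0;
    }
}

/* One parallel region for all steps: the team is set up once, and each
   step costs only the loop scheduling plus its implicit barrier. */
void relax_single_region(double *x, long n, int steps)
{
    assert(n >= 0 && steps >= 0);
    #pragma omp parallel
    {
        for (int s = 0; s < steps; ++s) {
            #pragma omp for
            for (long i = 0; i < n; ++i)
                x[i] = 0.5 * x[i] + 1.0;
        }
    }
}
```

Both functions produce identical results. How much the second form saves depends on the runtime, since many runtimes keep worker threads alive between regions, so the difference is best confirmed by profiling both variants.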

 

Hope this helps. If you have any further issue, please let us know.

 

Thanks

 

0 Kudos
AthiraM_Intel
Moderator
1,832 Views

Hi,

Could you please give us an update? Has the solution provided helped?

Thanks.
AthiraM_Intel
Moderator
1,780 Views

Hi,


We assume that the solution provided helped. If you need any additional information, please submit a new question as this thread will no longer be monitored.


Thanks.

