I would like to be able to specify task priority in OpenMP. This is not available in V17.0.1 (nor is it mentioned in the release notes for V17.0.4).
My intended use is for a hybrid MPI + OpenMP application where, on rank 0, a spawned task (or the master thread prior to the first task) runs at elevated task priority. This task manages a work queue for tasks issued to rank 0, as well as issuing task requests to the other ranks via MPI messaging.
What I want to accomplish is to have the task manager task .NOT. participate in the tasks that it enqueues. Should it do so (which I expect it is doing now), it will introduce an undesired latency in servicing the tasks to be issued to the additional MPI ranks (as well as to itself).
Ideally it would be nice to have
#pragma omp task deferred
where the task is enqueued but not run by the enqueuing task, except when the enqueuing task issues a taskwait.
This feature would not require an implementation of task priority.
BTW, my code restricts the number of pending tasks, so it would not accumulate too many pending deferred tasks.
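For reference, OpenMP 4.5 does define a priority(n) clause on the task construct (a scheduling hint only, with the usable range controlled by the OMP_MAX_TASK_PRIORITY environment variable). Below is a minimal sketch of how I would want to use it, with hypothetical manageWorkQueue()/doWork() routines standing in for the real code; the deferred clause in the comment is the proposed extension, not part of any specification:

#include <omp.h>

void manageWorkQueue(); // hypothetical: services the MPI job queue
void doWork();          // hypothetical: one unit of rank 0 work

void sketch()
{
    #pragma omp parallel
    {
        #pragma omp master
        {
            // Manager task at elevated priority (OpenMP 4.5 hint only; requires OMP_MAX_TASK_PRIORITY >= 1)
            #pragma omp task priority(1)
            {
                manageWorkQueue();
            }
            // Worker task at default priority.
            // Proposed: #pragma omp task deferred -- enqueue only, never run by the enqueuing thread
            #pragma omp task priority(0)
            {
                doWork();
            }
            #pragma omp taskwait
        }
    }
}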
Jim Dempsey
- Tags:
- C/C++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Sergey,
Thanks for the response... but you misunderstood the question.
The priority I am talking about is not the thread priority; rather, it is the task priority. These are not the same. Let me explain further...
#pragma omp task
{
    doWork();
}
In the above, doWork() can be run as a deferred task by some other thread .OR. as a direct (undeferred) task by the enqueuing thread, at the discretion of the implementation. What I want to ensure is that the enqueuing thread does .NOT. execute the enqueued task (i.e., force the task to be deferred).
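As an aside, note the asymmetry in the current specification: an if clause that evaluates to false forces an undeferred task (the encountering thread runs it before continuing), but there is no clause for the opposite. A minimal sketch of that asymmetry, assuming a trivial doWork():

#include <omp.h>
#include <cstdio>

void doWork() { std::printf("doWork on thread %d\n", omp_get_thread_num()); }

int main()
{
    #pragma omp parallel
    #pragma omp master
    {
        #pragma omp task if(0)  // undeferred: the encountering (master) thread runs this before continuing
        doWork();

        #pragma omp task        // may be deferred to another thread, or executed by the encountering
        doWork();               // thread at a task scheduling point; there is no clause to forbid the latter

        #pragma omp taskwait
    }
    return 0;
}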
Consider:
#pragma omp parallel
{
    #pragma omp master
    {
        #pragma omp task priority(1)
        {
            for(;!Done;)
            {
                ... // some code
                #pragma omp task // priority(0)
                {
                    doWork(); // not performed by the priority(1) task at enqueuing time
                }
                ... // other code
            } // for(;!Done;)
            #pragma omp taskwait // priority(1) task permitted to doWork() here
        } // omp task priority(1)
        #pragma omp taskwait
    } // omp master
} // omp parallel
I hope I entered that correctly.
The goal is for the for(;!Done;) loop to .NOT. take a detour into doWork() during execution of the loop. However, upon Done, the subsequent (innermost) taskwait would permit the priority(1) task to execute any pending doWork() tasks.
Jim Dempsey
The reason for the above is that the "some code" and "other code" manage a job queue whose jobs are distributed to the ranks of an MPI application (including rank 0 itself).
Without task priority, the for(;!Done;) loop can detour into doWork(), thereby inducing a response latency for the MPI ranks (and for rank 0's own for(;!Done;) task processing) equal to the runtime of the specific doWork(). I do not want this intermittent latency (it can be on the order of 60 seconds).
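To see whether (and how often) the manager loop is being hijacked, one can timestamp each iteration; an iteration that suddenly takes on the order of a doWork() runtime rather than microseconds indicates a detour. A minimal sketch, assuming the Done flag and doWork() from the example above are defined elsewhere, and an arbitrary 10 ms threshold:

#include <omp.h>
#include <cstdio>

extern bool Done; // assumed: termination flag from the example above
void doWork();    // assumed: worker routine from the example above

void managerLoop() // called from within the master region, as in the example above
{
    double prev = omp_get_wtime();
    for( ; !Done; )
    {
        // ... queue management ("some code") ...
        #pragma omp task
        doWork();
        // ... queue management ("other code") ...

        double now = omp_get_wtime();
        if(now - prev > 0.010) // a manager iteration should be far below 10 ms (threshold is arbitrary)
            std::printf("manager detoured for %.3f s (likely ran a doWork task)\n", now - prev);
        prev = now;
    }
}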
Jim Dempsey
Additional information that may be of use to readers of this thread.
In my work on this problem, I've discovered a compiler optimization issue with respect to OpenMP tasking. IMHO this is a bug. The failing code (simplified) is as follows:
// Requires <atomic>, <thread>, <chrono>; jobQueue, nRanks, availableProcessingResources(),
// DispatchJobToMPIRank() and doWork() are defined elsewhere (code simplified).
#pragma omp parallel
{
    #pragma omp master
    {
        // negative job number indicates no more jobs
        const int jobListDone = -1;
        // some positive number larger than the highest possible job number (typically less than 500)
        const int jobNotAvailable = 888888;
        std::atomic<int> jobForRank0(jobNotAvailable);

        // job queue management task
        #pragma omp task
        {
            for(int jobIndex = 0; jobIndex < jobQueue.size(); ++jobIndex)
            {
                int jobNumber = jobIndex; // simplification of code
                for(;;)
                {
                    for(int rank = 0; rank < nRanks; ++rank)
                    {
                        if(availableProcessingResources(rank))
                        {
                            // found a rank with sufficient resources
                            if(rank == 0)
                            {
                                // special case for rank 0 (self)
                                if(jobForRank0 == jobNotAvailable)
                                {
                                    jobForRank0 = jobNumber;
                                    jobNumber = jobNotAvailable; // indicate job dispatched
                                    break;
                                }
                            }
                            else
                            {
                                // rank > 0
                                DispatchJobToMPIRank(rank, jobNumber);
                                jobNumber = jobNotAvailable; // indicate job dispatched
                                break;
                            }
                        } // if(availableProcessingResources(rank))
                    } // for(int rank = 0; rank < nRanks; ++rank)
                    if(jobNumber == jobNotAvailable)
                        break;
                    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // wait a bit
                } // for(;;)
            } // for(int jobIndex = 0; jobIndex < jobQueue.size(); ++jobIndex)
            // empty job list
            // wait for rank 0 job processing task to consume remaining job number (if any)
            for( ; jobForRank0 != jobNotAvailable; )
                std::this_thread::sleep_for(std::chrono::milliseconds(100)); // wait a bit
            jobForRank0 = jobListDone; // inform rank 0 processing task to exit
        } // #pragma omp task

        // rank 0 processing task
        #pragma omp task
        {
            for( ; jobForRank0 != jobListDone; )
            {
                if(jobForRank0 == jobNotAvailable)
                {
                    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // wait a bit
                }
                else if(jobForRank0 >= 0)
                {
                    int jobNumber = jobForRank0;
                    jobForRank0 = jobNotAvailable;
                    #pragma omp task firstprivate(jobNumber)
                    {
                        doWork(jobNumber);
                    }
                }
            } // for( ; jobForRank0 != jobListDone; )
            #pragma omp taskwait
        } // omp task

        #pragma omp taskwait
    } // omp master
} // omp parallel
What is happening, and I am of the opinion this is a bug, is that the second task has a loop that appears (to the compiler) to contain loop-invariant code. From examination in the debugger, the #pragma omp task for the second task captured jobForRank0 as a local copy, even though it is an atomic<int> (it also captured the const jobNotAvailable). The captured values were both 0, though this may be an artifact of registerized variables. In any event, the atomic<int> jobForRank0 should not have been registerized or captured.
The solution was to use
#pragma omp task shared(jobForRank0)
That variable should have been shared by default. My guess is that the optimizer, not seeing an explicit shared clause (and seeing no change to the variable within the task), took the liberty of capturing the value.
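One defensive pattern that would have caught this immediately is to put default(none) on the constructs, so that every referenced variable's data-sharing must be stated explicitly and nothing can be captured implicitly. A minimal sketch, using a bare flag in place of the real job queue state:

#include <atomic>
#include <omp.h>

void sketch()
{
    const int jobNotAvailable = 888888;
    std::atomic<int> jobForRank0(jobNotAvailable);

    #pragma omp parallel default(none) shared(jobForRank0)
    #pragma omp master
    {
        // default(none) forces every variable referenced in the construct to appear
        // in a data-sharing clause, so an accidental firstprivate copy cannot slip through.
        #pragma omp task default(none) shared(jobForRank0)
        {
            jobForRank0 = 0; // publish a job number to the rank 0 processing task
        }
        #pragma omp taskwait
    }
}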
I hope this helps others with similar issues.
Jim Dempsey
