Ananya_M_
Beginner
66 Views

Visualize parallel for-loop chunks.

Can VTune visualize execution of OpenMP for-loop chunks? For example, consider the parallel for-loop below:

#pragma omp parallel for schedule(dynamic, 10) num_threads(4)
for(int i=1; i<=100; i++)
{
    // do work resulting in 
    //  highly uneven per-iteration 
    //  execution time.
}

I want to visualize on which thread each of the 10 chunks (say (1,10),(11,20),...,(91,100)) executed and how long they took, without modifying code.

I understand that only four parallel outline functions (one per thread) are started, and that each of them asks for chunks in a synchronized manner. I can visualize the four parallel outline functions in VTune, but I am unable to drill this visualization down to the chunk level.

4 Replies
Dmitry_P_Intel1
Employee

Hello Ananya,

As far as I can see, we cannot filter or group to distinguish a particular chunk (in your case, 10 iterations) without code modifications. What we can do is show per-loop, per-thread statistics about all the chunks that a thread executed in the loop.
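With a small code modification, the ITT API that ships with VTune can mark each chunk as a task on the timeline. A rough sketch, assuming the loop is restructured over chunks, `ittnotify.h` is taken from the VTune installation, and `work()` stands for your loop body:

```c
/* Sketch only: requires ittnotify.h and linking against the ITT
   static library from the VTune installation directory. */
#include <ittnotify.h>
#include <omp.h>

#define N 100
#define CHUNK 10

extern double work(int i);  /* your uneven per-iteration work */

void run(void) {
    /* Domain and task name under which chunks appear in VTune. */
    __itt_domain *dom = __itt_domain_create("my.loop");
    __itt_string_handle *h = __itt_string_handle_create("chunk");

    #pragma omp parallel for schedule(dynamic, 1) num_threads(4)
    for (int c = 0; c < N / CHUNK; c++) {
        __itt_task_begin(dom, __itt_null, __itt_null, h);
        for (int i = c * CHUNK + 1; i <= (c + 1) * CHUNK; i++)
            work(i);
        __itt_task_end(dom);
    }
}
```

Each chunk then shows up as a "chunk" task on its thread's timeline, with its duration visible.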

What is the motivation for drilling down to individual chunks, rather than looking at aggregated statistics per thread or even per loop?

Thanks & Regards, Dmitry

Ananya_M_
Beginner

Hello Dmitry,

Thanks for your answer!

Chunk-level statistics help in debugging parallel for-loops that perform an uneven amount of work per iteration.

For example, understanding which chunk executes for the longest time, and in which order chunks run on a thread, is useful for making for-loops more efficient, either by setting a good chunk size or by rewriting the loop body to handle particular iterations more efficiently. Aggregate statistics can only indicate a load imbalance at the thread level; they cannot pinpoint the bad iterations.

I saw that you have a talk planned at OpenMPCon this year. I am looking forward to hearing it!

Best regards, Ananya

Dmitry_P_Intel1
Employee

Thank you, I see. In VTune we try to strike a trade-off between statistics detail and collection overhead. I would expect such fine-grained chunk tracing to introduce overhead that could spoil the measurement (we try to avoid per-thread instrumentation in favor of per-region/barrier instrumentation; imagine many-core processors, where you can easily have hundreds of threads).

In my experience, problems with chunk size can result from scheduling overhead (which VTune shows in the OpenMP analysis) or from less effective cache usage, where Memory analysis (VTune 2016 Gold) with grouping by OpenMP regions can help. We are also experimenting with an analysis type that combines several aspects of performance in one picture: CPU utilization (including OpenMP parallelization efficiency), memory usage efficiency, and FPU utilization. If you are interested in such analysis, I can invite you to an early evaluation of this experimental feature.

BTW, unfortunately I am not able to attend the conference, but my colleague Sergey Vinogradov will present the material and gather/answer questions and feedback. Thank you for your interest!

Thanks & Regards, Dmitry

Ananya_M_
Beginner

Hello Dmitry,

"In VTune we try to strike a trade-off between statistics detail and collection overhead. I would expect such fine-grained chunk tracing to introduce overhead that could spoil the measurement (we try to avoid per-thread instrumentation in favor of per-region/barrier instrumentation; imagine many-core processors, where you can easily have hundreds of threads)."

I understand the trade-off between performance detail and collection overhead. It's a fine balance, easily upset. The idea of instrumenting per region/barrier sounds sensible.

"We are also experimenting with an analysis type that combines several aspects of performance in one picture: CPU utilization (including OpenMP parallelization efficiency), memory usage efficiency, and FPU utilization. If you are interested in such analysis, I can invite you to an early evaluation of this experimental feature."

This sounds very interesting. I would like to try this feature out; please send an invite. Thank you for offering.

Best regards, Ananya.
