
AVX512 prediction on AVX2

TimP
Honored Contributor III
480 Views
Kevin made a good webinar presentation today, but it didn't shed any new light on how to run this AVX512 prediction or whether it was enabled in the 2017 release.
2 Replies
Kevin_O_Intel1
Employee

Hi Tim,

Thanks for the kind words.

Yes. It is enabled.

In your Advisor project properties, you need to click the toggle for "Analyze loops that reside in non-executed code paths".

Also, in your survey report, right next to the "Vectorized" and "Not-Vectorized" filters, you will see a button with a circular arrow. Click it to display loops with zero time.

That is it from the Advisor side. In the compiler switches you need to specify multiple instruction sets. For example, use the -axCOMMON-AVX512 -xAVX compiler flags to generate both code paths.
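
As a rough illustration of the compiler side, the two switches could be put on a command line as sketched below. The source file name loops.f90 is hypothetical, the Windows spelling of the switches is my assumption of the equivalent form, and the script only prints the commands instead of invoking the compiler.

```shell
#!/bin/sh
# Sketch only: prints candidate ifort command lines, does not run the compiler.

# Hypothetical source file name, for illustration.
SRC="loops.f90"

# Linux-style spelling of the two switches named in the reply:
#   -xAVX            baseline code path
#   -axCOMMON-AVX512 additional AVX-512 code path
CMD_LINUX="ifort -xAVX -axCOMMON-AVX512 -c $SRC"

# Windows-style spelling of the same two switches (assumed equivalent).
CMD_WINDOWS="ifort /QxAVX /QaxCOMMON-AVX512 /c $SRC"

echo "$CMD_LINUX"
echo "$CMD_WINDOWS"
```

Any other options (optimization level, debug information, optimization reports) would be added to the same command line.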

Let me know if you have any other questions.

Regards,

Kevin

 

TimP
Honored Contributor III

Kevin,

OK, this explanation of the Advisor settings helps.  On Windows, my typical compile:

$ ifort -O3 -Qopenmp -Qunroll:4 -QaxCOMMON-AVX512 -arch:AVX2 -assume:old_maxminloc,underscore -names:lowercase -fpp -Qopt-report:4 -debug:inline-debug-info loopdfv.F maind.obj forttime.obj

now makes the AVX5.. paths visible with their vector lengths, but with no assessment of speedup in the Advisor GUI. I see that vector lengths are typically listed as [16; 4; 8] on the lines where the expected speedup is listed. Vector length 16 (AVX512) in the optrpt is accompanied by the notation "unroll factor set to 2."

I don't get the Advisor Survey display filled in without the debug and opt-report settings.

In the optrpt there are frequently 3 reports per main loop, one labeled future_cpu30, one labeled generic, and one unlabeled (AVX2?).  The future_cpu30 generally has the same speedup rating as the AVX2.  It seems necessary to build separately for a single AVX512 path to get corresponding vector speedup quotations.
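
One quick way to check how many reports of each kind a report file holds is to count the labels with grep. The sketch below runs on a made-up stand-in file, not real ifort output; only the label strings future_cpu30 and generic are taken from the description above, and the line layout is an assumption.

```shell
#!/bin/sh
# Sketch: count code-path labels in an optimization report.
# The report content below is a made-up stand-in, NOT real ifort output.
RPT="$(mktemp)"
cat > "$RPT" <<'EOF'
loop at line 10 [future_cpu30]
loop at line 10 [generic]
loop at line 10
loop at line 20 [future_cpu30]
loop at line 20 [generic]
loop at line 20
EOF

# grep -c prints the number of matching lines; -F matches the string literally.
N_FUTURE=$(grep -c -F 'future_cpu30' "$RPT")
N_GENERIC=$(grep -c -F 'generic' "$RPT")
# -v inverts the match: lines that carry neither label (the unlabeled reports).
N_PLAIN=$(grep -c -v -e 'future_cpu30' -e 'generic' "$RPT")

echo "future_cpu30=$N_FUTURE generic=$N_GENERIC unlabeled=$N_PLAIN"
rm -f "$RPT"
```

On a real optrpt file the patterns would likely need adjusting to whatever lines actually carry the labels.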

For short run times (with a reduced sampling interval), it doesn't seem possible to get the same 6 to 8 top-ranked loops repeatably.

Tim
