Intel® Fortran Compiler

Performance data to performance insight?

croucar1
Beginner
721 Views

After a couple of decades without Fortran, I've been getting back up to speed to refactor and optimize some legacy F77 acoustic models.

I've read the books (Optimization Cookbook, VTune Essentials, Scientific Computing on Itanium, and the HP HPC book). I have IVF 10 and VTune running, and now, despite VTune's best intentions, I'm drowning in metrics whose significance to my application I don't quite understand.

I've asked a couple of specific questions about the significance of the events and ratios I'm getting both here and in other forums, and of course I get the only rational answer: 'it depends on your application'.

Any ideas on resources I can use to understand the relevance of the many VTune events and ratios to my application? Other than the school of hard knocks...

Thanks,

Art

PS: I haven't done assembly language since the PDP-8 ;-)

6 Replies
Steven_L_Intel1
Employee
Art,

Welcome to the forum! Have you considered asking in the VTune forum section? You might get some good advice there.

In general, my advice is to first use VTune to see what part of your program is taking the most time so that you can focus your efforts. You can then look for events such as CPU stalls and cache misses to see if there's something major you can do such as reordering memory accesses, etc.
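As an illustration of what "reordering memory accesses" can mean in Fortran, here is a minimal sketch, not taken from the original poster's code. Fortran stores arrays in column-major order, so the leftmost index should normally vary fastest in the innermost loop. The module and routine names (loop_order_demo, sum_column_order, sum_row_order) and the array size are assumptions chosen for the example. At high optimization levels the compiler may interchange the loops itself, so the measured difference can vary or disappear.

```fortran
module loop_order_demo
  implicit none
contains

  ! Inner loop runs over the first index: consecutive memory, unit stride.
  function sum_column_order(a) result(total)
    double precision, intent(in) :: a(:,:)
    double precision :: total
    integer :: i, j
    total = 0.0d0
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        total = total + a(i, j)
      end do
    end do
  end function sum_column_order

  ! Inner loop runs over the second index: each access jumps a full column.
  function sum_row_order(a) result(total)
    double precision, intent(in) :: a(:,:)
    double precision :: total
    integer :: i, j
    total = 0.0d0
    do i = 1, size(a, 1)
      do j = 1, size(a, 2)
        total = total + a(i, j)
      end do
    end do
  end function sum_row_order

end module loop_order_demo

program loop_order
  use loop_order_demo
  implicit none
  integer, parameter :: n = 2000          ! illustrative size, about 32 MB
  double precision, allocatable :: a(:,:)
  double precision :: s1, s2
  integer :: c0, c1, c2, rate

  allocate(a(n, n))
  a = 1.0d0

  call system_clock(c0, rate)
  s1 = sum_column_order(a)
  call system_clock(c1)
  s2 = sum_row_order(a)
  call system_clock(c2)

  ! Both sums are identical; only the memory access pattern differs.
  print *, 'column order: sum =', s1, ' seconds =', dble(c1 - c0) / dble(rate)
  print *, 'row order:    sum =', s2, ' seconds =', dble(c2 - c1) / dble(rate)
end program loop_order
```

If the row-order version is slower, that is the kind of hot spot where cache-miss and D-TLB events in the profiler are likely to show up.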

But seriously, the compiler is really pretty good about all this, so if you enable the best optimization options (/O3, the proper /Qx switch for your CPU, and perhaps /Qipo) you can get far. PGO (profile-guided optimization) is another step you can try if you still need "a bit more". Turning on the optimization reports can help a lot too. You'll want to maximize the vectorization done by the compiler.

Also look at what multithreading can do for you on a modern system.

You should not have to touch assembly code anymore.
TimP
Honored Contributor III
The first task is to discover where the time is spent (clock tick events), so you know where to work on performance. If vectorization or OpenMP parallelization is relevant to your application, check what the compiler optimization reports have to say about the time-consuming parts. I prefer to start with gprof profiling (not supported by commercial Windows compilers) before tackling VTune or PTU.
Even among our expert colleagues, I get varying advice as to which events to collect, beyond the 2 defaults and those associated with memory bandwidth saturation. Past efforts to build estimates into VTune of probable effects on performance haven't been very productive. There are about a dozen ratios typically considered interesting on Core 2, not necessarily all among the primary VTune suggestions. It is also entirely possible to have poorly tuned code which hits none of the event ratio hot buttons. There will be sections of code with horrible event ratios, which you can ignore if not enough time is spent there to be worth further investigation.
Those texts cover most of the important events. The specific events you would be interested in change entirely between processor families, e.g. from Pentium D to Core 2 Duo.
Over 15 years, I have found the D-TLB (data translation look-aside buffer) events worth more attention than is given them in many of the textbooks. "Core" processors have D-TLB associated with each level of cache, and both are important. Perhaps they are ignored because the only simple advice is to vectorize, or at least localize memory access, where possible, and that is perhaps simpler than even a rough explanation of TLB.

With your VTune license, you are entitled to use the PTU profiler, discussed on the WhatIf forum. It's a faster moving target, with less documentation, but perhaps easier to learn by the "hard knocks" route.
croucar1
Beginner

Thanks! I really appreciate not having to touch assembly code.

I take it that the diagnostic process isn't really language-specific? VTune and event counters don't care too much about what language the source code was in, so I should post on VTune instead of Fortran (or in addition to - what's the social norm?)

The code is full of IFs, GOTOs, EQUIVALENCEs, and many things that became evil when caches were invented. So although the physicists think it is a numerical application, it probably isn't (4% FP ops). PGO will probably help, because there are a lot of error condition tests that are almost never true, and the core logic is ray tracing with boundary interactions. But I don't need VTune for that.

In principle, the code is embarrassingly parallel - computing propagation over independent radials. But with all the COMMONs and EQUIVALENCEs, I'm afraid that the compiler will be too scared to try ;-)
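For reference, a loop over independent radials can be parallelized explicitly with OpenMP rather than left to the compiler's auto-parallelizer. The sketch below is only an illustration under stated assumptions: trace_radial is a hypothetical stand-in for the real propagation routine, and it is written as a pure function with no COMMON or EQUIVALENCE state. In the legacy code, any shared work areas would first have to be passed as arguments or made private to each thread (for example with THREADPRIVATE) before a directive like this is safe.

```fortran
module radial_demo
  implicit none
contains

  ! Hypothetical placeholder for the propagation calculation on one radial.
  ! It touches no global state, so calls for different radials are independent.
  pure function trace_radial(bearing_deg, nsteps) result(loss)
    double precision, intent(in) :: bearing_deg
    integer, intent(in) :: nsteps
    double precision :: loss
    double precision, parameter :: deg2rad = 0.017453292519943295d0
    integer :: k
    loss = 0.0d0
    do k = 1, nsteps
      loss = loss + cos(bearing_deg * deg2rad) / dble(k)
    end do
  end function trace_radial

end module radial_demo

program radials
  use radial_demo
  implicit none
  integer, parameter :: nradials = 360    ! illustrative: one radial per degree
  double precision :: loss(nradials)
  integer :: ir

  ! Each iteration writes only its own element of loss, so there are no races.
  !$omp parallel do default(none) shared(loss) private(ir) schedule(dynamic)
  do ir = 1, nradials
    loss(ir) = trace_radial(dble(ir - 1), 10000)
  end do
  !$omp end parallel do

  print *, 'loss on first radial:', loss(1)
  print *, 'loss on last radial: ', loss(nradials)
end program radials
```

With the Intel compiler on Windows the directives are enabled by /Qopenmp; without that switch they are treated as comments and the loop runs serially, which makes it easy to compare results between the two builds.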

Steven_L_Intel1
Employee
Correct - the performance tuning process is language-neutral, though it certainly helps to understand what the language does or doesn't imply for a given construct. (For example, Fortran typically assumes variables of different names aren't aliased, whereas C assumes they may be.)
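To make the aliasing point concrete, here is a small sketch with invented names (alias_demo, scaled_add). Because the dummy argument y is modified, the Fortran rules on argument association let the compiler assume that x and y do not overlap, so it is free to vectorize the loop without runtime overlap checks. An EQUIVALENCE, by contrast, declares overlap explicitly, which is one reason heavy use of it can inhibit optimization.

```fortran
module alias_demo
  implicit none
contains

  ! Computes y(i) = y(i) + s * x(i).
  ! The compiler may assume x and y are distinct storage because y is modified.
  subroutine scaled_add(n, s, x, y)
    integer, intent(in) :: n
    double precision, intent(in) :: s
    double precision, intent(in) :: x(n)
    double precision, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + s * x(i)
    end do
  end subroutine scaled_add

end module alias_demo

program alias
  use alias_demo
  implicit none
  double precision :: a(4), b(4)

  a = (/ 1.0d0, 2.0d0, 3.0d0, 4.0d0 /)
  b = 10.0d0

  ! Conforming call: a and b are separate arrays.
  call scaled_add(4, 2.0d0, a, b)
  print *, b

  ! Non-conforming, shown only as a warning: overlapping sections of the
  ! same array passed as x and y, with y modified. The result is undefined.
  !   call scaled_add(3, 2.0d0, a(1:3), a(2:4))
end program alias
```

The equivalent C function would need restrict-qualified pointers to give the compiler the same guarantee.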

The proper etiquette is to not double-post. I'd suggest you take up the discussion in the VTune forum and come back here only if you need Fortran-specific advice.
croucar1
Beginner

Looking at the VTune forum(s) raised another question -

Is performance tuning O/S-neutral?

Both Fortran and VTune have two forums, one for Windows and one for Linux. That makes sense for product support issues, but fragments the community for language questions and hardware questions. If my questions are O/S independent, then I'd want to post in the O/S forum with the bigger audience. (Is that Windows or Linux?)

Steven_L_Intel1
Employee
Good question. I would say, for the most part, yes, it is OS-neutral. Not to say that there aren't differences among OSes for performance, but they tend to not be in areas you can control. I'm sure there's an odd case or two out there where tuning for the OS matters, but my advice is to not be concerned about it.