Intel® Fortran Compiler

fortran from ppc f77 to core duo ifort

Deleted_U_Intel
Employee
I'm transitioning from f77 coding on a single PPC to ifort on a Core Duo. I can compile and run some code that is "embarrassingly parallel": a loop calling the same subroutine with different input. I am trying to understand what the compiler does to share the calculation between the two processors. Compiling with "ifort myprog.f" creates an executable that occupies 50% of both processors for 13 seconds when run. "ifort -parallel" creates an executable that also runs for 13 seconds but fully occupies both processors. During compilation the former reports LOOP WAS VECTORIZED. The latter reports LOOP WAS VECTORIZED and AUTO-PARALLELIZED. Adding the optimization flag -O3 did not improve the execution time.

So, I would like to know what the compiler strategy is in creating FORTRAN executables. What is the difference between vectorizing and parallelizing? Will I generally see sharing of the two processors for code compiled this way? Why does the parallel code more fully occupy the two processors but not finish any faster?
Steven_L_Intel1
Employee

Vectorizing and parallelizing are different. Vectorizing means using the SSE instructions to perform several operations in a single instruction within one thread: up to four for single-precision data, or two for double precision. This is independent of parallelizing.
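To make that concrete, here is a minimal sketch of the kind of loop the vectorizer usually likes: unit stride, no dependence between iterations, default (single-precision) reals. The program name and array size are made up for illustration, and it is free-form source, so save it with a .f90 extension rather than .f. Whether a given loop is actually vectorized is up to the compiler, so check the LOOP WAS VECTORIZED message.

```fortran
program vec_sketch
  implicit none
  integer, parameter :: n = 1024      ! illustrative size, not from the original post
  real :: a(n), b(n), c(n)            ! single precision: four values fit in one 128-bit SSE register
  integer :: i

  ! Fill the inputs with known values.
  do i = 1, n
     b(i) = real(i)
     c(i) = 2.0 * real(i)
  end do

  ! Unit-stride loop, each iteration independent of the others.
  ! A vectorizing compiler can process several elements per instruction here.
  do i = 1, n
     a(i) = b(i) + c(i)
  end do

  print *, a(1), a(n)
end program vec_sketch
```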

Parallelizing is finding opportunities where work can be done in more than one thread, typically array operations in loops.
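As a rough illustration (names and sizes are hypothetical, and this is free-form source), the first computational loop below has independent iterations and so could be split across threads, while the second carries a dependence from one iteration to the next and cannot simply be divided up. Whether -parallel actually threads a particular loop also depends on the compiler's own estimate of whether it is worth doing.

```fortran
program par_sketch
  implicit none
  integer, parameter :: n = 100000    ! illustrative size
  real :: x(n), y(n), s(n)
  integer :: i

  do i = 1, n
     x(i) = real(i) / real(n)
  end do

  ! Independent iterations: y(i) depends only on x(i), so different
  ! threads could each take a different range of i.
  do i = 1, n
     y(i) = sqrt(x(i)) + x(i) * x(i)
  end do

  ! Loop-carried dependence: s(i) needs s(i-1), so the iterations
  ! cannot be handed to separate threads as written.
  s(1) = x(1)
  do i = 2, n
     s(i) = s(i-1) + x(i)
  end do

  print *, y(n), s(n)
end program par_sketch
```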

Unless you can see how the threads were actually used, you don't know whether, perhaps, one thread did almost all of the work. It's also possible that your program is limited by memory access time, which parallelization would not help (and might even hurt). The results you get suggest that might be the case.
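One way to get a feel for the memory-bound case is a small timing sketch like the one below. It is only an assumption about what such a loop might look like, not your code: the arrays are deliberately much larger than cache (the size is arbitrary), and each iteration does very little arithmetic per element loaded. Building it once with plain ifort and once with -parallel and comparing the wall-clock times shows whether a second thread helps for this kind of loop on your machine.

```fortran
program membound_sketch
  implicit none
  integer, parameter :: n = 10000000  ! about 40 MB per array; arbitrary, chosen to exceed cache
  real, allocatable :: a(:), b(:), c(:)
  integer :: i, t0, t1, rate

  allocate(a(n), b(n), c(n))
  b = 1.0
  c = 2.0

  ! Wall-clock timing around the loop of interest.
  call system_clock(t0, rate)
  do i = 1, n
     ! Two loads and one store per iteration but only two floating-point
     ! operations: the time is dominated by moving data, not by arithmetic.
     a(i) = b(i) + 3.0 * c(i)
  end do
  call system_clock(t1)

  print *, 'check value:', a(n)       ! uses the result so the loop is not optimized away
  print *, 'wall seconds:', real(t1 - t0) / real(rate)

  deallocate(a, b, c)
end program membound_sketch
```

If both builds report about the same wall time while the -parallel build shows both processors busy, that is consistent with the threads waiting on memory rather than doing extra useful work.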

The Intel Thread Profiler is a good tool for visualizing how your program is using threads. You should also use the -opt-report switch and its keywords to get more detailed optimization reports from the parallelizer and see whether there are hints as to what could be improved.
