Intel® Fortran Compiler

multiple processors

karose
Beginner
1,524 Views

I just got my first Mac, an 8-processor Mac Pro. I was under the impression that Intel Fortran would make automatic use of the 8 processors when running Fortran codes.

That is without MPI or other coding.

I tried several compiler options (-parallel -O3). The code uses all 8 processors at start-up, when there is a lot of reading from and writing to disk, but then quickly moves on to the computations and uses only one processor.

I used the same code and compiler options on a Dell, and both processors are clearly being used throughout the execution.

Did I forget to install something, like the Math Kernel Library? I do not think so, but then again, it is not working for me.

Thanks

Kenny

10 Replies
jimdempseyatthecove
Honored Contributor III

Kenny,

Read the documentation relating to OpenMP. Try a few of the examples.

Jim

karose
Beginner
Jim

I will look at OpenMP, but I thought that required modifying my code. The description of Intel Fortran talks about auto-parallelization and multi-threading and making use of multiple processors automatically. The Windows version does this, and Intel's description of the Mac version sounds exactly the same to me.

Kenny
Steven_L_Intel1
Employee
I'd suggest enabling the optimization reports and looking to see whether they say loops were parallelized. Could it be that your program can take advantage of two processors but more don't help?
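
One way to act on this suggestion is to save the compiler output to a file and count the remarks in it. The sketch below assumes the build output was redirected to a log, for example with `ifort -parallel -O3 prog.f90 > build.log 2>&1` (`prog.f90` and `build.log` are placeholder names), and that parallelized loops are reported in a remark containing the word PARALLELIZED, which is an assumption about this compiler version's wording. `count_remarks` is a hypothetical helper, not something shipped with the compiler.

```shell
#!/bin/sh
# count_remarks LOGFILE KEYWORD
# Prints how many "remark:" lines in LOGFILE mention KEYWORD.
# Assumption: the log holds captured compiler output, one message per line.
count_remarks() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c "remark: .*$2" "$1" || true
}
```

With such a log, `count_remarks build.log VECTORIZED` and `count_remarks build.log PARALLELIZED` can be compared: a non-zero vectorized count next to a zero parallelized count would suggest that loops are being vectorized but not threaded.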
karose
Beginner
Steve

Thanks for the response. I will look at the report. I know the code is being parallelized and vectorized because these statements go by in the X11 window.

Also, just to be clear: two processors were used on a Dell PC that had two processors. The same code on the Mac is using only one of the eight processors.

I would be happy if two of the 8 processors were being used.

If the Mac is only using one processor when it could use up to 8, that is disappointing. The identical code runs much faster on the 2-processor Dell PC than on the 8-processor Mac!

Kenny
TimP
Honored Contributor III
A couple of things to check:
Do you have -parallel set for link?
Do you have environment variables set to enable threading? Setting OMP_NUM_THREADS=8 might help.
Ron_Green
Moderator
To get a fairer comparison, I would set OMP_NUM_THREADS=2 on the Mac and compare runtimes with the PC/Windows host using the same setting. I assume you use the Intel compiler on both, with the compilers at approximately the same build dates (ifort -V to check). Do you use wall clock time to compare relative application performance, or how do you measure that 'the Dell is faster'? Do you adjust for clock frequencies (scale as appropriate)? Are the compiler options the same (check the buildlog.htm on IVF and make sure the options are similar on the Mac)?

Another thing about Windoze PCs: don't use the performance meter to determine whether both CPUs are being used. Windows is notoriously sloppy with thread affinity - that is, threads migrate among the CPUs - so what looks like 2 CPUs being used is often one thread bouncing between the 2 CPUs. You don't see this on Mac and modern Linux distros - threads "stick" to a single CPU.

Again, the goal is to get apples to apples comparison. Application wallclock is a decent rough measure without getting into more advanced performance analysis - your watch should suffice (I'm assuming that your app takes minutes or more to run. If you're measuring something that runs in around 10 seconds or less then we need to talk.). Calls like 'dtime' and 'etime' can be problematic. Again, how are you measuring? Make sure you have the same compiler from approximately the same build date (Mac and Windows versions don't match up, look for the build dates), use the same options (or as close as possible), etc.

ron
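
The wall-clock comparison Ron describes can also be scripted instead of read off a watch. This is a minimal sketch, assuming a shell where `date +%s` prints seconds since the epoch (true on macOS and Linux, though not required by POSIX); `wallclock` and `WALL_SECONDS` are made-up names for illustration.

```shell
#!/bin/sh
# wallclock COMMAND [ARGS...]
# Runs COMMAND and stores the elapsed wall-clock time, in whole seconds,
# in the variable WALL_SECONDS. The command's own output is left untouched.
wallclock() {
    start=$(date +%s)      # seconds since the epoch before the run
    "$@"
    status=$?              # remember the command's exit status
    end=$(date +%s)        # seconds since the epoch after the run
    WALL_SECONDS=$((end - start))
    return $status
}
```

For example, `wallclock ./smelt_mar14; echo "$WALL_SECONDS s"` would report the elapsed time of one run. One-second resolution is coarse, but it is adequate for runs that take minutes or more, as in this thread.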
karose
Beginner
Yes - I used the -parallel flag on the compiler.
I did not set the environment variables to enable threading.
I will try this tomorrow.

I will look up how to actually do the setting - i.e., what is the actual command and where do I enter it.

More tomorrow after I try this.

Note that Absoft Fortran says their compiler does make use of multiple processors on a Mac automatically.

Kenny
karose
Beginner
ron

At the risk of appearing lazy or not well informed, do you mind telling me exactly how to set the following?

OMP_NUM_THREADS=2

Is this done in the X11 window, or within the files of Intel Fortran?

I am not using the Xcode environment to edit or compile the code. Simply a text editor for now (Emacs eventually).

The comparison I am doing, while not scientific, is practical. I have identical Fortran codes on both the Dell PC and the Mac and compile them with Intel Fortran. Then I run them and use a wall clock to determine how long they take to complete. The code takes about 1 hour on the 2-processor Dell and about 1.6 hours on the 8-processor Mac. So it is not just poor timing by me. There is no adjustment for clock speed. The Mac has much more memory and, I believe, a very similar clock speed, which I will confirm.

So I am getting the sense that my expectation, that Intel Fortran on the Mac should be using multiple processors, is reasonable. Is this correct?

Kenny
TimP
Honored Contributor III
We are suggesting that you try to set up your environment to specify 2 (or more) threads:
bash (or ksh)> export OMP_NUM_THREADS=2
tcsh (or csh)> setenv OMP_NUM_THREADS 2
then run the program.
In principle, the program compiled and linked by ifort -parallel should figure out how many cores are present, and attempt to use them all, provided that no environment variable settings alter this behavior.
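
To see how the run behaves at several thread counts, the setting above can be applied per run in a loop. This is only a sketch for a Bourne-style shell (bash, ksh, sh); `run_with_threads` is a hypothetical helper name, and the program to run is whatever executable was built with -parallel.

```shell
#!/bin/sh
# run_with_threads "COUNTS" COMMAND [ARGS...]
# Runs COMMAND once for each thread count in the space-separated list
# COUNTS, with OMP_NUM_THREADS set for that run only.
run_with_threads() {
    counts=$1
    shift
    for n in $counts; do
        echo "OMP_NUM_THREADS=$n"
        # A "VAR=value command" prefix exports VAR to that one command
        # without changing the calling shell's environment.
        OMP_NUM_THREADS=$n "$@"
    done
}
```

For instance, `run_with_threads "1 2 8" ./smelt_mar14` would run the program three times, with 1, 2, and 8 threads requested. Watching a CPU monitor or timing each run then shows whether the extra threads are actually used.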
karose
Beginner

First, I want to thank the people who responded. It has been very helpful. Please bear with me a bit longer - I last used Unix 15 years ago (a DEC Alpha machine!), and this is my first Mac.

Some progress. I am convinced that Intel Fortran is trying to use multiple processors. I did what people suggested and also changed the looping structure in my code. By telling it to use 1, 2 or 8 processors with OMP_NUM_THREADS, I can see differences in how many processors are used.

The bad news is that it is still slower than the 2-processor Dell PC. But it seems that my code has a structure that greatly restricts the ability to use more than one processor, and the Dell has a faster clock speed. So in a one-processor versus one-processor comparison, the faster one wins.

One last issue I have is the following. I can compile with -parallel OK. But if I try to add the flag -fast, I get the following warnings.

bash-3.2$ ifort smelt_mar14.f90 -fast -o smelt_mar14
ipo: warning #11043: unresolved _fegetenv
        Referenced in /usr/lib//libSystem.dylib
        Referenced in /usr/lib//libdl.dylib
        Referenced in /usr/lib/libSystem.B.dylib
ipo: warning #11043: unresolved _fegetround
        Referenced in /usr/lib//libSystem.dylib
        Referenced in /usr/lib//libdl.dylib
        Referenced in /usr/lib/libSystem.B.dylib
ipo: warning #11043: unresolved _fesetenv
        Referenced in /usr/lib//libSystem.dylib
        Referenced in /usr/lib//libdl.dylib
        Referenced in /usr/lib/libSystem.B.dylib
ipo: warning #11043: unresolved _nan
        Referenced in /usr/lib//libSystem.dylib
        Referenced in /usr/lib//libdl.dylib
        Referenced in /usr/lib/libSystem.B.dylib
ipo: warning #11043: unresolved _nanf
        Referenced in /usr/lib//libSystem.dylib
        Referenced in /usr/lib//libdl.dylib
        Referenced in /usr/lib/libSystem.B.dylib
ipo: warning #11043: unresolved _nextafterf
        Referenced in /usr/lib//libSystem.dylib
        Referenced in /usr/lib//libdl.dylib
        Referenced in /usr/lib/libSystem.B.dylib
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /var/folders/ur/urqDSuESGNGyxHqcBIm4kk+++TI/-Tmp-/ipo_ifort6xgP66.o
smelt_mar14.f90(484): (col. 50) remark: FUSED LOOP WAS VECTORIZED.
smelt_mar14.f90(514): (col. 7) remark: LOOP WAS VECTORIZED.
smelt_mar14.f90(765): (col. 1) remark: FUSED LOOP WAS VECTORIZED.
smelt_mar14.f90(797): (col. 4) remark: PARTIAL LOOP WAS VECTORIZED.
smelt_mar14.f90(824): (col. 4) remark: PARTIAL LOOP WAS VECTORIZED.
smelt_mar14.f90(826): (col. 7) remark: PARTIAL LOOP WAS VECTORIZED.

I presume I need to do more than run source /opt/intel.../iccvars.sh. I have already solved the dyld "library not loaded" problem.

Any suggestions?

Thanks much.

Kenny
