Yongfu_W_ · ‎04-05-2013

I am trying to build a Fortran program with the Intel Fortran compiler (version 2013.3.163), using the -O2 (or -O3) and -xHost options. The resulting program works fine under Ubuntu 12.04 (kernel 3.5.0-26) in a VMware virtual machine on my laptop (the CPU is an i7-3720QM), where it runs 3.19 times faster than the unoptimized build (-O0 -g). However, when I run the program under Ubuntu 12.04 (kernel 3.5.0-26) on an HP Z800 workstation (with two Xeon E5620 CPUs), it runs at the same speed as the unoptimized build, i.e., the optimization appears to have no effect on the E5620.

I have also tried an optimized build with gfortran. It works fine on the E5620, although it is not as fast as the ifort build running on my laptop. I am really confused about why the optimization does not work on the E5620. I am a newbie with ifort and do not know how to debug and diagnose the problem. Can anyone give me some suggestions? Thanks a lot! My email address is: wyffrank@gmail.com.

Yongfu_W_ · ‎04-05-2013

BTW, the program needs large local arrays, so I use ulimit -s unlimited to enlarge the stack size. I don't believe this affects the optimization, since the program works fine on my laptop.
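
For reference, a minimal sketch of raising the stack limit for a single run instead of for the whole shell session. The helper name run_with_big_stack and the program name ./myprog are hypothetical, and the sketch raises the soft limit only as far as the hard limit allows, which may or may not be "unlimited" on a given machine.

```shell
# Run a command with the soft stack limit raised to the hard limit.
# The subshell keeps the change from leaking into the calling shell.
run_with_big_stack() {
    (
        # Soft limit := hard limit (may print "unlimited" or a size in KB).
        ulimit -S -s "$(ulimit -H -s)" 2>/dev/null
        "$@"
    )
}

# Example (hypothetical program name):
#   run_with_big_stack ./myprog
```

Running ulimit -s with no value in the same shell that launches the program shows the limit actually in effect, which is worth confirming on both machines.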

jimdempseyatthecove · ‎04-05-2013

>> I choose -O2 (or -O3), -xHost option ...my laptop (the CPU is i7-3720QM)...). I am really confused why the optimization does not work in E5620.

Are you saying you compiled the program on your laptop (CPU is i7-3720QM) with -xHost,
then copied the executable to the E5620 machine?

If yes, then the host CPU during compilation is not the host CPU at run time.
To fix this for a portable program, remove -xHost.
Or recompile on the E5620 with -xHost.
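
To check whether the two machines really differ in what -xHost can target, one option is to compare the SIMD feature flags that the Linux kernel reports on each. This is a minimal sketch under assumptions: the function name simd_flags and its optional file argument are made up for illustration, and it expects an x86 Linux system where /proc/cpuinfo contains a flags line.

```shell
# Print the SIMD feature flags reported for the first CPU, one per line.
# The optional argument is a cpuinfo-format file (defaults to /proc/cpuinfo).
simd_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" \
        | tr ' \t' '\n\n' \
        | grep -E '^(sse4_1|sse4_2|avx|avx2)$' \
        | sort
}

# Example: run on both the build machine and the target machine.
#   simd_flags
```

The i7-3720QM supports AVX, while the E5620 supports SSE4.2 but not AVX, so the two lists would be expected to differ. What a VMware guest reports may depend on the virtual machine configuration. If one binary has to run on both machines, naming the target explicitly (for example -xSSE4.2) instead of using -xHost is one option to try.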

Jim Dempsey

Yongfu_W_ · ‎04-05-2013

I compiled the program on E5620 with -xHost too, but it does not work.

jimdempseyatthecove · ‎04-05-2013

Is your compute-intensive parallel code making heavy use of atomics, mutexes/locks, critical sections, or library calls that contain them (e.g. RAN, DRAN)?

Your notebook (one CPU) has a single Last Level Cache (LLC/L3) and a single memory system.
Your Z800 workstation (two CPUs) has two Last Level Caches (LLC/L3) and two memory systems (one per CPU).

Atomics, mutexes/locks, critical sections, and library calls that use them generally take much longer when the system has multiple LLCs and/or memory systems.
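
A quick way to confirm how many CPU packages each system exposes is to count the distinct physical ids in /proc/cpuinfo. This is only a sketch: the function name count_sockets and its optional file argument are hypothetical, and a virtual machine may report a topology that differs from the real hardware.

```shell
# Count distinct CPU packages (sockets) listed in a cpuinfo-format file.
# The optional argument defaults to /proc/cpuinfo.
count_sockets() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep '^physical id' "$cpuinfo" | sort -u | wc -l | tr -d ' '
}

# Example:
#   count_sockets
```

If the count is 2 and numactl is installed, running the program as numactl --cpunodebind=0 --membind=0 ./myprog (with ./myprog standing in for the real executable) confines it to one socket and its local memory. Comparing that timing against an unrestricted run is one way to test whether cross-socket traffic is the cause.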

Do you have a profiler? If so, it might identify the section of code that is causing the bottleneck.

Jim Dempsey

Optimization problem under Linux and Xeon E5620 CPU