I am trying to build a Fortran program with the Intel Fortran compiler (ifort, version 2013.3.163), using the -O2 (or -O3) and -xHost options. The resulting program works fine under Ubuntu 12.04 (kernel 3.5.0-26) in a VMware virtual machine on my laptop (CPU: i7-3720QM), where it runs 3.19 times faster than the unoptimized build (-O0 -g). However, when I run the program under Ubuntu 12.04 (kernel 3.5.0-26) on an HP Z800 workstation (two Xeon E5620 CPUs), it runs at the same speed as the unoptimized build, i.e., the optimization appears to have no effect on the E5620.
I have also tried an optimized build made with gfortran. It works fine on the E5620, although it is not as fast as the ifort build is on my laptop. I am really confused about why the optimization does not work on the E5620. I am a newbie with ifort and do not know how to debug and diagnose the problem. Can anyone give me some suggestions? Thanks a lot! My email address is: wyffrank@gmail.com.
>> I choose -O2 (or -O3), -xHost option ...my laptop (the CPU is i7-3720QM)... I am really confused why the optimization does not work in E5620.
Are you saying that you compiled the program on your laptop (CPU is i7-3720QM) with -xHost, and then copied the executable to the E5620 machine?
If so, the host CPU at compile time is not the host CPU at run time.
To fix this for a portable program, remove -xHost.
Alternatively, recompile on the E5620 with -xHost.
Jim Dempsey
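A minimal sketch of how the suggestion above might be applied, assuming a Linux system with /proc/cpuinfo. The helper names `cpu_has_flag` and `build_portable` and the file names `prog.f90` and `prog` are hypothetical, not taken from the original post. The idea is to check which instruction sets each machine reports, then build with an explicit baseline instead of -xHost. The E5620 supports SSE4.2 but not AVX, while the i7-3720QM supports both, so one plausible choice is an SSE4.2 baseline (-xSSE4.2) with an additional AVX code path selected at run time (-axAVX).

```shell
# Return success if the given CPU feature flag is listed in a cpuinfo file.
# The second argument defaults to /proc/cpuinfo (Linux only).
cpu_has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Take the first "flags" line and look for the flag as a whole word.
    grep -m1 '^flags' "$file" | grep -qw -- "$flag"
}

# Build one binary intended to run on both machines.
# prog.f90 and prog are placeholder names (assumption).
# -xSSE4.2 sets the baseline instruction set; -axAVX adds an
# alternative AVX code path that is chosen automatically at run time.
build_portable() {
    ifort -O2 -xSSE4.2 -axAVX prog.f90 -o prog
}
```

On each machine, `cpu_has_flag avx && echo "AVX available"` and `cpu_has_flag sse4_2 && echo "SSE4.2 available"` show what the CPU reports. Running `build_portable` on either machine should then give a binary that starts on both, which removes the build host as a variable when comparing timings.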
Does your compute-intensive parallel code make heavy use of atomics, mutexes/locks, critical sections, or library calls that contain them (e.g. RAN, DRAN)?
Your notebook (one CPU) has a single last-level cache (LLC/L3) and a single memory system.
Your Z800 workstation (two CPUs) has two last-level caches (LLC/L3) and two memory systems (one per CPU).
Atomics, mutexes/locks, critical sections, and library calls that use them generally take much longer when the system has multiple LLCs and/or memory systems.
Do you have a profiler? If so, it might identify the section of code that is causing the bottleneck.
Jim Dempsey
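One rough way to test the two-socket explanation above without a profiler is to confine the program to a single CPU and its local memory and see whether the timing changes. This is only a sketch, assuming the `numactl` utility is installed and the workstation exposes its two sockets as NUMA nodes 0 and 1. The wrapper name `run_on_node`, the `DRY_RUN` variable, and the program name `./prog` are hypothetical.

```shell
# Run a command with its threads and memory confined to one NUMA node.
# Usage: run_on_node NODE COMMAND [ARGS...]
# With DRY_RUN=1 the numactl command line is printed instead of executed.
run_on_node() {
    node=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "numactl --cpunodebind=$node --membind=$node $*"
    else
        numactl --cpunodebind="$node" --membind="$node" "$@"
    fi
}
```

`numactl --hardware` lists the nodes the system reports. Comparing the elapsed time of `run_on_node 0 ./prog` against a plain `./prog` gives a hint: if the pinned run is clearly faster, cross-socket synchronization or remote memory access is a likely suspect, and a profiler can then narrow down which section of code is responsible.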