I hope that this is a simple question:
Which compiler flags should I use to get the best performance out of an AMD EPYC processor (particularly for MPI and OpenMP codes)? I know which instruction sets it is theoretically capable of. But since there were "problems" in the past where the Intel compiler would choose slower execution paths for non-Intel CPUs, and there is this disclaimer, I feel that I should ask an expert first instead of blindly trusting the software to do its best.
Since I no longer work for Intel, I think I can say that I disagree with the premise of the statement. The only part of this I consider remotely true is that if you use the auto-CPU dispatch option -ax, then non-Intel processors take the "generic" path, whatever you have set that to. The -x options (-xHost excepted), as the disclaimer notes, reserve some optimizations for Intel processors and add a check at program start that gives an error if the CPU type doesn't match. The -m or -arch options omit this check. You are unlikely to find any compiler that consistently outperforms Intel's on an AMD CPU (for many years, AMD would use Intel compilers for their SPEC submissions).
I would recommend the use of -xHost. This will select the best option for the processor you're compiling on, Intel or non-Intel. (I wrote the initial code that does this determination.)
I did not want to start a discussion about the premise...
If I understand correctly, the fact that the auto-CPU dispatch options will choose a slower path for non-Intel CPUs makes this option unusable for non-Intel CPUs. Will I need a separate binary for every type of CPU?
Concerning -xHost: what if our development workstations all have different generations of Intel CPUs, but the code may or may not be run on AMD EPYC CPUs? In this case I need something different.
This may be of help:
IVF documentation Index | OPTIMIZATION_PARAMETER | ATTRIBUTES OPTIMIZATION_PARAMETER
You can use:
    !DIR$ ATTRIBUTES OPTIMIZATION_PARAMETER: string :: { procedure-name | named-main-program }
string: a character constant that is passed to the optimizer. The constant must be delimited by apostrophes or quotation marks, and it may have one of the following values:
...
The characters in string can appear in any combination of uppercase and lowercase. The following rules also apply to string:
- If string does not contain an equal sign (=), then the entire value of string is converted to lowercase before being passed to the optimizer.
- If string contains an equal sign, then all characters to the left of the equal sign are converted to lowercase before all of string is passed to the optimizer. Characters to the right of the equal sign are not converted to lowercase since their value may be case sensitive to the optimizer, for example “target_arch=AVX”.
You can specify multiple ATTRIBUTES OPTIMIZATION_PARAMETER directives for one procedure or one main program.
For the named procedure or main program, the values specified for ATTRIBUTES OPTIMIZATION_PARAMETER override any settings specified for the following compiler options:
- -x, -m, and /arch
...
This isn't as elegant as using an auto-dispatcher as you may have to explicitly perform your dispatch.
Note 2:
The Fortran code is likely calling (once) the C/C++ code to determine the CPU architecture and then load a bitmask of supported features. You could step into this (probably best using a C/C++ exploratory program) to locate the address (and hopefully a global symbol name). Once located, you can jam in whatever bits you want. I do not know what EPYC supports; perhaps it supports AVX2 and/or FMA.
Note 3:
You can search the Intel C++ documentation for cpu_dispatch and _allow_cpu_features. This may help you craft your own dispatcher.
Jim Dempsey
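As a rough illustration of the "determine the CPU features yourself" step, here is a minimal C sketch. It assumes GCC or Clang on x86, whose `__builtin_cpu_supports` builtin reports features without regard to CPU vendor; it is not the Intel runtime's own detection code. The names `detect_dispatch_code`, `dispatch_name`, `DISPATCH_SSE2` and `DISPATCH_AVX` are hypothetical and only mirror a 0 = SSE2, 1 = AVX convention.

```c
#include <assert.h>

/* Hypothetical dispatch codes: 0 = SSE2 baseline, 1 = AVX. */
enum { DISPATCH_SSE2 = 0, DISPATCH_AVX = 1 };

/* Return the best dispatch code the running CPU supports.
 * Assumption: GCC/Clang builtins on x86; any other compiler or
 * architecture falls back to the baseline code. */
int detect_dispatch_code(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();               /* harmless if already initialised */
    if (__builtin_cpu_supports("avx"))
        return DISPATCH_AVX;
#endif
    return DISPATCH_SSE2;
}

/* Human-readable name for a dispatch code, e.g. for logging. */
const char *dispatch_name(int code)
{
    assert(code == DISPATCH_SSE2 || code == DISPATCH_AVX);
    return code == DISPATCH_AVX ? "AVX" : "SSE2";
}
```

A Fortran program could reach such a function through an interface declared with `bind(C)` and store the result once at startup.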
Possible guide:
1) create a Fortran project that is .NOT. to be built
2) in this project, insert the bodies of your multi-build subroutines and functions .WITHOUT. SUBROUTINE or FUNCTION declaration.
Example: on my Windows system (you can adapt this for Linux), I have a Fortran project folder named NoBuild containing foo_body.f90:
    ! subroutine foo(v, a, t, n)
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: v(n)
    real, intent(in) :: a,t
    v = v + a * t
    ! end subroutine foo
3) Construct your program and use the multi-generated function like this:
    module mod_dispatch
      integer :: yourDispatchCode = 0
    contains
      subroutine foo(v, a, t, n)
        implicit none
        integer, intent(in) :: n
        real, intent(inout) :: v(n)
        real, intent(in) :: a,t
        select case(yourDispatchCode)
        case (0)
          call foo_SSE2(v, a, t, n)
        case(1)
          call foo_AVX(v, a, t, n)
        case default
          call foo_SSE2(v, a, t, n)
        end select
      end subroutine foo

      subroutine foo_SSE2(v, a, t, n)
        !dir$ attributes optimization_parameter: "target_arch=SSE2" :: foo_SSE2
        include "..\NoBuild\foo_body.f90"
      end subroutine foo_SSE2

      subroutine foo_AVX(v, a, t, n)
        !dir$ attributes optimization_parameter: "target_arch=AVX" :: foo_AVX
        include "..\NoBuild\foo_body.f90"
      end subroutine foo_AVX
    end module mod_dispatch

    program Dispatch
      use mod_dispatch
      implicit none
      integer, parameter :: n = 100
      real :: v(n)
      real :: a, t
      v = 0.0
      a = 9.1
      t = 0.1
      yourDispatchCode = 1 ! you determine CPU supported features and set value here
      call foo(v, a, t, n)
      print *,v
    end program Dispatch
I hope this helps.
Jim Dempsey
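For readers who want to try the same manual-dispatch idea from C, the sketch below is one possible analogue, not a translation of the Intel directive. It assumes GCC or Clang on x86, where `__attribute__((target("avx")))` plays the role of `target_arch=AVX` and a macro stands in for including foo_body.f90 twice. All names here (`FOO_BODY`, `foo_baseline`, `foo_avx`, `pick_foo`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_TARGET_ATTR 1
#define TARGET_AVX __attribute__((target("avx")))
#else
#define HAVE_X86_TARGET_ATTR 0
#define TARGET_AVX
#endif

/* Shared loop body, expanded once per code path (like the include file). */
#define FOO_BODY(v, a, t, n)                      \
    do {                                          \
        for (size_t i = 0; i < (n); ++i)          \
            (v)[i] += (a) * (t);                  \
    } while (0)

/* Baseline path: compiled with the default instruction set. */
void foo_baseline(float *v, float a, float t, size_t n) { FOO_BODY(v, a, t, n); }

/* AVX path: the compiler may use AVX instructions in this function only. */
TARGET_AVX void foo_avx(float *v, float a, float t, size_t n) { FOO_BODY(v, a, t, n); }

typedef void (*foo_fn)(float *, float, float, size_t);

/* Choose a code path from the CPU's reported features, ignoring the vendor. */
static foo_fn pick_foo(void)
{
#if HAVE_X86_TARGET_ATTR
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return foo_avx;
#endif
    return foo_baseline;
}

/* Public entry point. The first call selects the implementation; that first
 * call is not thread-safe, so make it before any parallel region starts. */
void foo(float *v, float a, float t, size_t n)
{
    static foo_fn impl = NULL;
    assert(v != NULL || n == 0);
    if (impl == NULL)
        impl = pick_foo();
    impl(v, a, t, n);
}
```

Resolving the function pointer once keeps the per-call cost to a single indirect call, which is roughly what the `select case` in the Fortran version costs.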
Alexander S. wrote:
I did not want to start a discussion about the premise...
If I understand correctly, the fact that the auto-CPU dispatch options will choose a slower path for non-intel CPUs makes this option unusable for non-intel CPUs. I will need a separate binary for every type of CPU?
Concerning xHost: what if our development workstation all have different generations of Intel CPUs, but the code may or may not be run on AMD epyc CPUs. In this case I need something different.
If you wish to make a multi-architecture binary, you should set the default architecture to the oldest architecture you intend to support, for example -msse3. If you have complex arithmetic, this could be much faster than the default. For example, -axAVX -msse3 should generate both AVX and SSE3 execution paths (for those cases where the compiler sees an advantage for AVX).
As I wrote earlier, auto-CPU dispatch will select the "generic" code path on non-Intel CPUs. This is not necessarily the slowest - it depends on whether you specified a non-default generic instruction set and what the application does. As of the 18.0 version, the Intel compiler supports -m as high as -mavx. I don't know which of the Intel instruction sets the EPYC processors support.
Tim's advice is what I'd recommend, since now you're saying that the application may run on a variety of different processor generations. See https://software.intel.com/en-us/fortran-compiler-18.0-developer-guide-and-reference-m for the various choices.
Thanks for all your valuable input. I am beginning to get a better understanding of how it should be done correctly, and of why there are some salty people on the internet who strongly disagree with the way the Intel compiler handles non-Intel architectures by default.
I might get back to this topic once our AMD test workstation is deployed.
Unfortunately, either by design or by bug, -xHost does not work on AMD EPYC processors. Very disappointing :(
cpu family : 23
model : 1
model name : AMD EPYC 7401P 24-Core Processor
stepping : 2
microcode : 0x8001207
cpu MHz : 1996.236
cache size : 512 KB
physical id : 0
siblings : 48
core id : 0
cpu cores : 24
apicid : 61
initial apicid : 61
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx cpb hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload overflow_recov succor smca
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 3992.47
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
user@a-compute-01:~$ cat test.c
/* Hello World program */
#include<stdio.h>
main()
{
printf("Hello World");
}
user@a-compute-01:~$ icc test.c -o test
user@a-compute-01:~$ ./test
Hello World
user@a-compute-01:~$ icc -xHost test.c -o test
user@a-compute-01:~$ ./test
Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT and AVX instructions.
user@a-compute-01:~$
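To check a `flags` line like the one above without reading it by eye, a small whole-word lookup is enough. The sketch below is only an illustration: `has_cpu_flag` and `EPYC_FLAGS_EXCERPT` are hypothetical names, and the string is a shortened excerpt of the flags posted above, not the full list. Note that Linux reports SSE3 as `pni`.

```c
#include <assert.h>
#include <string.h>

/* Shortened excerpt of the "flags" line from the cpuinfo listing above. */
static const char EPYC_FLAGS_EXCERPT[] =
    "fpu vme de pse sse sse2 pni ssse3 fma cx16 sse4_1 sse4_2 popcnt "
    "aes xsave avx f16c bmi1 avx2 bmi2 sha_ni";

/* Return 1 if `name` appears in `flags` as a whole whitespace-delimited
 * word, else 0. Whole-word matching keeps "avx" from matching "avx2". */
int has_cpu_flag(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += 1;                 /* keep scanning past this partial match */
    }
    return 0;
}
```

Run against the excerpt, this finds `pni`, `ssse3`, `sse4_1`, `sse4_2`, `popcnt`, `avx`, `avx2` and `fma`, and does not find `avx512f`. Every instruction set named in the error message is therefore reported by the CPU, which suggests the startup check is not failing because a feature is missing.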