Newbie here. People told me the Intel compiler is way faster than gcc, and I found that to be true on our cluster with Intel chips. But we recently bought a new cluster with AMD chips, and I am not sure which compiler options I should use. Could you please give me some guidance? Thanks. Here are the AMD details:
processor : 63
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD Opteron(tm) Processor 6378
stepping : 0
cpu MHz : 2399.837
cache size : 2048 KB
physical id : 2
siblings : 16
core id : 7
cpu cores : 8
apicid : 79
initial apicid : 79
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr tbm topoext perfctr_core cpb npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips : 4799.73
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate  
Welcome! The Intel compiler options "/Qx <SIMD_ext>" (Windows) / "-x <SIMD_ext>" (Linux) and "/Qax" (Windows) / "-ax" (Linux) perform an Intel processor check and generate code according to the architecture, so these options may not work on non-Intel architectures. I can see "avx" in the flags, but I am not sure whether all the AVX instructions are supported by AMD* processors. I found an article, http://www.theregister.co.uk/2009/05/06/amd_does_avx/, which states that AMD* supports a number of AVX instructions (but we are not sure which ones).
However note that "Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors."
*These brands are property of others.
Sukruth H V
Can I just turn on -xHost and let it auto-select the best? Or am I misinformed? Right now I am using
icpc -std=c++0x -Dlinux64 -DWM_DP -wd327,654,819,1125,1476,1505,1572 -xHost -O2 -no-prec-div -DNoRepository -DMPICH_SKIP_MPICXX
Sure, you are right. The option "-xHost" tells the compiler to generate instructions for the highest instruction set available on the compilation host processor. This should work on non-Intel processors too. Please refer to the compiler documentation, which is available in the compiler installation path. It is a good source for all the compiler options.
Sukruth H V
Thank you Sukruth.
I have already checked that documentation. But my confusion is this: since I found "sse sse2 avx ssse3 sse4_1 sse4_2" all among the flags in my AMD chip info, then according to this page, "http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/index.htm", what will -xHost become? Does it mean avx is higher than the other options, or what? Thanks
Does it mean I should always use -xHost for convenience (even on my Intel cluster)? On our Intel cluster I saw someone using -xSSE3. My question is: is -xSSE3 a better setting than -xHost, and will it give better optimization when SSE3 is available?
I'd like to know the priority of these technical terms so that I don't use -xHost blindly. Thank you :)
Take a look at : http://www.agner.org/optimize/blog/read.php?i=49
There are ideas there on how to get the most out of Intel compiler generated code while running on AMD. I am using these ideas with good results so far. Without the workaround, performance is far lower in my experience.
-xHost implies a check for Intel CPU. Options such as -msse3, -msse4.1, -mavx will not be subject to the check, so the user takes responsibility for running on a CPU which will not throw illegal instruction. Such options (with a compiler recent enough to accept them) will be preferable to make a build which will run on both Intel and non-Intel CPUs. The imf-consistency options, to avoid numerical problems with differing architectures, have improved in the latest compilers. They will likely restrict the math libraries from using sse4 or avx, independent of use of the -m options.
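To make the ordering of those options concrete, here is a rough Python sketch (my own helper, not something the compiler provides) that takes the "flags" line from /proc/cpuinfo and returns the highest of the -m options named in this thread. The function name and the table are assumptions for illustration; note that Linux reports SSE3 as "pni".

```python
# Hypothetical helper: map /proc/cpuinfo flags to the highest -m option
# mentioned in this thread. Entries are ordered newest to oldest.
M_OPTIONS = [
    ("avx", "-mavx"),
    ("sse4_1", "-msse4.1"),
    ("pni", "-msse3"),   # Linux reports SSE3 as "pni"
    ("sse2", "-msse2"),
]


def pick_m_option(flags_line):
    """Return the highest -m option whose cpuinfo flag is present, or None."""
    flags = set(flags_line.split())
    for cpu_flag, option in M_OPTIONS:
        if cpu_flag in flags:
            return option
    return None


# Excerpt of the flags posted above for the Opteron 6378.
opteron_flags = "fpu sse sse2 pni ssse3 sse4_1 sse4_2 popcnt aes xsave avx"
print(pick_m_option(opteron_flags))
```

With the flags posted for the Opteron 6378 this selects -mavx. Whether AVX code is actually the fastest choice for a given application still has to be measured.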
No, I have two clusters.
For the 1st, I compile on an AMD front end and run on AMD compute nodes; they are identical.
For the 2nd, I compile on an Intel front end and run on Intel compute nodes, but they are not identical.
What should I do for these two builds? And what is the difference between -m and -x?
-x implies a run-time check for an Intel CPU, which you probably don't want for the AMD cluster, unless you prefer to reinvent Agner's old suggestions to bypass the check. -xHost means use as many as possible of the instructions available on the (Intel) build node, so the build won't run on an older architecture. I don't think -xHost was meant to drop to -msse2 (with no run-time check) if the build node is not Intel. It is possible to specify the architecture of the CPU you will run on, even if the build node is an older one, provided there are no run tests in the build sequence, so you could build with -xavx on anything supporting sse2.
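For a cluster where the nodes are not identical, one way to choose the target is to take the highest instruction set that every node running the binary supports. A minimal Python sketch under that assumption follows; the node names and flag strings are made up, and the function is my own illustration rather than anything shipped with the compiler.

```python
# Instruction-set levels from oldest to newest, keyed by /proc/cpuinfo flag.
LEVELS = ["sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx"]  # "pni" is SSE3


def common_level(node_flags):
    """Highest level present on every node, or None if some node lacks sse2."""
    best = None
    for level in LEVELS:
        if all(level in flags.split() for flags in node_flags.values()):
            best = level
        else:
            break  # stop at the first level that some node lacks
    return best


# Hypothetical compute nodes: only the machines that will run the binary matter.
compute_nodes = {
    "node01": "fpu sse sse2 pni ssse3 sse4_1 sse4_2 avx",
    "node02": "fpu sse sse2 pni ssse3 sse4_1 sse4_2",
}
print(common_level(compute_nodes))
```

The front end is deliberately left out of the dictionary, since the build node does not need to support the instructions as long as nothing is executed during the build.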
Applications which run at least as well built with sse4.1 when the machine also supports sse4.2 are fairly common. Less common but possible is a case which runs faster when built with an older option than with AVX.
-m doesn't check the CPU at run time; you could run on AMD without switching in schemes to masquerade as an Intel CPU. The default when no -x or -m option is given is -msse2, which will work on AMD back to the original Opteron or Athlon64 as well as on any Intel CPU from the P4 on.
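Put as a small Python sketch, the -x versus -m choice could look like this. The function name is hypothetical and the rule is only a summary of the points above: -x for a build that will run only on Intel CPUs, -m for anything that must also run on AMD.

```python
def arch_option(vendor_id, isa):
    """Build the architecture option for icc/icpc.

    vendor_id: the vendor_id field from /proc/cpuinfo on the nodes that run the code.
    isa: an instruction-set name such as "SSE2", "SSE3", "SSE4.1" or "AVX".
    """
    if vendor_id == "GenuineIntel":
        return "-x" + isa        # e.g. -xAVX, includes the run-time Intel CPU check
    return "-m" + isa.lower()    # e.g. -mavx, no run-time CPU check


print(arch_option("AuthenticAMD", "AVX"))     # AMD cluster
print(arch_option("GenuineIntel", "SSE4.1"))  # Intel cluster
```

A -m build also runs on Intel CPUs, so using -m everywhere is a reasonable simplification if one binary has to serve both clusters.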
Hi TimP. I use one of the ideas from Agner Fog: override __intel_cpu_indicator_init() with my own function and set a fair value for __intel_cpu_indicator, independent of the CPU brand. Code generation is directed with /QxSSE2 or SSE3 (specific code without an alternative path) and the results are very good. The problem with helping DW is that I work on Windows systems and have no experience with Linux or its compilers, but I am sure that a port of the solution to Linux should work as well as it does on Windows.
I have not tried AVX code because I do not have any AMD machine with that capability.
The point is that the -m options on Linux and the /arch: options on Windows were adopted in an attempt to eliminate the incentive to override the cpu_indicator functions. Note that the -x or /Qx code generated by the compiler doesn't use the same cpu_indicator functions as the math and performance libraries, which don't observe the architecture choice made with /arch: or /Qx...
/Qimf-arch-consistency:true (linux spelling -imf-arch-consistency true) would eliminate the cpu_indicator usage for math library calls and presumably give you an accurate SSE2 implementation, which has become reasonably efficient in recent updates. I don't know whether SSE3 is used for complex vector math in consistency math functions; if you find significant problems with that, you're invited to file a reproducer. Or even if you can't figure it out from the documentation, file a report on that.
If an AMD platform running in 64-bit mode doesn't get the standard SSE2 or SSE3 math library functions, I believe that is a reportable bug (and not an intentional one).