
Johannes_Rieke · ‎02-04-2015

Dear all, this may be off topic, but I know of no other place to raise this issue, and it is Fortran related.

Two days ago I got a new workstation (E5-1650v3, 64 GB RAM, SSD, Win7 64-bit) from a big global vendor and was happy to have a new toy. Unfortunately, not for long... After installing all up-to-date drivers, I installed VS and the Intel compilers, version 15.0.1. Pleased with the fast installation on an SSD, I tested compiling my Fortran tools and was confused: all my code runs even slower than on my old workhorse (E5620). Not believing that, I tested other programs, and also ran the tests on a colleague's new E5-1650v3 workstation. The results were the same. Then I ran the same executable with the same inputs on an Ivy Bridge-EP (E5-1650v2) and got the expected results (attached file: Performance_test_5810_custom_Fortran_code.pdf).

My compiler options are:

main code: /nologo /O3 /Qparallel /Qopt-matmul /fpp /I"D:\aux_lib\x64\Release" /I"D:\geo_lib\x64\Release" /arch:SSE3 /Qopenmp /Qvec-report1 /warn:all /real_size:64 /Qinit:zero /fp:source /Qfp-speculation=safe /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc100.pdb" /check:bounds /libs:static /threads /c

aux_lib: /nologo /O3 /Qparallel /Qopt-matmul /fpp /arch:SSE3 /Qopenmp /Qvec-report1 /warn:declarations /warn:unused /warn:uncalled /warn:nousage /warn:interfaces /real_size:64 /Qinit:zero /fp:source /Qfp-speculation=safe /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc100.pdb" /libs:static /threads /c

geo_lib: /nologo /O3 /Qparallel /Qopt-matmul /fpp /I"D:\aux_lib\x64\Release" /arch:SSE3 /Qopenmp /Qpar-report1 /Qvec-report1 /warn:all /real_size:64 /Qinit:zero /fp:source /Qfp-speculation=safe /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc100.pdb" /libs:static /threads /c

To be sure that it is not something in my code, I took the Intel optimized Linpack benchmark from the MKL folder and made some test runs on three different machines.

The frustrating result is, again, that the E5-1650v3 does not perform nearly as fast as the E5-1650v2: in my tests the E5-1650v3 is 30 to 40 per cent slower. I now wonder where this comes from and whether I have to choose other compiler options to get the desired performance on a Haswell-EP. If anybody has an idea what to do, please write a comment.

Best regards, Johannes
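A cross-machine comparison like the one described above is easier to repeat if the timing is scripted. The following is only a minimal sketch in Python, not the procedure used in this thread: `best_of`, `percent_slower` and `toy_workload` are hypothetical names, the naive matrix product merely stands in for the real executable, and "slower" is assumed to mean longer run time.

```python
import time


def best_of(workload, repeats=5):
    """Return the shortest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


def percent_slower(candidate_seconds, reference_seconds):
    """Extra run time of the candidate relative to the reference, in percent."""
    return (candidate_seconds / reference_seconds - 1.0) * 100.0


def toy_workload(n=60):
    """Hypothetical stand-in for the real program: naive n x n matrix product."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    seconds = best_of(toy_workload)
    print(f"best of 5 runs: {seconds:.4f} s")
```

Running the same script on each machine and feeding the two best times to `percent_slower` gives one number per machine pair. Taking the best of several runs, rather than the mean, is a design choice that reduces the influence of background load.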

Steven_L_Intel1 · ‎02-04-2015

Well, it's not "Fortran related" in that the same EXE runs differently. I do wonder why you are specifying /arch:SSE3; you're artificially constraining the compiler. Please use /QxHost or /QxCORE-AVX2 instead and see what you get. I also wonder if you're getting proper use of the multiple cores. Have you run the program under VTune to analyze the threading performance?

TimP · ‎02-04-2015

The HSW platforms I had access to had no option to disable hyperthreading, as is usually done for such benchmarks.

opt-matmul would need an MKL new enough to recognize the v3 server CPU. When I last tested one, there had not been a public release of the HSW server software tools even though the hardware launch had occurred, so we were blocked from completing some tests. MKL consistency mode might enable a comparison, if you don't look for full performance on either platform.

Johannes_Rieke · ‎02-05-2015

Dear Steve, dear Tim!

Thanks for the comments.

Why /arch:SSE3: Many of our workstations are still Westmere-EP machines (E5620), which do not support AVX as far as I know. Furthermore, in this case I wanted to avoid AVX because, if I'm right, the turbo frequency of the Haswell-EP under AVX load is limited to below the non-AVX turbo frequency. Later I wanted to see whether AVX gives a speed-up even though the frequency is capped at a lower value. For the initial tests I preferred a direct comparison with the same executable running on all Xeon generations. Nevertheless, don't /fp:source and /Qfp-speculation=safe prevent the compiler from using SSE/AVX (at least for transcendentals)?
My self-written Fortran application had been checked with VTune before, and I eliminated some spots where threading was disadvantageous. However, I made one test compiled without OpenMP, and the result is nearly identical to the case where I compiled with /Qopenmp and limited the run to one thread.
Hyperthreading was active on all Xeons during the tests. I have never seen a slowdown caused by hyperthreading before.
Does MKL 11.2 (e.g. as shipped with 15.0 Update 1) support Haswell-EP?
I have a gut feeling that the workstation manufacturer (one of the big enterprise suppliers) has to improve something (UEFI, firmware, whatever). I have opened a support request with the manufacturer's enterprise support and am very curious what they will answer. At least Linpack should not show such low performance on Haswell-EP compared to Ivy Bridge-EP?
Does anybody own an E5-1650v3 and could run the Intel Xeon optimized Linpack benchmark (MKL 11.2)? A result from a workstation of another brand would be interesting.

Best regards, Johannes
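The AVX-versus-turbo trade-off raised in this post can be put into a back-of-the-envelope model. This is only a sketch under strong assumptions: the code is taken to be purely compute bound, so run time scales inversely with clock frequency, and the clock values in the example are hypothetical placeholders, not E5-1650v3 specifications. The function names are made up for illustration.

```python
def net_speedup(vector_speedup, avx_clock_ghz, non_avx_clock_ghz):
    """Overall speed-up of an AVX build that runs at a reduced clock.

    vector_speedup: per-cycle gain from wider vectors (1.0 means no gain).
    Assumes compute-bound code, so time scales inversely with frequency.
    """
    return vector_speedup * (avx_clock_ghz / non_avx_clock_ghz)


def break_even_vector_speedup(avx_clock_ghz, non_avx_clock_ghz):
    """Per-cycle gain AVX must deliver just to offset the lower clock."""
    return non_avx_clock_ghz / avx_clock_ghz


if __name__ == "__main__":
    # Hypothetical placeholder clocks in GHz, chosen only for illustration.
    avx_clock, non_avx_clock = 3.0, 3.6
    print(f"break-even: {break_even_vector_speedup(avx_clock, non_avx_clock):.2f}x")
    print(f"net at 1.5x: {net_speedup(1.5, avx_clock, non_avx_clock):.2f}x")
```

In this model AVX only pays off when the per-cycle gain from vectorization exceeds the ratio of the two clock limits. Memory-bound code would behave differently, so measuring both builds remains the decisive test.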

Johannes_Rieke · ‎02-05-2015

Dear all,
The issue is solved. Steve, you are right: it has nothing to do with Fortran or the Intel compilers. Just 4 hours ago the manufacturer released a new UEFI firmware. After installing it, I get a completely different picture. All results are as expected, and I'm happy again with my new toy.
For interested readers, I have attached the latest benchmark results.
However, I will play around with AVX, and I'm curious about its impact on the performance of my code.
Best regards, Johannes

Steven_L_Intel1 · ‎02-05-2015

Glad to hear it.

TimP · ‎02-05-2015

Interesting that a firmware update helps.

/fp:source prevents vectorization where numerical results might vary slightly, but it should not prevent opt-matmul from choosing the most aggressive MKL function for your CPU. Likewise, MKL is not limited by your arch choice.

Massive performance issue with code on Haswell-EP (E5-1650v3)