<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1234305#M153052</link>
    <description>&lt;P&gt;&amp;gt;&amp;gt;&lt;EM&gt; I was expecting you guys to fix it along with IVF compiler ...&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;I am not an Intel employee (neither is Steve). Your expectations of this forum are too high.&lt;/P&gt;
&lt;P&gt;Your problem is that MKL is designed to be used in three ways:&lt;/P&gt;
&lt;P&gt;Single-threaded app + single-threaded MKL&lt;BR /&gt;Single-threaded app + multi-threaded MKL&lt;BR /&gt;Multi-threaded app + single-threaded MKL&lt;/P&gt;
&lt;P&gt;What you want is:&lt;/P&gt;
&lt;P&gt;Multi-threaded app + multi-threaded MKL&lt;/P&gt;
&lt;P&gt;To do this, you must make additional effort to ensure that the several thread pools do not conflict and are optimally placed. The "standard" environment variable settings (as well as omp_... runtime calls made before the first parallel region) are not set up to do what you want. Getting what you want will take some finesse.&lt;/P&gt;
&lt;P&gt;Allocatable versus statically allocated arrays is not the issue. The issue is where the threads are placed, so that they do not conflict.&lt;/P&gt;
&lt;P&gt;For your system with 64 HW threads (assuming HT is disabled, though it would be preferable to have it enabled), one possible configuration would be:&lt;/P&gt;
&lt;P&gt;App with 4 OpenMP threads + 4 instances of MKL (one per app thread) each with 16 threads. OR&lt;BR /&gt;App with 8 OpenMP threads + 8 instances of MKL (one per app thread) each with 8 threads.&lt;BR /&gt;...&lt;/P&gt;
&lt;P&gt;IOW, the total number of active threads should not exceed the total number of hardware threads.&lt;/P&gt;
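&lt;P&gt;Here is a minimal Python sketch of the thread-count arithmetic above. It assumes 64 hardware threads (HT disabled). It also assumes that every app thread calls the threaded MKL and gets its own MKL thread pool. It lists the app-threads x MKL-threads splits that exactly fill the hardware, and it shows what the defaults give when both levels size themselves to all 64 threads. The function names are illustrative only and are not part of any Intel API.&lt;/P&gt;

```python
# Sketch of the oversubscription arithmetic for nested OpenMP + threaded MKL.
# Assumptions (not from any Intel API): 64 hardware threads, HT disabled, and
# each app thread that calls threaded MKL gets its own MKL thread pool.

HW_THREADS = 64  # assumed; would be 128 with SMT/HT enabled


def total_active(app_threads, mkl_threads):
    """Active threads when every app thread drives its own MKL pool."""
    return app_threads * mkl_threads


def balanced_splits(hw_threads):
    """(app_threads, mkl_threads) pairs that exactly fill the hardware."""
    splits = []
    for app_threads in range(1, hw_threads + 1):
        if hw_threads % app_threads == 0:
            splits.append((app_threads, hw_threads // app_threads))
    return splits


if __name__ == "__main__":
    for app, mkl in balanced_splits(HW_THREADS):
        print(f"{app:2d} app threads x {mkl:2d} MKL threads = {total_active(app, mkl)}")
    # Defaults: both levels size themselves to the whole machine (n**2).
    naive = total_active(HW_THREADS, HW_THREADS)
    print(f"defaults: {naive} threads on {HW_THREADS} HW threads ({naive // HW_THREADS}x)")
```

&lt;P&gt;The 4 x 16 and 8 x 8 rows correspond to the two configurations listed above. The last line shows why leaving both levels at their defaults is so costly.&lt;/P&gt;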
&lt;P&gt;There are two problems with the "standard" approaches:&lt;/P&gt;
&lt;P&gt;Without affinity pinning, threads may get scheduled on the same hardware thread (not good).&lt;/P&gt;
&lt;P&gt;With affinity pinning (KMP_AFFINITY=scatter), the app's OpenMP threads may end up scattered as desired, *** but each of them may be restricted to a single hardware thread (as opposed to 1/4th or 1/8th of the HW threads). Then, when each MKL instance initializes its OpenMP thread pool (of 16 or 8 threads), the threads of that pool are constrained to the permitted (pinned) HW threads of the parent thread, in this case a single HW thread. This is much worse than "(not good)": e.g. running 64 software threads on 4 hardware threads (or 64 SW threads on 8 HW threads).&lt;/P&gt;
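&lt;P&gt;This small Python sketch shows why the parent thread's affinity mask matters. It assumes, as described above, that an MKL thread pool inherits the mask of the app thread that created it. The setup is 64 logical processors, 4 app threads and 16 MKL threads per app thread. Masks are modeled as plain integers for illustration; this is not how KMP_AFFINITY itself is implemented.&lt;/P&gt;

```python
# Sketch: inner (MKL) threads inherit the affinity mask of their parent
# (app) thread. Assumptions: 64 logical processors, 4 app threads,
# 16 MKL threads per app thread; masks are modeled as Python integers.

HW_THREADS = 64
APP_THREADS = 4
MKL_THREADS = 16


def block_mask(index, width):
    """Mask covering `width` consecutive processors starting at index*width."""
    return (2 ** width - 1) * 2 ** (index * width)


def single_mask(index, stride):
    """Mask with one processor only, as when each app thread is pinned to one CPU."""
    return 2 ** (index * stride)


def threads_per_cpu(mask, inner_threads):
    """Software threads per permitted processor for a pool that inherits `mask`."""
    return inner_threads / bin(mask).count("1")


width = HW_THREADS // APP_THREADS  # 16 processors per app thread
for t in range(APP_THREADS):
    wide = block_mask(t, width)
    narrow = single_mask(t, width)
    print(f"app thread {t}: block {wide:#018x} gives "
          f"{threads_per_cpu(wide, MKL_THREADS):.0f} per CPU, "
          f"single {narrow:#018x} gives "
          f"{threads_per_cpu(narrow, MKL_THREADS):.0f} per CPU")
```

&lt;P&gt;With a 16-processor block per app thread, each MKL pool gets one hardware thread per software thread. With a single-processor pin, all 16 MKL threads share one CPU, which is the 64-on-4 case described above.&lt;/P&gt;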
&lt;P&gt;I am not saying there is an official way of doing this; not knowing of one, I would have to resort to some programming gymnastics to get what I (you) want.&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Fri, 04 Dec 2020 20:03:00 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2020-12-04T20:03:00Z</dc:date>
    <item>
      <title>What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win10?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227528#M152609</link>
      <description>&lt;P&gt;What are the optimal Fortran Parallel Studio Pro XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win10?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2020 19:09:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227528#M152609</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-11T19:09:51Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227854#M152610</link>
      <description>&lt;P&gt;Start with /QxHost /O3 /Qipo. This is general advice for any processor. Since you have a high-core-count processor, you'll also want to look at parallelization. You could start with /Qparallel, but you would probably do better to work in OpenMP. Of course, a lot depends on what your application is doing.&lt;/P&gt;
&lt;P&gt;I should add that there is no single "optimal" set of options. You'll have to try different things and see what works best for your application.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2020 17:12:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227854#M152610</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2020-11-12T17:12:20Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227898#M152611</link>
      <description>&lt;P&gt;I usually start with the options that work for most of my programs, which are a superset of your starting point (see below), including /QxHost. For this program and many others (though not all), the /QxHost option causes the program to crash at run time on AMD Threadrippers.&lt;/P&gt;
&lt;P&gt;Also, the /Qipo option resulted in error 10014 during multi-file optimization compilation (code 3).&lt;/P&gt;
&lt;P&gt;The program is proprietary and consists of over 100 subroutines/functions/modules.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;/nologo /debug:full /MP /O3 /QxHost /Qparallel /heap-arrays1000 /Qopenmp /convert:native /fpscomp:general /stand:f18 /Qdiag-disable:5268,10182,5462 /warn:all /Qtrapuv /fpe:0 /fpconstant /module:"x64\Release/" /object:"x64\Release/" /Fd"x64\Release\vc160.pdb" /traceback /check:none /libs:qwin /Qmkl:parallel /c&lt;/LI-CODE&gt;
</description>
      <pubDate>Thu, 12 Nov 2020 20:32:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227898#M152611</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-12T20:32:11Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227956#M152616</link>
      <description>&lt;P&gt;Don't use /Qtrapuv - it's worthless. Don't bother setting a value on /heap-arrays, just use the default. /convert:native is the default - why are you specifying it? Using both /Qparallel and /Qopenmp seems problematic to me. And I see no point in /debug:full with /O3, but I suppose it's somewhat harmless.&lt;/P&gt;
&lt;P&gt;/QxHost queries the CPU and asks it which instruction sets it supports, and then chooses the best available /arch option. If the CPU is misrepresenting which instructions it supports, that can be a problem. What kind of "crash" is it? I don't know what the Threadripper CPU supports, and AMD's web site isn't helping me find it. You might try experimenting with /arch instead of /QxHost - start with /arch:SSE3, see if it likes that, then try SSE4.1, SSE4.2 or AVX.&lt;/P&gt;
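&lt;P&gt;One way to organize these /arch experiments is to build one executable per instruction set, in the order suggested, and then time each one. This Python sketch only generates the build commands. It assumes ifort is run from a Parallel Studio command prompt. The source name main.f90, the app_*.exe output names and the reduced flag set are hypothetical placeholders for your own project settings.&lt;/P&gt;

```python
# Sketch: one build per /arch level, oldest instruction set first.
# Assumptions: ifort on PATH (Parallel Studio command prompt); "main.f90"
# and the app_*.exe names are hypothetical placeholders.

BASE_FLAGS = ["/nologo", "/O3", "/Qopenmp"]
ARCH_LADDER = ["SSE3", "SSE4.1", "SSE4.2", "AVX"]


def build_commands(source, ladder=ARCH_LADDER):
    """Return one ifort command line (as an argument list) per /arch value."""
    commands = []
    for arch in ladder:
        exe = "app_" + arch.replace(".", "_") + ".exe"
        commands.append(["ifort", *BASE_FLAGS, "/arch:" + arch, "/Fe" + exe, source])
    return commands


for cmd in build_commands("main.f90"):
    # Pass cmd to subprocess.run(cmd, check=True) to actually build it.
    print(" ".join(cmd))
```

&lt;P&gt;Because the oldest level is built first, the first executable that fails or hangs marks the boundary. Compare run times only among the builds that complete.&lt;/P&gt;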
&lt;P&gt;I don't know what IPO is complaining about, but try without that for now.&lt;/P&gt;
</description>
      <pubDate>Fri, 13 Nov 2020 00:24:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1227956#M152616</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2020-11-13T00:24:28Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228214#M152631</link>
      <description>&lt;P&gt;Hi Steve,&lt;/P&gt;&lt;P&gt;We first started communicating back in the late 1990s with DEC's DVF, which was bought out by Compaq and became CVF, which was bought out by HP, and then Intel took over the compiler as IVF. Around 2003-2005 I helped with the beta testing of the move from 32-bit to 64-bit and reported about 50 bugs. And in 2008 I helped iron out OpenMP (with Richard)... so I'm not sure why the system lists me as "Beginner", but that's OK.&lt;/P&gt;&lt;P&gt;I generally use a prototype project as the starting point for new projects and just edit the program names. I find this approach works best, especially with a long list of compiler settings and libraries to link in. So it is possible that I carry over settings that have become defaults over the years, etc.&lt;/P&gt;&lt;P&gt;I just ran a dozen experiments with the options you listed, and here are my results. When I turn off /heap-arrays, my program gets a stack overflow error, so I turned it back on. I removed /Qtrapuv and tried /arch:SSE3, SSE4.1, SSE4.2, AVX, AVX2, and /QaxCOMMON-AVX512 with no speed improvement, but the program didn't hang (previously, when I said it crashed, I meant it hung). I then tried /QxHost, and now it did not hang, but there was no speed increase. As a verification test I added /Qtrapuv back in, and it hung again. So /Qtrapuv is causing the hang, and removing /heap-arrays creates a new problem.&lt;/P&gt;&lt;P&gt;I tried combinations of /Qparallel and /Qopenmp. Without OpenMP the program runs many times slower, probably close to 64 times, which would have taken an hour to run, so I killed it after 5 minutes. /Qparallel doesn't seem to make a difference in run time.&lt;/P&gt;&lt;P&gt;Also, I tried the IPO option again and got the same error. Using IPO, the link time increased from a few seconds to a few minutes, and the build consumed 128 GB of RAM.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2020 18:59:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228214#M152631</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-13T18:59:17Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228240#M152633</link>
      <description>&lt;P&gt;You can take HP out of your timeline - HP bought Compaq after the Fortran team moved to Intel.&lt;/P&gt;&lt;P&gt;The link time increased because, with IPO, that's when the optimization actually happens. What matters more is the run time. IPO works for many applications, but not all.&lt;/P&gt;&lt;P&gt;Don't use /Qax when you know you'll be running on a non-Intel processor, as you'll get only the "generic" path. What your experiments with /arch tell me is that your application doesn't take advantage of the newer instruction sets. Stick with /QxHost.&lt;/P&gt;&lt;P&gt;The forum software changed recently, and it labels people with few posts as "Beginner"; you have only four. I don't care for the level names the new forum uses.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2020 20:05:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228240#M152633</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2020-11-13T20:05:47Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228778#M152678</link>
      <description>&lt;P&gt;Hi Steve,&lt;/P&gt;&lt;P&gt;I turned off /Qtrapuv as suggested (which works), but do you know why /QxHost hangs with it but not without it?&lt;/P&gt;&lt;P&gt;Any suggestions on what is causing IPO to exit with error code 3?&lt;/P&gt;&lt;P&gt;Without setting /heap-arrays I get a stack overflow error. I'm using /heap-arrays1000 as a workaround. Is there a better solution?&lt;/P&gt;</description>
      <pubDate>Mon, 16 Nov 2020 18:55:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228778#M152678</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-16T18:55:32Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228785#M152680</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt; And I see no point in /debug:full with /O3,&lt;/P&gt;&lt;P&gt;When using VTune on optimized code (e.g. /O3) you need the debug symbol table.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 16 Nov 2020 19:27:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228785#M152680</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2020-11-16T19:27:47Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228786#M152681</link>
      <description>&lt;P&gt;I would experiment with a /QxHost build, then a /QxAVX2 build. Depending on how "friendly" the Intel compiler is to non-Intel CPUs, /QxHost could potentially fall back to /SSEnnn.&lt;/P&gt;&lt;P&gt;(please report your findings back here)&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 16 Nov 2020 19:34:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228786#M152681</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2020-11-16T19:34:12Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228791#M152682</link>
      <description>&lt;P&gt;/QxAVX2 results in my program immediately aborting. /QxHost works, but with no speed improvement over not using it. Full debug doesn't seem to slow anything down, and keeping it means one less change to make when using VTune.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Nov 2020 19:53:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228791#M152682</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-16T19:53:35Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228792#M152683</link>
      <description>&lt;P&gt;What does /QxAVX do?&lt;/P&gt;</description>
      <pubDate>Mon, 16 Nov 2020 19:54:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228792#M152683</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2020-11-16T19:54:56Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228796#M152684</link>
      <description>&lt;P&gt;It also immediately aborts my program.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Nov 2020 20:01:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228796#M152684</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-16T20:01:26Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228810#M152690</link>
      <description>&lt;P&gt;If you run in (under) the debugger, what is the instruction at the failure point?&lt;/P&gt;&lt;P&gt;You should get a GP fault at an address, and in the debugger you can open a Disassembly window and go to that address.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 16 Nov 2020 21:05:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228810#M152690</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2020-11-16T21:05:48Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228860#M152698</link>
      <description>&lt;P&gt;He cannot use ANY of the /Qx options other than /QxHost on a non-Intel CPU - it will immediately fail. Use /arch instead of /Qx.&lt;/P&gt;&lt;P&gt;I don't know why /Qtrapuv causes errors, but I do know that it doesn't do anything useful - it doesn't even do what it says in the manual. Use /Qinit:snan,arrays if you want variables initialized to a NaN.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2020 00:01:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228860#M152698</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2020-11-17T00:01:11Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228882#M152703</link>
      <description>&lt;P&gt;The suggestion to run in Debug mode was beneficial. I started developing this code about 15 years ago and initially used Debug mode, but I haven't used it in 10 years. The code has been running fine on Intel CPUs. I uncovered 3 latent bugs: two format/variable-type mismatches and one shape mismatch. I normally develop in Release mode and switch run-time options on or off. I fixed all the bugs. The program no longer aborts, and it runs to completion in both Debug and Release modes.&lt;/P&gt;&lt;P&gt;Interestingly, Debug mode is faster by about 50%. Here are the compiler switches for Release and Debug modes:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt; /nologo /debug:full /MP /O3 /Qparallel /heap-arrays1000 /arch:AVX /Qopenmp /fpscomp:general /stand:f18 /Qdiag-disable:5268,10182,5462 /warn:all /fpe:0 /fpconstant /module:"x64\Release/" /object:"x64\Release/" /Fd"x64\Release\vc160.pdb" /traceback /check:none /libs:qwin /Qmkl:parallel /c

 /nologo /debug:full /MP /Od /Qparallel /heap-arrays1000 /arch:AVX /Qopenmp /stand:f18 /warn:all /fpe:0 /fpconstant /module:"x64\Debug/" /object:"x64\Debug/" /Fd"x64\Debug\vc160.pdb" /traceback /check:pointer /check:bounds /check:shape /check:uninit /check:format /check:output_conversion /check:stack /libs:qwin /dbglibs /Qmkl:parallel /c&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2020 01:44:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1228882#M152703</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-17T01:44:05Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229062#M152707</link>
      <description>&lt;P&gt;You should be using either&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; /Qparallel (auto-parallelization)&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;&amp;nbsp; /Qopenmp (OpenMP directive parallelization)&lt;/P&gt;&lt;P&gt;not both.&lt;/P&gt;&lt;P&gt;Also, if your program uses OpenMP, you generally should link with the &lt;EM&gt;serial&lt;/EM&gt; version of MKL.&lt;/P&gt;&lt;P&gt;The parallel version of MKL internally uses OpenMP. Each application thread (in an app using OpenMP) that calls the parallel version of MKL causes MKL to instantiate an independent OpenMP thread pool for the calling thread, resulting in n**2 oversubscription.&lt;/P&gt;&lt;P&gt;There are cases, when MKL is called only from the serial portion of the application, where you may then wish to use the parallel version of MKL. In those cases, experiment with setting the environment variable KMP_BLOCKTIME=0.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2020 14:22:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229062#M152707</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2020-11-17T14:22:07Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229131#M152719</link>
      <description>&lt;P&gt;I have been experimenting with many combinations of options, including turning /Qparallel off, switching the MKL library from parallel to sequential, matmul on/off, prefetch insertion level 3 on/off, /Od, /O2, /O3, and run-time checking on/off. The program is large, with over 100 subprograms. It also contains lots of stochastic processes, which makes timing inconsistent, e.g. t = 47, 57, 46, 47 sec.&lt;/P&gt;&lt;P&gt;With that said, the best results I got were with yesterday's settings: both /Qparallel and /Qopenmp, and parallel MKL. I parallelized certain sections of code using OpenMP, coarse-grained where I could. I know I could do more with more programming time. Would it be beneficial for Intel to enhance the compiler so that it determines the extent of OpenMP sections during semantic analysis, and then blocks those sections off while it auto-parallelizes the rest of the program?&lt;/P&gt;&lt;P&gt;I'm still not getting the speedup I expect going from an 8-core Intel CPU to a 64-core AMD CPU at similar clock speeds. I know certain parts of the code are serial, so I don't expect much difference there. Maybe it has to do with microcode optimizations, or problems with the parallelization such as race conditions, deadlocks, etc. I have the Pro edition of XE2020 with VTune, Advisor, etc., but most of the tutorials with code examples are written for C++ programmers, not Fortran programmers.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2020 18:26:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229131#M152719</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-17T18:26:08Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229169#M152725</link>
      <description>&lt;P&gt;By "64-core AMD system", do you mean 128 hardware threads (2 threads per core)?&lt;/P&gt;&lt;P&gt;If this is the case (and unless Intel has fixed this), affinity pinning on Windows is different from Linux. Windows uses processor groups of up to 64 hardware threads, and each processor group has only a 64-bit mask of the logical processors within that group. I haven't tested the latest version of Intel's OpenMP on Windows on a system with more than 64 logical processors. It may be that your application is using only one processor group, with 64 threads running in the app as well as 64 threads in MKL ...&lt;STRONG&gt; in the same processor group&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;In any event, have you tried setting the environment variable KMP_BLOCKTIME=0?&lt;/P&gt;&lt;P&gt;IFF (&lt;STRONG&gt;if and only if&lt;/STRONG&gt;) Windows OpenMP uses only one processor group, then as a hack (Windows only):&lt;/P&gt;&lt;P&gt;1) Leave KMP_BLOCKTIME undefined (the default is 200 ms).&lt;BR /&gt;2) Set KMP_AFFINITY=compact and OMP_NUM_THREADS=64 (or the # of threads per processor group).&lt;BR /&gt;3) At program start, prior to the first parallel region and prior to the first call to threaded MKL...&lt;BR /&gt;4) Assuming 2 processor groups (query to test the # of groups and # of threads/group), set the main thread to the lowest-numbered processor group.&lt;BR /&gt;5) Enter an OpenMP parallel region (it doesn't have to do much) and exit the region. This will establish the application's OpenMP thread pool in the lowest processor group.&lt;BR /&gt;6) Set the main thread's processor group to the next processor group.&lt;BR /&gt;7) Make a call to MKL; this doesn't have to do anything meaningful except create its OpenMP thread pool. This will establish the MKL OpenMP thread pool in the 2nd processor group.
&lt;BR /&gt;Note: you may also need to set MKL_NUM_THREADS=64 (or the # of threads in the processor group).&lt;/P&gt;&lt;P&gt;You may want to experiment with steps 1-7 above, as well as adding a step 8) that resets the main thread's processor group back to the first group. Note that processor groups are not necessarily 0-based, nor contiguous.&lt;/P&gt;&lt;P&gt;BTW, 100 subprograms isn't large.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2020 19:54:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229169#M152725</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2020-11-17T19:54:05Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229189#M152728</link>
      <description>&lt;P&gt;AMD Threadripper 3990X is 64-cores, 128-threads. KMP_BLOCKTIME=0 did not improve the performance.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2020 21:31:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229189#M152728</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-17T21:31:36Z</dc:date>
    </item>
    <item>
      <title>Re: What are the optimal Fortran XE2020 update 4 compiler settings for AMD Threadripper 3990x on Win</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229190#M152729</link>
      <description>&lt;P&gt;Also, I have hyper-threading turned off in the BIOS. Intel advised me to do that a long time ago when using OpenMP.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2020 21:36:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/What-are-the-optimal-Fortran-XE2020-update-4-compiler-settings/m-p/1229190#M152729</guid>
      <dc:creator>rivkin__steve</dc:creator>
      <dc:date>2020-11-17T21:36:21Z</dc:date>
    </item>
  </channel>
</rss>

