<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The slowdown you experience in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952211#M15299</link>
    <description>The slowdown you experience could be caused by running in gradual underflow mode (the hardware default), when individual products produce results of magnitude &amp;lt; TINY(1d0).
Assuming your main program is compiled by an Intel compiler, if you use an option such as /fp:source, you would follow it with /Qftz to set abrupt underflow.  Otherwise, you may need the C/C++ ftz intrinsic to set abrupt underflow.
A Core i7-2 or -3 CPU should not exhibit the slowdown of earlier CPU models.</description>
    <pubDate>Sat, 17 Nov 2012 13:59:51 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2012-11-17T13:59:51Z</dc:date>
    <item>
      <title>performance difference of DGEMM on matrix containing huge values</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952202#M15290</link>
      <description>&lt;P&gt;Dear MKL experts,&lt;/P&gt;
&lt;P&gt;I've run into an issue with the DGEMM matrix multiplication routine: if the matrix contains huge values, e.g. 1.0d+17, performance drops significantly, to as low as 1/3 of the regular case.&lt;/P&gt;
&lt;P&gt;Any idea whether this is possible and/or what the reason could be? I really don't expect such behavior. BTW, I encounter this only on Windows, not on Linux.&lt;/P&gt;
&lt;P&gt;thanks,&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Nov 2012 08:33:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952202#M15290</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-16T08:33:55Z</dc:date>
    </item>
    <item>
      <title>Hi, thanks for the issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952203#M15291</link>
      <description>Hi, thanks for the issue. 
Is that on a 32-bit or 64-bit system?</description>
      <pubDate>Fri, 16 Nov 2012 11:23:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952203#M15291</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-11-16T11:23:10Z</dc:date>
    </item>
    <item>
      <title>boreas,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952204#M15292</link>
      <description>boreas,
Did you check the output? Are there any NaNs in the results?
And what type of CPU are you working on?
Gennady</description>
      <pubDate>Fri, 16 Nov 2012 11:31:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952204#M15292</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-11-16T11:31:24Z</dc:date>
    </item>
    <item>
      <title>Hello Gennady,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952205#M15293</link>
      <description>Hello Gennady,

thanks for your reply. It is 64-bit Windows XP or Windows 7. There are no NaNs in the output.

We've reproduced it on several systems with Intel processors:

Intel Core i7-2760QM
Intel Xeon X5550

thanks,</description>
      <pubDate>Fri, 16 Nov 2012 12:21:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952205#M15293</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-16T12:21:11Z</dc:date>
    </item>
    <item>
      <title>thanks,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952206#M15294</link>
      <description>thanks, 
What is the problem size in that case?
Which version of MKL do you use? (I hope the threaded version of MKL was used in all cases.)
/Gennady</description>
      <pubDate>Fri, 16 Nov 2012 13:12:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952206#M15294</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-11-16T13:12:34Z</dc:date>
    </item>
    <item>
      <title>It would also be interesting</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952207#M15295</link>
      <description>It would also be interesting to know if there are any +/-Infs in your output.</description>
      <pubDate>Fri, 16 Nov 2012 16:22:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952207#M15295</guid>
      <dc:creator>Shane_S_Intel</dc:creator>
      <dc:date>2012-11-16T16:22:39Z</dc:date>
    </item>
    <item>
      <title>hello Gennady and Shane,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952208#M15296</link>
      <description>hello Gennady and Shane,

I use a single thread and, as I said, there are no NaNs in the output.

There are lots of DGEMM calls with various sizes in my application, but a typical size is about M = 3769, N = 32, K = 256.</description>
      <pubDate>Sat, 17 Nov 2012 03:20:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952208#M15296</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-17T03:20:28Z</dc:date>
    </item>
    <item>
      <title>Sorry if I wasn't clear, but</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952209#M15297</link>
      <description>Sorry if I wasn't clear, but I asked about INFs, not NaNs ... they are distinct IEEE types. With such large inputs, it would seem natural to expect that overflows may be occurring and that INFs are being generated.</description>
      <pubDate>Sat, 17 Nov 2012 05:29:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952209#M15297</guid>
      <dc:creator>Shane_S_Intel</dc:creator>
      <dc:date>2012-11-17T05:29:29Z</dc:date>
    </item>
    <item>
      <title>hello Shane,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952210#M15298</link>
      <description>hello Shane,

From the Visual Studio debugger, I do not see any INFs. I am also not sure what the right way is to check whether there are any INFs.
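
One possible way to check, sketched in Fortran under the assumption that the compiler provides the Fortran 2003 IEEE_ARITHMETIC module: scan the DGEMM output with ieee_is_finite and ieee_is_nan instead of inspecting it in the debugger. The module name nonfinite_check, the helpers count_infs and count_nans, and the 2x2 test matrix are hypothetical names for illustration, not anything from the application.

```fortran
module nonfinite_check
  use, intrinsic :: ieee_arithmetic
  implicit none
contains
  ! Number of entries that are +Inf or -Inf (neither finite nor NaN).
  integer function count_infs(a)
    double precision, intent(in) :: a(:,:)
    count_infs = count(.not. ieee_is_finite(a) .and. .not. ieee_is_nan(a))
  end function count_infs

  ! Number of NaN entries.
  integer function count_nans(a)
    double precision, intent(in) :: a(:,:)
    count_nans = count(ieee_is_nan(a))
  end function count_nans
end module nonfinite_check

program demo_nonfinite
  use, intrinsic :: ieee_arithmetic
  use nonfinite_check
  implicit none
  double precision :: c(2,2)

  ! Hypothetical stand-in for a DGEMM result matrix.
  c = 1d0
  c(1,2) = ieee_value(1d0, ieee_positive_inf)   ! plant one Inf
  c(2,1) = ieee_value(1d0, ieee_quiet_nan)      ! plant one NaN

  print *, 'Infs:', count_infs(c), ' NaNs:', count_nans(c)
end program demo_nonfinite
```

In the real code the same two calls could be applied to C right after the DGEMM call; a nonzero count would point to overflow rather than underflow.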

However, I do see a lot of really tiny values, like 1.0d-69. Not sure if this matters. Let me illustrate more clearly what I am doing.

It is a factorization of the block matrix {A, B; B, C}, where A contains some huge diagonal values, and C = C - B*INV(A)*B = C - B*INV(L)*INV(D)*INV(L)*B, with A = LDL. I use DGEMM to calculate the last outer product. Since INV(D) is involved, those tiny values appear.

I observed the slowdown in DGEMM, and that is why I started this topic. Again, do you think those tiny values trigger anything like underflow? And does it matter for performance? Thank you so much.</description>
      <pubDate>Sat, 17 Nov 2012 06:07:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952210#M15298</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-17T06:07:41Z</dc:date>
    </item>
    <item>
      <title>The slowdown you experience</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952211#M15299</link>
      <description>The slowdown you experience could be caused by running in gradual underflow mode (the hardware default), when individual products produce results of magnitude &amp;lt; TINY(1d0).
Assuming your main program is compiled by an Intel compiler, if you use an option such as /fp:source, you would follow it with /Qftz to set abrupt underflow.  Otherwise, you may need the C/C++ ftz intrinsic to set abrupt underflow.
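To see whether a given product actually lands below TINY(1d0), a minimal Fortran sketch follows. It assumes the Fortran 2003 IEEE_ARITHMETIC module is available; the factor 1.0d-240 and the helper name is_subnormal are hypothetical, chosen only so that the product with a 1.0d-69 value falls in the subnormal range.

```fortran
program underflow_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  ! volatile keeps the compiler from folding the product at compile time
  double precision, volatile :: x, y
  double precision :: p

  x = 1.0d-69     ! magnitude of the tiny values reported in the thread
  y = 1.0d-240    ! hypothetical second factor
  p = x * y       ! exact product is about 1d-309, below tiny(1d0)

  print *, 'tiny(1d0)     =', tiny(1d0)   ! smallest positive normal double
  print *, 'x * y         =', p
  print *, 'is subnormal? ', is_subnormal(p)
contains
  ! A value is subnormal when it is finite but neither normal nor zero.
  ! ieee_is_normal is true for normal numbers and for zeros.
  logical function is_subnormal(v)
    double precision, intent(in) :: v
    is_subnormal = ieee_is_finite(v) .and. .not. ieee_is_normal(v)
  end function is_subnormal
end program underflow_demo
```

Under gradual underflow the product should be reported as subnormal; with abrupt underflow in effect it would be expected to be flushed to zero, so the check would come back false.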
A Core i7-2 or -3 CPU should not exhibit the slowdown of earlier CPU models.</description>
      <pubDate>Sat, 17 Nov 2012 13:59:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952211#M15299</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-11-17T13:59:51Z</dc:date>
    </item>
    <item>
      <title>Hello TimP,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952212#M15300</link>
      <description>Hello TimP,

thanks for your reply.  

But, from reading the Intel Fortran compiler documentation (my program is in Fortran), it looks like /Qftz is already in effect by default, since my program is compiled with /Os in release mode. And I saw those tiny values in the debugger because the debug build is compiled with /Od, which does not trigger /Qftz. Does this make sense?

What should I try? The documentation says:
-ftz or /Qftz
Denormal results are flushed to zero.
Every optimization option O level, except O0, sets -ftz and /Qftz.</description>
      <pubDate>Sun, 18 Nov 2012 05:28:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952212#M15300</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-18T05:28:26Z</dc:date>
    </item>
    <item>
      <title>Yes, an ifort main program</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952213#M15301</link>
      <description>Yes, an ifort main program built with -O1, -O2, or -O3 would have /Qftz set, unless you set a /fp: option.  In the latter case, you would follow up with /Qftz to get back to abrupt underflow.
I guess ifort /Os is equivalent to /O1.  This removes some major optimizations such as auto-vectorization, which would not be needed if all your time is spent in MKL.  I was surprised by this, as -Os appears to be allowed in ifort only for Windows.
I think the sentence with "leave flags as they are" in the documentation of /Qftz is misleading.  /Qftz- does mean taking the hardware default, which is the 32-bit Windows default, but X64 Windows should set abrupt underflow before starting a .exe, so /Qftz- would generate code to set gradual underflow mode.</description>
      <pubDate>Sun, 18 Nov 2012 12:54:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952213#M15301</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-11-18T12:54:01Z</dc:date>
    </item>
    <item>
      <title>thanks, but this seems not</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952214#M15302</link>
      <description>Thanks, but this does not seem to explain why I got a slowdown on x64 Windows. I have /Os, which should trigger /Qftz by default, as no /fp option is used.</description>
      <pubDate>Sun, 18 Nov 2012 15:08:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952214#M15302</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-18T15:08:49Z</dc:date>
    </item>
    <item>
      <title>Can you give us the example</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952215#M15303</link>
      <description>Can you give us an example of this case so that we can check the problem on our side?</description>
      <pubDate>Mon, 19 Nov 2012 05:29:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952215#M15303</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-11-19T05:29:38Z</dc:date>
    </item>
    <item>
      <title>unfortunately, a small</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952216#M15304</link>
      <description>Unfortunately, a small program cannot reproduce this abnormal behavior. I'll get back to you once I can. Thank you.</description>
      <pubDate>Tue, 20 Nov 2012 07:29:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952216#M15304</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-20T07:29:10Z</dc:date>
    </item>
    <item>
      <title>Hello Gennady, TimP, Shane,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952217#M15305</link>
      <description>Hello Gennady, TimP, Shane,

I've attached a small example with matrix data that contains tiny values. I built the code with VS2010 and the Intel 12.1 compiler. All compiler options are the defaults.

The behavior I observe here is not exactly the same as what I saw in the large code. In this example, all computation is supposed to happen in MKL DGEMM, so I don't expect much performance difference between my debug and release executables. However, I do see one:

11.6 seconds with the debug executable versus 0.35 seconds with the release executable.

The test was done on an Intel Core i7-2760QM, 2.4 GHz, with a single thread.

Please help. The performance variation confuses me.

thank you.</description>
      <pubDate>Wed, 21 Nov 2012 08:20:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952217#M15305</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-21T08:20:55Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952218#M15306</link>
      <description>Hello,
Yes, I checked how it works on my side.
The performance variation is explained by the fact that the Intel compiler flushes denormals to zero by default in release mode.
In debug mode, /Qftz is off.

On my local system (Win7, SNB, MKL 11.0.1, 64-bit) I got:
release - 0.45 sec
debug   - 12.4 sec
debug with /Qftz - 0.51 sec</description>
      <pubDate>Wed, 21 Nov 2012 10:20:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952218#M15306</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-11-21T10:20:20Z</dc:date>
    </item>
    <item>
      <title>can you let me know where you</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952219#M15307</link>
      <description>Can you let me know where you put /Qftz: as a compiler option or a linker option? Details are appreciated.</description>
      <pubDate>Wed, 21 Nov 2012 10:47:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952219#M15307</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-21T10:47:41Z</dc:date>
    </item>
    <item>
      <title>one more question - if the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952220#M15308</link>
      <description>One more question: if the main program is built with Intel C or Microsoft C, which option should I use? Thank you.</description>
      <pubDate>Wed, 21 Nov 2012 10:54:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952220#M15308</guid>
      <dc:creator>boreas</dc:creator>
      <dc:date>2012-11-21T10:54:36Z</dc:date>
    </item>
    <item>
      <title>The SSE intrinsics for</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952221#M15309</link>
      <description>The SSE intrinsics for switching underflow mode are covered (with typos) here: 
&lt;A href="http://software.intel.com/en-us/articles/how-to-avoid-performance-penalties-for-gradual-underflow-behavior" target="_blank"&gt;http://software.intel.com/en-us/articles/how-to-avoid-performance-penalties-for-gradual-underflow-behavior&lt;/A&gt;
Since the article was written, ifort adopted the Fortran standard method of setting underflow mode under USE ieee_arithmetic, so you can ignore the Fortran-bashing aspect of the article.
call ieee_set_underflow_mode(.false.)
Setting the initialization in main() with /Qftz or /Qftz- is supported only by the Intel compilers.  It takes effect at compile time, when you build main.obj.  /Qftz has no effect when building other .obj files or at link time.
By the way, the gcc equivalent of /Qftz is normally invoked by -ffast-math.
With the SSE or language standard intrinsics, you can switch the mode at any point in your program (but don't do it inside a time-consuming loop).
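As a minimal sketch of the language standard route, assuming a compiler that implements the Fortran 2003 underflow control procedures in IEEE_ARITHMETIC: the program below saves the current mode, switches to abrupt underflow around one phase, and restores the mode afterwards. The subroutine heavy_kernel is a hypothetical placeholder for the DGEMM-heavy part of an application, and the sketch assumes that work runs on the calling thread.

```fortran
program scoped_ftz
  use, intrinsic :: ieee_arithmetic
  implicit none
  logical :: was_gradual

  if (ieee_support_underflow_control(1d0)) then
    call ieee_get_underflow_mode(was_gradual)   ! remember the current mode
    call ieee_set_underflow_mode(.false.)       ! switch to abrupt underflow

    call heavy_kernel()                         ! set once, outside hot loops

    call ieee_set_underflow_mode(was_gradual)   ! restore the previous mode
  else
    print *, 'underflow control is not supported for double precision here'
    call heavy_kernel()
  end if
contains
  ! Hypothetical stand-in for the time-consuming phase (e.g. many DGEMM calls).
  subroutine heavy_kernel()
    print *, 'running kernel'
  end subroutine heavy_kernel
end program scoped_ftz
```

Guarding with ieee_support_underflow_control keeps the program valid on processors where the mode cannot be changed, and restoring the saved mode leaves the rest of the program with the underflow behavior it started with.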
The usual practice of running in IEEE standard gradual underflow mode under MSVC and gcc probably motivated the changes in Core i7-2, which are supposed to speed up the common cases.</description>
      <pubDate>Wed, 21 Nov 2012 12:35:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-difference-of-DGEMM-on-matrix-containing-huge-values/m-p/952221#M15309</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-11-21T12:35:00Z</dc:date>
    </item>
  </channel>
</rss>

