<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130691#M25577</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Thanks for the response.&lt;/P&gt;

&lt;P&gt;I would use the information to understand the processing time! DSS makes two passes through a triangular matrix of (unknown) density. An alternative to dss would be a pcg solver, and knowing the number of nnz coefficients in the factor would let me make a simple calculation of how many pcg iterations I could afford before the flops break even.&lt;/P&gt;

&lt;P&gt;"Non-constructable" means that it cannot be build because of it's dense nature. If i get the numbers correct 2.5Mio x 2.5Mio will need 50 terabyte if constructed. However, because K is a sort of an autoregressive matrix, K&lt;SUP&gt;-1&lt;/SUP&gt; can be built because it is sparse.&lt;/P&gt;

&lt;P&gt;K is constant.&lt;/P&gt;

&lt;P&gt;I have not worked with pardiso yet, but understood that the difference between pardiso and dss is merely the interface. Both will eventually use the same routines.&lt;/P&gt;

&lt;P&gt;I use the standard sequence of calls: mkl_dss_create, mkl_dss_reorder, mkl_dss_factorize and mkl_dss_solve. I am happy with the processing time of the first three, where the factorization takes about 90 seconds of wall-clock time.&lt;/P&gt;

&lt;P&gt;Cheers&lt;/P&gt;</description>
    <pubDate>Sun, 27 May 2018 12:38:32 GMT</pubDate>
    <dc:creator>may_ka</dc:creator>
    <dc:date>2018-05-27T12:38:32Z</dc:date>
    <item>
      <title>NNZ coefficients in dss/pardiso LU factor</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130689#M25575</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I was wondering whether there is any way to obtain the number of non-zero coefficients of the matrix factor generated by the MKL function mkl_dss_real? I checked the MKL manual but there seems to be no way to get this number.&lt;/P&gt;

&lt;P&gt;Background:&lt;/P&gt;

&lt;P&gt;I want to multiply a vector v by a matrix K&lt;SUP&gt;-1&lt;/SUP&gt;, b=K&lt;SUP&gt;-1&lt;/SUP&gt;v, where K is sparse and K&lt;SUP&gt;-1&lt;/SUP&gt; is not constructable. One way to obtain b=K&lt;SUP&gt;-1&lt;/SUP&gt;v is to solve Kb=v iteratively, for which I use the mkl_dss solver. In this particular setting K is of dimension 2.5Mio x 2.5Mio, is symmetric positive definite, and has 14Mio NNZ coefficients. I understand that the DSS solver uses an LU factorization followed by forward-backward substitution for solving. I also understand that the time complexity of forward/backward substitution is 2 O(n&lt;SUP&gt;2&lt;/SUP&gt;). Given the number of NNZ coefficients in K, I could make a rough approximation of the number of floating point operations in routine "mkl_dss_solve" and the associated processing time. However, "mkl_dss_solve" needed much more (~x100) processing time. Currently the only explanation for this observation is that the number of nnz coefficients in L/U must be much larger than in K.&lt;/P&gt;
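
&lt;P&gt;To illustrate how the factor can hold far more nonzeros than K itself, here is a minimal sketch of symbolic elimination on a tiny symmetric "arrow" matrix. The function name factor_nnz and the test matrix are hypothetical and only for illustration; this is not how MKL computes its factor, and the quadratic neighbour loop is only practical for small examples.&lt;/P&gt;

```python
def factor_nnz(n, edges, order):
    """Count nonzeros (diagonal included) in the Cholesky factor L of a
    symmetric n x n sparsity pattern, eliminating unknowns in `order`.
    Hypothetical helper for illustration; not an MKL routine."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    eliminated = [False] * n
    nnz = 0
    for v in order:
        # Neighbours not yet eliminated become the off-diagonal entries
        # of column v in L.
        nbrs = [u for u in adj[v] if not eliminated[u]]
        nnz += 1 + len(nbrs)
        # Eliminating v couples all remaining neighbours pairwise (fill-in).
        for a in nbrs:
            for b in nbrs:
                if a != b:
                    adj[a].add(b)
        eliminated[v] = True
    return nnz


# Arrow pattern: unknown 0 is coupled to every other unknown.
n = 6
arrow = [(0, j) for j in range(1, n)]

# Eliminating the dense row first fills the whole factor.
dense_first = factor_nnz(n, arrow, list(range(n)))
# Eliminating it last produces no fill-in at all.
dense_last = factor_nnz(n, arrow, list(range(n - 1, -1, -1)))
print(dense_first, dense_last)
```

&lt;P&gt;The same pattern gives a full lower triangle with one ordering and no fill-in with the other, which is why the nonzero count of the factor cannot be read off from the nonzero count of K alone.&lt;/P&gt;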

&lt;P&gt;Any suggestions are welcomed.&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sun, 27 May 2018 05:19:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130689#M25575</guid>
      <dc:creator>may_ka</dc:creator>
      <dc:date>2018-05-27T05:19:46Z</dc:date>
    </item>
    <item>
      <title>Supposing that the number of</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130690#M25576</link>
      <description>&lt;P&gt;Supposing that the number of non-zero entries in the matrix factors were available, what would you do with that information? As far as I can see, nothing!&lt;/P&gt;

&lt;P&gt;I do not understand what you mean by "not constructable" and why iterations are needed. Does K change from iteration to iteration? Why do you use the DSS wrapper instead of the direct Pardiso interface?&lt;/P&gt;

&lt;P&gt;If you share details regarding the nature of the matrix, and the sequence of DSS calls that you presently use, we could perhaps suggest ways of improving the computational throughput.&lt;/P&gt;</description>
      <pubDate>Sun, 27 May 2018 11:12:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130690#M25576</guid>
      <dc:creator>mecej4</dc:creator>
      <dc:date>2018-05-27T11:12:16Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130691#M25577</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Thanks for the response.&lt;/P&gt;

&lt;P&gt;I would use the information to understand the processing time! DSS makes two passes through a triangular matrix of (unknown) density. An alternative to dss would be a pcg solver, and knowing the number of nnz coefficients in the factor would let me make a simple calculation of how many pcg iterations I could afford before the flops break even.&lt;/P&gt;
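
&lt;P&gt;A rough sketch of that break-even calculation, under stated assumptions: each triangular solve costs about 2 flops per nonzero of the factor, a pcg iteration costs one sparse matrix-vector product plus about 10n flops of vector work plus a Jacobi preconditioner, and the factorization cost is ignored because K is constant. The function names and the factor size used below are hypothetical placeholders, not measured values.&lt;/P&gt;

```python
def direct_solve_flops(nnz_factor):
    # Forward plus backward substitution: about 2 flops per factor
    # nonzero in each of the two triangular solves (assumption).
    return 4 * nnz_factor


def pcg_iteration_flops(nnz_k, n, precond_flops=None):
    # One matrix-vector product (2 flops per stored nonzero of K),
    # roughly 10n flops for dot products and vector updates, and the
    # preconditioner; default is Jacobi at one multiply per unknown.
    if precond_flops is None:
        precond_flops = n
    return 2 * nnz_k + 10 * n + precond_flops


def break_even_iterations(nnz_factor, nnz_k, n):
    # Number of pcg iterations whose flops equal one direct solve.
    return direct_solve_flops(nnz_factor) / pcg_iteration_flops(nnz_k, n)


n = 2_500_000           # dimension of K, from the thread
nnz_k = 14_000_000      # nonzeros of K, from the thread
nnz_factor = 100 * nnz_k  # placeholder only: the real value is the unknown

print(break_even_iterations(nnz_factor, nnz_k, n))
```

&lt;P&gt;This counts flops only; sparse solves are often limited by memory bandwidth, so the result is an order-of-magnitude guide at best.&lt;/P&gt;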

&lt;P&gt;"Non-constructable" means that it cannot be build because of it's dense nature. If i get the numbers correct 2.5Mio x 2.5Mio will need 50 terabyte if constructed. However, because K is a sort of an autoregressive matrix, K&lt;SUP&gt;-1&lt;/SUP&gt; can be built because it is sparse.&lt;/P&gt;

&lt;P&gt;K is constant.&lt;/P&gt;

&lt;P&gt;I have not worked with pardiso yet, but understood that the difference between pardiso and dss is merely the interface. Both will eventually use the same routines.&lt;/P&gt;

&lt;P&gt;I use the standard sequence of calls: mkl_dss_create, mkl_dss_reorder, mkl_dss_factorize and mkl_dss_solve. I am happy with the processing time of the first three, where the factorization takes about 90 seconds of wall-clock time.&lt;/P&gt;

&lt;P&gt;Cheers&lt;/P&gt;</description>
      <pubDate>Sun, 27 May 2018 12:38:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130691#M25577</guid>
      <dc:creator>may_ka</dc:creator>
      <dc:date>2018-05-27T12:38:32Z</dc:date>
    </item>
    <item>
      <title>Quote:may.ka wrote:</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130692#M25578</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;may.ka wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;I have not worked with pardiso yet, but understood that the difference between pardiso and dss is merely the interface. Both will eventually use the same routines.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;That is not fully correct. The MKL Pardiso interface allows you to use two-level factorization, the VBSR format, a merged forward step and factorization, and other tricks that can improve overall performance. MKL Pardiso also provides the number of nonzero elements in the LU decomposition. That is why the previous comment, suggesting a switch from dss to pardiso, sounds reasonable.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Alex&lt;/P&gt;</description>
      <pubDate>Thu, 31 May 2018 15:07:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/NNZ-coefficients-in-dss-paridso-LU-factor/m-p/1130692#M25578</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2018-05-31T15:07:58Z</dc:date>
    </item>
  </channel>
</rss>

