<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic UPDATE: PARDISO memory consumption for unsymmetric complex problem in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/UPDATE-PARDISO-memory-consumption-for-unsymmetric-complex/m-p/1440103#M34023</link>
    <description>&lt;DIV class="lia-quilt-column lia-quilt-column-24 lia-quilt-column-single lia-quilt-column-message-body-content"&gt;
&lt;DIV class="lia-quilt-column-alley lia-quilt-column-alley-single"&gt;
&lt;DIV&gt;
&lt;DIV id="bodyDisplay" class="lia-message-body lia-component-message-view-widget-body lia-component-body-signature-highlight-escalation lia-component-message-view-widget-body-signature-highlight-escalation section_selectors question_selectors first_st_section" data-section_field_id="9f3d-2863-4c65"&gt;
&lt;DIV class="lia-message-body-content sub_section_element_selectors"&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Since I am unable to comment/reply/edit my previous post, I am starting a duplicate with updated info at the bottom of the post. I am sorry, but I couldn't think of any other alternative.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;STRONG&gt;&lt;SPAN class="sub_section_element_selectors"&gt;ORIGINAL POST:&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;I am trying to use PARDISO for solving a structurally symmetric complex matrix generated by a FEM scheme, and I am quite confused by its memory requirements.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;When running PARDISO with a 144k equations matrix, I see a memory consumption up to 3GB in the factorization step. If I disable the permutation, by setting perm[i]=i in the perm array and iparm[4] = 1, it goes up to 5GB (I will use C++ 0-based indexing in this post as to avoid confusion).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;I find this behavior to be a bit surprising, given that for symmetric real problems I normally see a negligible memory consumption for matrices around the same size.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Attached you can see the sparsity pattern of the input matrix&lt;/SPAN&gt;&lt;/P&gt;
&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="FranciscoOrlandini_0-1671535848630.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/36357iECE5FEA1DC0FD6EE/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="FranciscoOrlandini_0-1671535848630.png" alt="FranciscoOrlandini_0-1671535848630.png" /&gt;&lt;/span&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;with the red color denoting non-zero positions (each block actually corresponds to ~15 equations).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;With iparm[4] = 2 I was able to inspect the matrix after PARDISO's reordering, and its sparsity pattern is as follows&lt;/SPAN&gt;&lt;/P&gt;
&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="FranciscoOrlandini_1-1671535848627.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/36358iB0F3185D154630C2/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="FranciscoOrlandini_1-1671535848627.png" alt="FranciscoOrlandini_1-1671535848627.png" /&gt;&lt;/span&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
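&lt;P&gt;A plot like the one above can be reproduced from the permutation returned in perm. The sketch below uses plain Python lists and makes these assumptions:&lt;/P&gt;
&lt;P&gt;The CSR arrays ia and ja are 0-based. Row and column i of the original matrix become row and column perm[i] of the reordered one, which is our reading of the oneMKL documentation. If the result looks wrong, the inverse permutation is the other common convention. The function name permuted_pattern is hypothetical.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
```python
def permuted_pattern(ia, ja, perm):
    """Return the sorted (row, col) non-zero positions of B = P A P^T.

    ia, ja: 0-based CSR row pointers and column indices of A.
    perm:   assumed convention: row/column i of A becomes row/column
            perm[i] of B.
    """
    n = len(ia) - 1
    entries = []
    for i in range(n):
        for k in range(ia[i], ia[i + 1]):
            entries.append((perm[i], perm[ja[k]]))
    return sorted(entries)


# Small 3 x 3 example: row 0 has columns {0, 2}, row 1 {1}, row 2 {0, 2}.
ia = [0, 2, 3, 5]
ja = [0, 2, 1, 0, 2]
print(permuted_pattern(ia, ja, [2, 0, 1]))
# prints [(0, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
```
&lt;/LI-CODE&gt;
&lt;P&gt;Passing the returned coordinates to any scatter-plot tool gives the reordered spy plot.&lt;/P&gt;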
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Is this memory consumption considered normal? This matrix is obtained from a really coarse mesh, so for any practical application I wouldn't be able to use PARDISO if that is the case (perhaps with OOC mode, with I wouldn't expected to be needed for systems this big).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;I first had this results using 32bit interface of oneAPI MKL 2021, and I didn't get any different results by using the 64bit interface of both 2021 and 2023 MKLs. All the tests were performed in a C++ code compiled with gcc in a Linux environment.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Unfortunately I cannot post here an easy way to generate such results, as it would require to download and compile a C++ library.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;If there is further information that I could provide in order to provide more insight to this problem, I would be really happy to do so.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Thank you in advance.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;STRONG&gt;UPDATE:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;I have managed to isolate the problem in a single .cpp file plus three text files containing the matrix in CSR format.&lt;/DIV&gt;
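&lt;P&gt;Before a matrix dumped to text files is shared, a few basic consistency checks on the CSR arrays can rule out input errors. This sketch assumes the files have already been read into Python lists ia and ja with 0-based indices. The file layout is not described in the post, so no parser is shown.&lt;/P&gt;
&lt;P&gt;The checks are ones we would expect a solver to need for mtype = 3: sorted, duplicate-free column indices and a structurally symmetric pattern. They are not a copy of the MKL matrix checker, and the function names are hypothetical.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
```python
def check_csr(ia, ja, n):
    """Basic 0-based CSR checks for an n x n matrix; returns a list of problems."""
    if len(ia) != n + 1:
        return ["ia must have n + 1 entries"]
    problems = []
    if ia[0] != 0:
        problems.append("ia[0] must be 0 with 0-based indexing")
    if ia[n] != len(ja):
        problems.append("ia[n] must equal the number of stored entries")
    for i in range(n):
        if ia[i] > ia[i + 1]:
            problems.append(f"row {i}: ia is decreasing")
            continue
        row = ja[ia[i]:ia[i + 1]]
        # Column indices must be strictly increasing within a row.
        if row != sorted(set(row)):
            problems.append(f"row {i}: columns unsorted or duplicated")
        if row and (0 > min(row) or max(row) >= n):
            problems.append(f"row {i}: column index out of range")
    return problems


def is_structurally_symmetric(ia, ja):
    """True if (j, i) is stored whenever (i, j) is stored."""
    n = len(ia) - 1
    stored = {(i, ja[k]) for i in range(n) for k in range(ia[i], ia[i + 1])}
    return all((j, i) in stored for (i, j) in stored)


ia = [0, 2, 3, 5]
ja = [0, 2, 1, 0, 2]
print(check_csr(ia, ja, 3), is_structurally_symmetric(ia, ja))  # prints: [] True
```
&lt;/LI-CODE&gt;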
&lt;DIV&gt;
&lt;P&gt;&lt;BR /&gt;The files can be downloaded from &lt;A href="https://drive.google.com/drive/folders/1dkpG4m8jGfAT4rmK7rURgbeabwDMgBDi?usp=share_link" target="_self"&gt;this Google Drive link&lt;/A&gt;, and PARDISO's output is shown below.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Best regards,&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Francisco&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="none"&gt;=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  47 %  48 %  49 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  61 %  62 %  63 %  64 %  65 %  67 %  69 %  71 %  73 %  75 %  77 %  78 %  79 %  80 %  81 %  82 %  85 %  88 %  90 %  92 %  93 %  95 %  96 %  97 %  98 %  99 %  100 % 

=== PARDISO: solving a complex structurally symmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON


Summary: ( starting phase is reordering, ending phase is factorization )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.038899 s
Time spent in reordering of the initial matrix (reorder)         : 0.719669 s
Time spent in symbolic factorization (symbfct)                   : 0.164026 s
Time spent in data preparations for factorization (parlist)      : 0.004884 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 15.284947 s
Time spent in allocation of internal data structures (malloc)    : 0.024751 s
Time spent in additional calculations                            : 0.305008 s
Total time spent                                                 : 16.542184 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

&amp;lt; Linear system Ax = b &amp;gt;
             number of equations:           144657
             number of non-zeros in A:      8811657
             number of non-zeros in A (%): 0.042109

             number of right-hand sides:    1

&amp;lt; Factors L and U &amp;gt;
             number of columns for each panel: 72
             number of independent subgraphs:  0
&amp;lt; Preprocessing with state of the art partitioning metis&amp;gt;
             number of supernodes:                    16666
             size of largest supernode:               4251
             number of non-zeros in L:                77945805
             number of non-zeros in U:                73504188
             number of non-zeros in L+U:              151449993
             gflop   for the numerical factorization: 1100.816559

             gflop/s for the numerical factorization: 72.019651
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
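&lt;P&gt;The statistics above can be turned into a rough lower bound on the in-core footprint. This back-of-the-envelope sketch assumes 16 bytes per stored complex double-precision value (two 8-byte doubles). It ignores integer index arrays, supernode padding and solver workspace, so the real footprint should be larger. The non-zero counts are copied from the log.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;
```python
BYTES_PER_COMPLEX_DOUBLE = 16  # real part + imaginary part, 8 bytes each


def values_bytes(nnz, bytes_per_entry=BYTES_PER_COMPLEX_DOUBLE):
    """Bytes needed to store nnz numerical values (no index arrays)."""
    return nnz * bytes_per_entry


# Figures copied from the PARDISO statistics above.
nnz_a = 8811657     # number of non-zeros in A
nnz_lu = 151449993  # number of non-zeros in L+U

fill_ratio = nnz_lu / nnz_a
lu_bytes = values_bytes(nnz_lu)
a_bytes = values_bytes(nnz_a)

print(f"fill ratio nnz(L+U)/nnz(A): {fill_ratio:.1f}")  # prints: ...: 17.2
print(f"L+U values: {lu_bytes / 1e9:.2f} GB ({lu_bytes / 2**30:.2f} GiB)")  # 2.42 GB (2.26 GiB)
print(f"A values: {a_bytes / 1e9:.2f} GB")  # prints: A values: 0.14 GB
```
&lt;/LI-CODE&gt;
&lt;P&gt;Under these assumptions, the factor values alone come to about 2.4 GB of the roughly 3 GB peak reported earlier in the post.&lt;/P&gt;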
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
    <pubDate>Tue, 20 Dec 2022 11:33:42 GMT</pubDate>
    <dc:creator>FranciscoOrlandini</dc:creator>
    <dc:date>2022-12-20T11:33:42Z</dc:date>
    <item>
      <title>UPDATE: PARDISO memory consumption for unsymmetric complex problem</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/UPDATE-PARDISO-memory-consumption-for-unsymmetric-complex/m-p/1440103#M34023</link>
      <description>&lt;DIV class="lia-quilt-column lia-quilt-column-24 lia-quilt-column-single lia-quilt-column-message-body-content"&gt;
&lt;DIV class="lia-quilt-column-alley lia-quilt-column-alley-single"&gt;
&lt;DIV&gt;
&lt;DIV id="bodyDisplay" class="lia-message-body lia-component-message-view-widget-body lia-component-body-signature-highlight-escalation lia-component-message-view-widget-body-signature-highlight-escalation section_selectors question_selectors first_st_section" data-section_field_id="9f3d-2863-4c65"&gt;
&lt;DIV class="lia-message-body-content sub_section_element_selectors"&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Since I am unable to comment/reply/edit my previous post, I am starting a duplicate with updated info at the bottom of the post. I am sorry, but I couldn't think of any other alternative.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;STRONG&gt;&lt;SPAN class="sub_section_element_selectors"&gt;ORIGINAL POST:&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;I am trying to use PARDISO for solving a structurally symmetric complex matrix generated by a FEM scheme, and I am quite confused by its memory requirements.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;When running PARDISO with a 144k equations matrix, I see a memory consumption up to 3GB in the factorization step. If I disable the permutation, by setting perm[i]=i in the perm array and iparm[4] = 1, it goes up to 5GB (I will use C++ 0-based indexing in this post as to avoid confusion).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;I find this behavior to be a bit surprising, given that for symmetric real problems I normally see a negligible memory consumption for matrices around the same size.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Attached you can see the sparsity pattern of the input matrix&lt;/SPAN&gt;&lt;/P&gt;
&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="FranciscoOrlandini_0-1671535848630.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/36357iECE5FEA1DC0FD6EE/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="FranciscoOrlandini_0-1671535848630.png" alt="FranciscoOrlandini_0-1671535848630.png" /&gt;&lt;/span&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;with the red color denoting non-zero positions (each block actually corresponds to ~15 equations).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;With iparm[4] = 2 I was able to inspect the matrix after PARDISO's reordering, and its sparsity pattern is as follows&lt;/SPAN&gt;&lt;/P&gt;
&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="FranciscoOrlandini_1-1671535848627.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/36358iB0F3185D154630C2/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="FranciscoOrlandini_1-1671535848627.png" alt="FranciscoOrlandini_1-1671535848627.png" /&gt;&lt;/span&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Is this memory consumption considered normal? This matrix is obtained from a really coarse mesh, so for any practical application I wouldn't be able to use PARDISO if that is the case (perhaps with OOC mode, with I wouldn't expected to be needed for systems this big).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;I first had this results using 32bit interface of oneAPI MKL 2021, and I didn't get any different results by using the 64bit interface of both 2021 and 2023 MKLs. All the tests were performed in a C++ code compiled with gcc in a Linux environment.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Unfortunately I cannot post here an easy way to generate such results, as it would require to download and compile a C++ library.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;If there is further information that I could provide in order to provide more insight to this problem, I would be really happy to do so.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Thank you in advance.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;STRONG&gt;UPDATE:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;DIV&gt;I have managed to isolate the problem in a single .cpp file and three text files containing the matrix in the CSR format.&lt;/DIV&gt;
&lt;DIV&gt;
&lt;P&gt;&lt;BR /&gt;The files can be obtained in &lt;A href="http://%20https://drive.google.com/drive/folders/1dkpG4m8jGfAT4rmK7rURgbeabwDMgBDi?usp=share_link" target="_self"&gt;this Google Drive link&lt;/A&gt; , and below one can see PARDISO's output.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Best regards,&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Francisco&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="none"&gt;=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  47 %  48 %  49 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  61 %  62 %  63 %  64 %  65 %  67 %  69 %  71 %  73 %  75 %  77 %  78 %  79 %  80 %  81 %  82 %  85 %  88 %  90 %  92 %  93 %  95 %  96 %  97 %  98 %  99 %  100 % 

=== PARDISO: solving a complex structurally symmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON


Summary: ( starting phase is reordering, ending phase is factorization )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.038899 s
Time spent in reordering of the initial matrix (reorder)         : 0.719669 s
Time spent in symbolic factorization (symbfct)                   : 0.164026 s
Time spent in data preparations for factorization (parlist)      : 0.004884 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 15.284947 s
Time spent in allocation of internal data structures (malloc)    : 0.024751 s
Time spent in additional calculations                            : 0.305008 s
Total time spent                                                 : 16.542184 s

Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

&amp;lt; Linear system Ax = b &amp;gt;
             number of equations:           144657
             number of non-zeros in A:      8811657
             number of non-zeros in A (%): 0.042109

             number of right-hand sides:    1

&amp;lt; Factors L and U &amp;gt;
             number of columns for each panel: 72
             number of independent subgraphs:  0
&amp;lt; Preprocessing with state of the art partitioning metis&amp;gt;
             number of supernodes:                    16666
             size of largest supernode:               4251
             number of non-zeros in L:                77945805
             number of non-zeros in U:                73504188
             number of non-zeros in L+U:              151449993
             gflop   for the numerical factorization: 1100.816559

             gflop/s for the numerical factorization: 72.019651
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Tue, 20 Dec 2022 11:33:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/UPDATE-PARDISO-memory-consumption-for-unsymmetric-complex/m-p/1440103#M34023</guid>
      <dc:creator>FranciscoOrlandini</dc:creator>
      <dc:date>2022-12-20T11:33:42Z</dc:date>
    </item>
  </channel>
</rss>

