<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Re: LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc error in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323085#M8840</link>
    <description>&lt;P&gt;Hello Gennady,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your quick answer.&lt;/P&gt;
&lt;P&gt;Your first option didn't prevent the 1 TB of RAM from filling up, and I still get the &lt;CODE&gt;bad alloc&lt;/CODE&gt; error.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For your second suggestion, there are only a few examples on the web for the pdgetri and pdgetrf routines.&lt;/P&gt;
&lt;P&gt;How do I go from:&lt;/P&gt;
&lt;P&gt;// LAPACKE routines&lt;BR /&gt;MKL_INT info1 = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, N, N, arr, N, IPIV);&lt;BR /&gt;MKL_INT info2 = LAPACKE_dgetri(LAPACK_ROW_MAJOR, N, arr, N, IPIV);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;to pdgetrf and pdgetri?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could someone help me implement these routines? I don't understand all the required arguments.&lt;/P&gt;
&lt;P&gt;Best regards, chris&lt;/P&gt;
</description>
    <pubDate>Tue, 19 Oct 2021 10:08:33 GMT</pubDate>
    <dc:creator>chris6</dc:creator>
    <dc:date>2021-10-19T10:08:33Z</dc:date>
    <item>
      <title>LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1322847#M8837</link>
      <description>&lt;DIV class="s-prose js-post-body"&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In C++, I need a function that inverts large matrices (at maximum, I need to invert a &lt;CODE&gt;120,000 x 120,000&lt;/CODE&gt; matrix).&lt;/P&gt;
&lt;P&gt;I am trying to do this on my workstation with the &lt;CODE&gt;LAPACKE&lt;/CODE&gt; &lt;CODE&gt;dgetri&lt;/CODE&gt; and &lt;CODE&gt;dgetrf&lt;/CODE&gt; routines from the &lt;CODE&gt;Intel oneAPI&lt;/CODE&gt; framework.&lt;/P&gt;
&lt;P&gt;The workstation has 1 TB of RAM and two RTX A6000 GPU cards with 48 GB each.&lt;/P&gt;
&lt;P&gt;Currently, the maximum I can invert is roughly a &lt;CODE&gt;50,000 x 50,000&lt;/CODE&gt; matrix. Above this size, &lt;CODE&gt;the 1 TB of RAM is full&lt;/CODE&gt; and I get the following error:&lt;/P&gt;
&lt;PRE class="lang-cpp s-code-block"&gt;&lt;CODE class="hljs language-cpp"&gt;terminate called after throwing an instance of &lt;SPAN class="hljs-string"&gt;'std::bad_alloc'&lt;/SPAN&gt;
  &lt;SPAN class="hljs-built_in"&gt;what&lt;/SPAN&gt;():  std::bad_alloc
Command terminated by signal &lt;SPAN class="hljs-number"&gt;6&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here is the implemented function that inverts a matrix:&lt;/P&gt;
&lt;PRE class="lang-cpp s-code-block"&gt;&lt;CODE class="hljs language-cpp"&gt;&lt;SPAN class="hljs-comment"&gt;// Inversion Matrix : passing Matrixes by Reference&lt;/SPAN&gt;
&lt;SPAN class="hljs-function"&gt;&lt;SPAN class="hljs-type"&gt;void&lt;/SPAN&gt; &lt;SPAN class="hljs-title"&gt;matrix_inverse_lapack&lt;/SPAN&gt;&lt;SPAN class="hljs-params"&gt;(vector&amp;lt;vector&amp;lt;&lt;SPAN class="hljs-type"&gt;double&lt;/SPAN&gt;&amp;gt;&amp;gt; &lt;SPAN class="hljs-keyword"&gt;const&lt;/SPAN&gt; &amp;amp;F_matrix, vector&amp;lt;vector&amp;lt;&lt;SPAN class="hljs-type"&gt;double&lt;/SPAN&gt;&amp;gt;&amp;gt; &amp;amp;F_output)&lt;/SPAN&gt; &lt;/SPAN&gt;{

  &lt;SPAN class="hljs-comment"&gt;// Index for loop and arrays&lt;/SPAN&gt;
  &lt;SPAN class="hljs-type"&gt;int&lt;/SPAN&gt; i, j, ip, idx;

  &lt;SPAN class="hljs-comment"&gt;// Size of F_matrix&lt;/SPAN&gt;
  &lt;SPAN class="hljs-type"&gt;int&lt;/SPAN&gt; N = F_matrix.&lt;SPAN class="hljs-built_in"&gt;size&lt;/SPAN&gt;();
  cout &amp;lt;&amp;lt; &lt;SPAN class="hljs-string"&gt;"m = "&lt;/SPAN&gt; &amp;lt;&amp;lt; N &amp;lt;&amp;lt; endl;

  &lt;SPAN class="hljs-type"&gt;int&lt;/SPAN&gt; *IPIV = &lt;SPAN class="hljs-keyword"&gt;new&lt;/SPAN&gt; &lt;SPAN class="hljs-type"&gt;int&lt;/SPAN&gt;[N];

  &lt;SPAN class="hljs-comment"&gt;// Statement of main array to inverse&lt;/SPAN&gt;
  &lt;SPAN class="hljs-type"&gt;double&lt;/SPAN&gt; *arr = &lt;SPAN class="hljs-keyword"&gt;new&lt;/SPAN&gt; &lt;SPAN class="hljs-type"&gt;double&lt;/SPAN&gt;[N*N];

  &lt;SPAN class="hljs-comment"&gt;// Statement of returned matrix of size N&lt;/SPAN&gt;
  &lt;SPAN class="hljs-comment"&gt;//vector&amp;lt;vector&amp;lt;double&amp;gt; &amp;gt; F_final(N, vector&amp;lt;double&amp;gt;(N));   &lt;/SPAN&gt;

  &lt;SPAN class="hljs-comment"&gt;// Output Diagonal block&lt;/SPAN&gt;
  &lt;SPAN class="hljs-type"&gt;double&lt;/SPAN&gt; *diag = &lt;SPAN class="hljs-keyword"&gt;new&lt;/SPAN&gt; &lt;SPAN class="hljs-type"&gt;double&lt;/SPAN&gt;[N];

  &lt;SPAN class="hljs-comment"&gt;//#pragma omp parallel for num_threads(n_threads)&lt;/SPAN&gt;
  &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; (i = &lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;; i&amp;lt;N; i++){
    &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; (j = &lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;; j&amp;lt;N; j++){
      idx = i*N + j;
      arr[idx] = F_matrix[i][j];
    }
  }

  &lt;SPAN class="hljs-comment"&gt;// LAPACKE routines&lt;/SPAN&gt;
  &lt;SPAN class="hljs-type"&gt;int&lt;/SPAN&gt; info1 = &lt;SPAN class="hljs-built_in"&gt;LAPACKE_dgetrf&lt;/SPAN&gt;(LAPACK_ROW_MAJOR, N, N, arr, N, IPIV);
  &lt;SPAN class="hljs-type"&gt;int&lt;/SPAN&gt; info2 = &lt;SPAN class="hljs-built_in"&gt;LAPACKE_dgetri&lt;/SPAN&gt;(LAPACK_ROW_MAJOR, N, arr, N, IPIV);

  &lt;SPAN class="hljs-comment"&gt;//#pragma omp parallel for num_threads(n_threads)&lt;/SPAN&gt;
  &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; (i = &lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;; i&amp;lt;N; i++){
    &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; (j = &lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;; j&amp;lt;N; j++){
      idx = i*N + j;
      F_output[i][j] = arr[idx];
    }
  }

  &lt;SPAN class="hljs-keyword"&gt;delete&lt;/SPAN&gt;[] IPIV;
  &lt;SPAN class="hljs-keyword"&gt;delete&lt;/SPAN&gt;[] arr;
} 
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Is there a workaround with LAPACKE to invert a &lt;CODE&gt;120,000 x 120,000&lt;/CODE&gt; matrix without getting a &lt;CODE&gt;bad alloc&lt;/CODE&gt; error, even though the workstation has 1 TB of RAM?&lt;/P&gt;
&lt;P&gt;PS: I have also tried to use MAGMA on the GPU cards, but there I am limited to a &lt;CODE&gt;40,000 x 40,000&lt;/CODE&gt; matrix on one GPU. For the moment, I haven't found a way to combine the power of both GPUs.&lt;/P&gt;
&lt;H4&gt;EDIT :&lt;/H4&gt;
&lt;P&gt;Is there a way to pass the F_matrix input variable by reference as an argument to &lt;CODE&gt;LAPACKE dgetrf&lt;/CODE&gt; and &lt;CODE&gt;dgetri&lt;/CODE&gt;? Recall that F_matrix has type &lt;CODE&gt;vector&amp;lt;vector&amp;lt;double&amp;gt;&amp;gt;&lt;/CODE&gt;.&lt;/P&gt;
&lt;P&gt;Thanks in advance for your help.&lt;/P&gt;
&lt;P&gt;Best regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 18 Oct 2021 14:10:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1322847#M8837</guid>
      <dc:creator>chris6</dc:creator>
      <dc:date>2021-10-18T14:10:59Z</dc:date>
    </item>
    <item>
      <title>Re:LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323028#M8838</link>
      <description>&lt;P&gt;Christophe,&lt;/P&gt;&lt;P&gt;please try linking against the MKL ILP64 API. Check what the MKL Link Line Advisor (&lt;A href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl/link-line-advisor.html" target="_blank"&gt;https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl/link-line-advisor.html&lt;/A&gt;) suggests linking with. In that case, don't forget to change all integers in your code to the MKL_INT type.&lt;/P&gt;&lt;P&gt;The second way to solve such huge problems is to use the distributed versions of these routines (p?getri and p?getrf, respectively) to run across many compute nodes. In that case, you also have to change the int datatypes to MKL_INT and link against the ILP64 MKL libraries.&lt;/P&gt;&lt;P&gt;-Gennady&lt;/P&gt;</description>
      <pubDate>Tue, 19 Oct 2021 03:45:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323028#M8838</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-10-19T03:45:27Z</dc:date>
    </item>
    <item>
      <title>Re: Re:LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323085#M8840</link>
      <description>&lt;P&gt;Hello Gennady,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your quick answer.&lt;/P&gt;
&lt;P&gt;Your first option didn't prevent the 1 TB of RAM from filling up, and I still get the &lt;CODE&gt;bad alloc&lt;/CODE&gt; error.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For your second suggestion, there are only a few examples on the web for the pdgetri and pdgetrf routines.&lt;/P&gt;
&lt;P&gt;How do I go from:&lt;/P&gt;
&lt;P&gt;// LAPACKE routines&lt;BR /&gt;MKL_INT info1 = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, N, N, arr, N, IPIV);&lt;BR /&gt;MKL_INT info2 = LAPACKE_dgetri(LAPACK_ROW_MAJOR, N, arr, N, IPIV);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;to pdgetrf and pdgetri?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could someone help me implement these routines? I don't understand all the required arguments.&lt;/P&gt;
&lt;P&gt;Best regards, chris&lt;/P&gt;
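For the question above about moving from LAPACKE to pdgetrf/pdgetri: the matrix must first be distributed over a BLACS process grid in a 2D block-cyclic layout, and the routines then take an array descriptor instead of a leading dimension. Note also that ScaLAPACK has no LAPACK_ROW_MAJOR option; the distributed matrix is column-major. The following is only an untested sketch of the call sequence (names follow MKL's ScaLAPACK/BLACS prototypes; the block size nb and the 2x2 grid are arbitrary choices), not a drop-in implementation:

```cpp
// Untested sketch: 2D block-cyclic pdgetrf + pdgetri call sequence (ILP64).
MKL_INT izero = 0, ione = 1, info;
MKL_INT N = 120000, nb = 64;        // global size; block size is tunable
MKL_INT nprow = 2, npcol = 2;       // process grid, chosen arbitrarily
MKL_INT ictxt, myrow, mycol;

Cblacs_get(-1, 0, &ictxt);
Cblacs_gridinit(&ictxt, "Row-major", nprow, npcol);
Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

// Local dimensions of this rank's piece of the global matrix.
MKL_INT mloc = numroc_(&N, &nb, &myrow, &izero, &nprow);
MKL_INT nloc = numroc_(&N, &nb, &mycol, &izero, &npcol);
double  *A_loc = (double*)mkl_calloc((size_t)mloc * nloc, sizeof(double), 64);
MKL_INT *ipiv  = (MKL_INT*)mkl_calloc(mloc + nb, sizeof(MKL_INT), 64);

// Array descriptor for the distributed matrix.
MKL_INT desca[9], lld = (mloc > 1) ? mloc : 1;
descinit_(desca, &N, &N, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);

// ... each rank fills A_loc with its own blocks of the global matrix ...

pdgetrf_(&N, &N, A_loc, &ione, &ione, desca, ipiv, &info);

// pdgetri needs workspace; query the optimal sizes first with lwork = -1.
MKL_INT lwork = -1, liwork = -1, iwkopt;
double wkopt;
pdgetri_(&N, A_loc, &ione, &ione, desca, ipiv, &wkopt, &lwork, &iwkopt, &liwork, &info);
// ... allocate work/iwork from wkopt/iwkopt, then call pdgetri_ again ...
```

The payoff of this layout is that each rank only ever holds its mloc x nloc tile, so the 110 GB global matrix is split across the memory of all nodes.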
</description>
      <pubDate>Tue, 19 Oct 2021 10:08:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323085#M8840</guid>
      <dc:creator>chris6</dc:creator>
      <dc:date>2021-10-19T10:08:33Z</dc:date>
    </item>
    <item>
      <title>Re:LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323196#M8842</link>
      <description>&lt;P&gt;MKL contains a Fortran API example for the PDGETRF routine. You can find this example in the MKLROOT/examples/scalapackf directory. There is no example available for the p?getri routine.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Meanwhile, I originally misestimated the memory allocation size for a 120k x 120k matrix.&lt;/P&gt;&lt;P&gt;Actually, the allocated size of a double-precision 120k x 120k matrix == size * size * sizeof(double) ~ 110 GB. You can see that it fits within your 1 TB of RAM.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I made a trivial dgetrf example (see attached), built it using the -DMKL_ILP64 compiler option, and linked it against the ILP64 version of the MKL libraries:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;icc -DMKL_ILP64 -I/opt/intel/oneapi/mkl/2021.4.0/include test_getrf_huge.cpp \&lt;/P&gt;&lt;P&gt;-Wl,--start-group \&lt;/P&gt;&lt;P&gt;/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_intel_ilp64.a \&lt;/P&gt;&lt;P&gt;/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_intel_thread.a \&lt;/P&gt;&lt;P&gt;/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_core.a \&lt;/P&gt;&lt;P&gt;-Wl,--end-group -liomp5 -lpthread -lm -ldl&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Running this executable on a local AVX-512 based CPU with 256 GB of RAM, in MKL_VERBOSE mode, I see that getrf passes with an OK result. The execution time = 426 sec.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;$ ./a.out 120000&lt;/P&gt;&lt;P&gt;MKL_VERBOSE oneMKL 2021.0 Update 4 Product build 20210904 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost), EVEX-encoded AES and Carry-Less Multiplication Quadword instructions, Lnx 2.40GHz ilp64 intel_thread&lt;/P&gt;&lt;P&gt;MKL_VERBOSE &lt;B&gt;DGETRF&lt;/B&gt;(&lt;B&gt;120000,120000&lt;/B&gt;,0x150c472aa080,120000,0x1541ed9a9080,0) &lt;B&gt;426.99s&lt;/B&gt; CNR:OFF Dyn:1 FastMM:1 TID:0&amp;nbsp;NThr:72&lt;/P&gt;&lt;P&gt;...LAPACKE_dgetrf &lt;B&gt;returns 0&lt;/B&gt;, SIZE = 120000&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-Gennady&lt;/P&gt;</description>
      <pubDate>Tue, 19 Oct 2021 17:24:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323196#M8842</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-10-19T17:24:41Z</dc:date>
    </item>
    <item>
      <title>Re: Re:LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323221#M8844</link>
      <description>&lt;P&gt;I attached the test I used.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Oct 2021 18:38:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323221#M8844</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-10-19T18:38:47Z</dc:date>
    </item>
    <item>
      <title>Re: Re:LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323238#M8846</link>
      <description>&lt;P&gt;Hello Gennady !&lt;/P&gt;
&lt;P&gt;Your test code worked fine on the worstation. But runtime is relatively long compared to you (roughly 1800s). I have a processor AMD EPYC 64 cores (128 threads).&lt;/P&gt;
&lt;P&gt;During the execution, I can see the 64 threads launched by the code (OMP_NUM_THREADS).&lt;/P&gt;
&lt;P&gt;But for my code, RAM becomes quickly full and reach the 1TB, and one gets error "terminate called after throwing an instance of 'std::bad_alloc' ".&lt;/P&gt;
&lt;P&gt;I think this is due to the fact I am using in other parts of the code 120k x 120k.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Tha's why I would like to pass to scaLAPACK, since I think a distributed computation is necessary in my case.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But as you mentioned it, there is only an example in Fortran, not C++.&lt;/P&gt;
&lt;P&gt;Do you know by chance other sources or test codes for "pdgetrf" combined with "pdgetri" to carry out a 120k x 120k matrix inversion.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here my last code with your suggestions :&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;
// Passing Matrixes by Reference
void matrix_inverse_lapack(vector&amp;lt;vector&amp;lt;double&amp;gt;&amp;gt; const &amp;amp;F_matrix, vector&amp;lt;vector&amp;lt;double&amp;gt;&amp;gt; &amp;amp;F_output) {

// Index for loop and arrays
MKL_INT i, j, ip, idx;

// Size of F_matrix
MKL_INT N = F_matrix.size();
cout &amp;lt;&amp;lt; "m = " &amp;lt;&amp;lt; N &amp;lt;&amp;lt; endl;

//MKL_INT *IPIV = new MKL_INT[N];
MKL_INT *IPIV = (MKL_INT*)mkl_calloc( N, sizeof(MKL_INT), 64);

// Statement of main array to inverse
double *arr = (double*) mkl_calloc( N*N, sizeof(double), 64);

//#pragma omp parallel for num_threads(n_threads)
for (i = 0; i&amp;lt;N; i++){
for (j = 0; j&amp;lt;N; j++){
idx = i*N + j;
arr[idx] = F_matrix[i][j];
}
}
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Compiled with intel.make:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;CXX = icpc -std=c++11 -O3 -xHost -DMKL_ILP64
CXXFLAGS = -Wall -c -I${MKLROOT}/include -I/opt/intel/oneapi/compiler/latest/linux/compiler/include -qopenmp -qmkl=parallel
LDFLAGS = -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
SOURCES = main_intel.cpp XSAF_C_intel.cpp
EXECUTABLE = main_intel.exe

&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any clues or suggestions?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
</description>
      <pubDate>Tue, 19 Oct 2021 20:28:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323238#M8846</guid>
      <dc:creator>chris6</dc:creator>
      <dc:date>2021-10-19T20:28:55Z</dc:date>
    </item>
    <item>
      <title>Re:LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323337#M8848</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt; I think this is because I am also using 120k x 120k matrices in other parts of the code.&lt;/P&gt;&lt;P&gt;&amp;lt;&amp;lt; That means the problem is outside of the MKL calls, and therefore you could redesign this application's memory consumption.&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; That's why I would like to switch to ScaLAPACK, since I think a distributed computation is necessary in my case.&lt;/P&gt;&lt;P&gt;&amp;lt;&amp;lt; I have only two suggestions: 1/ the C code for pdgetrf will look very similar to its Fortran counterpart, and the same is true for the p?getri routine. 2/ You might search this forum for such examples.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 03:49:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1323337#M8848</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-10-20T03:49:49Z</dc:date>
    </item>
    <item>
      <title>Re:LAPACKE - dgetri and dgetrf for large matrix - avoid std::bad_alloc' error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1327124#M8898</link>
      <description>&lt;P&gt;The thread is closing and we will no longer respond to this thread.&amp;nbsp;If you require additional assistance from Intel, please start a new thread.&amp;nbsp;Any further interaction in this thread will be considered community only.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Nov 2021 06:44:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/LAPACKE-dgetri-and-dgetrf-for-large-matrix-avoid-std-bad-alloc/m-p/1327124#M8898</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-11-04T06:44:50Z</dc:date>
    </item>
  </channel>
</rss>

