topic How should I distribute memory for parallel direct sparse solver for cluster? in Intel® oneAPI Math Kernel Library
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-should-I-distribute-memory-for-parallel-direct-sparse-solver/m-p/1156035#M27570
<P>Hi,</P><P>I'm trying to use the Parallel Direct Sparse Solver for Clusters. Starting from the sample code (cl_solver_sym_distr_f.f), I wrote a parallel code for solving a large system of linear equations.</P><P>Suppose the number of processes is n and the number of equations is n*k. In my code, the m-th process (m = 1, 2, ..., n) holds the following data: the i-th components of the right-hand-side vector for (m-1)*k+1 <= i <= m*k, and the (a,b) components of the matrix for (m-1)*k+1 <= a <= m*k, 1 <= b <= n*k.</P><P>The differences from the sample code are that the data is distributed without overlap and that the system is very large.</P><P>My results are as follows.</P><P>The number of equations is 800,000 and the number of nonzero components is 15,000,000. I do not use OpenMP, so I set OMP_NUM_THREADS=1.</P><P>The calculation time with 1 processor is 36 s.</P><P>The calculation time with 2 processors is 17 s.</P><P>The calculation time with 4 processors is 13 s.</P><P>The calculation time with 8 processors is 12 s.</P><P>As these timings show, I could not obtain good parallel efficiency.</P><P>I suspect that the way I distribute memory in my code is not appropriate. How should I distribute memory among the processes?</P><P>Best,</P><P>Shigeki</P>Sun, 11 Nov 2018 02:17:00 GMT Kaneko__Shigeki 2018-11-11T02:17:00Z
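To make the layout and the timings in the question concrete, here is a minimal Python sketch (not the poster's Fortran code). It computes the contiguous, non-overlapping row range owned by the m-th process as described above, and the speedup and parallel efficiency implied by the reported times. The function names `row_range`, `speedup`, and `efficiency` are hypothetical helpers introduced only for this illustration. As an assumption to check against the oneMKL documentation: in the solver's distributed input format, these two bounds appear to correspond to the beginning and end of the input domain that each MPI process sets in `iparm(41)` and `iparm(42)` (Fortran indexing).

```python
def row_range(m, k):
    """1-based inclusive row range owned by process m (m = 1..n),
    following the post: (m-1)*k+1 <= i <= m*k."""
    return ((m - 1) * k + 1, m * k)


def speedup(t1, tp):
    """Speedup relative to the single-process time t1."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Parallel efficiency: speedup divided by the process count p."""
    return speedup(t1, tp) / p


if __name__ == "__main__":
    n_equations = 800_000            # from the post
    timings = {1: 36.0, 2: 17.0, 4: 13.0, 8: 12.0}  # seconds, from the post

    # Row ownership for the 8-process run (k = 100,000 rows per process).
    n = 8
    k = n_equations // n
    for m in (1, n):
        print(f"process {m}: rows {row_range(m, k)}")

    # Speedup and efficiency implied by the reported timings.
    t1 = timings[1]
    for p, tp in timings.items():
        print(f"p={p}: speedup={speedup(t1, tp):.2f}, "
              f"efficiency={efficiency(t1, tp, p):.2f}")
```

With the reported numbers, efficiency drops from about 1.06 on 2 processes to about 0.69 on 4 and 0.38 on 8, which quantifies the poor scaling described in the question. The sketch only restates the arithmetic; it does not say whether the row distribution is the cause.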