I'm trying to use the Parallel Direct Sparse Solver for Clusters. Starting from the sample code (cl_solver_sym_distr_f.f), I wrote a parallel code for solving a large system of linear equations.
Suppose the number of processes is n and the number of equations is n*k. In my code, the m-th process (m = 1, 2, ..., n) holds the following data: the i-th components of the right-hand-side vector for (m-1)*k+1 <= i <= m*k, and the (a,b) components of the matrix for (m-1)*k+1 <= a <= m*k and 1 <= b <= n*k. In other words, each process owns one contiguous block of k rows.
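To make the layout concrete, here is a minimal sketch of the row ranges described above. It is only an illustration in Python, not the Fortran code itself; the function name `row_range` is hypothetical, and the indices are 1-based and inclusive to match the formulas.

```python
def row_range(m, n, k):
    """Return the 1-based inclusive (first, last) rows owned by process m.

    m: process index, 1..n
    n: number of processes
    k: rows per process (total number of equations is n*k)
    """
    if not 1 <= m <= n:
        raise ValueError("process index m must satisfy 1 <= m <= n")
    first = (m - 1) * k + 1
    last = m * k
    return first, last


if __name__ == "__main__":
    # Example sizes: 800,000 equations split over 8 processes.
    n, k = 8, 100_000
    for m in range(1, n + 1):
        print(m, row_range(m, n, k))
```

The ranges are disjoint and together cover rows 1 to n*k, which is the "no overlap" distribution.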
The differences from the sample code are that the data is distributed across processes without overlap and that the system is very large.
The results I obtained are as follows.
The system has 800,000 equations and 15,000,000 nonzero entries. I do not use OpenMP, so I set OMP_NUM_THREADS=1.
The calculation time with 1 processor is 36 s.
The calculation time with 2 processors is 17 s.
The calculation time with 4 processors is 13 s.
The calculation time with 8 processors is 12 s.
As these timings show, the run time barely improves beyond 2 processors, so I could not obtain good parallel efficiency.
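To quantify this, the speedup and parallel efficiency can be computed from the timings above. This is a small Python sketch using the usual definitions (speedup = T1/Tp, efficiency = speedup/p); the name `scaling` is hypothetical.

```python
# Measured wall-clock times in seconds, keyed by number of processors.
timings = {1: 36.0, 2: 17.0, 4: 13.0, 8: 12.0}


def scaling(times):
    """Return {p: (speedup, efficiency)} relative to the 1-processor time."""
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in times.items()}


if __name__ == "__main__":
    for p, (speedup, efficiency) in sorted(scaling(timings).items()):
        print(f"p={p}: speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```

With 8 processors the speedup is 36/12 = 3.0, which is an efficiency of 3.0/8 = 0.375.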
I suspect that the way my code distributes the data across processors is not appropriate. How should I distribute it?