I am performing a Cholesky factorization with pdpotrf(). I read the whole matrix on the master node and then distribute it. Then every node handles a submatrix and calls pdpotrf(). Finally, I send the submatrices back to the master node and assemble the solution.
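For context, ScaLAPACK routines such as pdpotrf() do not factor each process's submatrix independently. As far as I know, the matrix is stored in a 2D block-cyclic layout described by its array descriptor, and all processes in the grid call pdpotrf() together to factor the global matrix. The plain Python sketch below (not ScaLAPACK code) shows which process owns each entry under that layout. It assumes 0-based indices, square nb x nb blocks, and a grid whose first block lives on process (0, 0), and it follows the same arithmetic as ScaLAPACK's 1-based INDXG2P/INDXG2L tool routines.

```python
def global_to_local(i, nb, nprocs, src=0):
    """Map a 0-based global index i to (owning process, 0-based local index)
    for a 1D block-cyclic distribution with block size nb over nprocs
    processes, the first block living on process src."""
    block = i // nb                           # which global block holds index i
    proc = (src + block) % nprocs             # blocks are dealt out round-robin
    local = (block // nprocs) * nb + i % nb   # position inside that process's storage
    return proc, local


def owner_2d(i, j, nb, nprow, npcol):
    """Owner (prow, pcol) and local (row, col) of global entry (i, j) when
    rows and columns are both distributed block-cyclically with block size nb."""
    prow, li = global_to_local(i, nb, nprow)
    pcol, lj = global_to_local(j, nb, npcol)
    return (prow, pcol), (li, lj)


if __name__ == "__main__":
    # Hypothetical example: an 8x8 matrix, 2x2 blocks, 2x2 process grid.
    n, nb, nprow, npcol = 8, 2, 2, 2
    for i in range(n):
        print(" ".join("P%d%d" % owner_2d(i, j, nb, nprow, npcol)[0] for j in range(n)))
```

Each process therefore holds blocks scattered over the whole matrix, which helps keep the work balanced as the factorization moves down the diagonal.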
I am amazed that this works. How does it do it, i.e. what algorithm does it implement? I suspect it's block partitioning with every node communicating (hopefully not much, but I would really like to know how much).
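As far as I know, the reference ScaLAPACK PDPOTRF uses a right-looking blocked algorithm. For each column block of width nb, it factors the diagonal block (PDPOTF2), computes the panel below it with a triangular solve (PDTRSM), and updates the trailing submatrix with a symmetric rank-nb update (PDSYRK). In the distributed version each step implies communication. Roughly, the factored diagonal block has to reach the processes holding the panel, and the panel has to be broadcast along process rows and columns for the trailing update, so the number of messages grows with about n/nb. The serial Python sketch below only illustrates the three steps on a single process. It is not ScaLAPACK code and does no communication.

```python
import math


def cholesky_blocked(a, nb):
    """In-place lower Cholesky factorization A = L * L^T of a symmetric
    positive-definite matrix a (list of lists), processed in column blocks
    of width nb (right-looking). Only the lower triangle of a is read;
    on return a holds L with the strict upper triangle zeroed."""
    n = len(a)
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1) Unblocked Cholesky of the diagonal block A[k:e, k:e] (POTF2-like step).
        for j in range(k, e):
            s = a[j][j] - sum(a[j][p] ** 2 for p in range(k, j))
            if s <= 0.0:
                raise ValueError("matrix is not positive definite")
            a[j][j] = math.sqrt(s)
            for i in range(j + 1, e):
                a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(k, j))) / a[j][j]
        # 2) Panel: solve L21 * L11^T = A21 for the rows below the block (TRSM-like step).
        for i in range(e, n):
            for j in range(k, e):
                a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(k, j))) / a[j][j]
        # 3) Trailing update A22 -= L21 * L21^T, lower triangle only (SYRK-like step).
        for i in range(e, n):
            for j in range(e, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, e))
    # Zero the strict upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    for row in cholesky_blocked(A, nb=2):
        print(row)
```

The block size only changes the order in which the same arithmetic is grouped, so any nb gives the same L up to rounding; in the distributed routine it changes how much is communicated and how often.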
Moreover, I feel I should understand the algorithmic part in order to choose the block sizes properly.
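The block size nb in the descriptor controls two competing effects. Larger blocks mean fewer panel steps (about n/nb), hence fewer messages and larger, more BLAS-3 friendly local operations. Smaller blocks spread the matrix more evenly, so processes idle less as the active trailing matrix shrinks. The Python sketch below reproduces the logic of ScaLAPACK's NUMROC (how many rows or columns of a dimension a process owns). For a hypothetical n = 1000 split over 4 process rows, it prints the per-process share, a rough proxy for load balance, and the number of panel steps for a few block sizes.

```python
def numroc(n, nb, iproc, nprocs, isrcproc=0):
    """Rows (or columns) of an n-long dimension owned by process iproc when it is
    distributed block-cyclically with block size nb over nprocs processes
    (same logic as ScaLAPACK's NUMROC, with 0-based process indices)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of full blocks
    num = (nblocks // nprocs) * nb                 # full rounds every process gets
    extrablks = nblocks % nprocs                   # leftover full blocks
    if mydist < extrablks:
        num += nb                                  # one extra full block
    elif mydist == extrablks:
        num += n % nb                              # the trailing partial block
    return num


def panel_steps(n, nb):
    """Number of column panels the blocked factorization walks through."""
    return -(-n // nb)  # ceil(n / nb)


if __name__ == "__main__":
    n, nprocs = 1000, 4  # hypothetical sizes
    for nb in (16, 64, 512):
        shares = [numroc(n, nb, p, nprocs) for p in range(nprocs)]
        print(f"nb={nb:4d} steps={panel_steps(n, nb):3d} rows per process={shares}")
```

With nb=512, two of the four process rows own nothing at all, while nb=16 balances well at the cost of 63 panel steps. The best value in practice depends on the machine, the network, and the BLAS, so timing a few candidates is the safer route.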
Finally, I would like to know whether pdpotrf() is multithreaded. For example, I read in this paper that 4-threaded approaches do exist.
It seems that it's not multithreaded, since I checked with top and the number of entries for my executable there was equal to the number of processes I launched via MPI.
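One caveat about this check: by default top lists one line per process, not per thread, so one entry per MPI rank does not by itself rule out threads inside each rank (top -H shows individual threads). With MKL, as far as I know, whether the local BLAS calls inside pdpotrf() use threads depends on linking the threaded or the sequential MKL layer and on settings such as MKL_NUM_THREADS. The Linux-only Python sketch below reads the Threads: field from /proc/<pid>/status for every process with a given name. The executable name in the example is hypothetical.

```python
import os


def thread_count(pid):
    """Number of threads of a Linux process, from the Threads: line of /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError(f"no Threads: field for pid {pid}")


def threads_by_name(name):
    """Map pid -> thread count for every process whose command name equals name.
    The kernel truncates /proc/<pid>/comm to 15 characters, so compare against
    the first 15 characters of name."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
            if comm == name[:15]:
                result[int(entry)] = thread_count(int(entry))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited or is not readable
    return result


if __name__ == "__main__":
    # "my_cholesky" is a hypothetical executable name; use your own.
    for pid, n in sorted(threads_by_name("my_cholesky").items()):
        print(pid, n, "thread(s)")
```

Run it on a compute node while the MPI job is active. A count above 1 per rank means that rank has more than one thread, although not necessarily inside pdpotrf() itself, since MPI libraries can start helper threads too.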