The forum admin can delete

Ferris_H_ · ‎11-15-2016

My cluster has 16 cpus per node. My matrix is symmetric positive definite, roughly 2 million by 2 million, with ~4 million non-zero entries. My factorization times are:

16 cpus - 84 seconds

32 cpus - 44 seconds

48 cpus - 48 seconds ?!

The factorization takes longer with 48 cpus compared to 32 cpus.

I have tried with a smaller matrix and get the same result: there is no speedup beyond 32 cpus. Is this a known limitation of cluster_sparse_solver or a problem with my cluster? If it is a cluster problem, any suggestions on how to narrow it down?

Ferris_H_ · ‎11-21-2016

I created an example file that can reproduce the issue. Download cl_solver_sym_sp_0_based_c.c from here:

https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0

Edit all the occurrences of *.txt to the path where the files are on your system.

The ia, ja, a and b data, as text files, are all here:

https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0

Curious what kind of performance improvement you get when running with MPI on 16, 32, 48, and 72 cpus!

James_T_Intel · ‎11-23-2016

I'm moving this to the Intel® Math Kernel Library forum.

Ferris_H_ · ‎11-23-2016

The forum admin can delete this topic, since I already posted the same topic here a few weeks ago.

No speedup of cluster_sparse_solver beyond 32 cpus