My cluster has 16 CPUs per node. My matrix is symmetric positive definite, about 2 million by 2 million, with about 4 million non-zero entries. My factorization times are:
16 CPUs - 84 seconds
32 CPUs - 44 seconds
48 CPUs - 48 seconds ?!
The factorization takes longer with 48 CPUs than with 32 CPUs.
I have tried a smaller matrix and get the same result: there is no speedup beyond 32 CPUs. Is this a known limitation of cluster_sparse_solver or a problem with my cluster? If it is a cluster problem, any suggestions on how to narrow it down?
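For reference, the timings in this post can be restated as speedup and parallel efficiency relative to the 16-CPU run. The following is only a minimal sketch of that arithmetic in C; the function names `speedup`, `efficiency`, and `print_scaling` are hypothetical helpers made up for illustration and are not part of MKL or of the reproducer file.

```c
#include <stdio.h>

/* Speedup of a run relative to a baseline run: t_base / t_run. */
double speedup(double t_base, double t_run) {
    return t_base / t_run;
}

/* Parallel efficiency relative to the baseline:
 * speedup divided by the increase in CPU count (p_run / p_base).
 * A value of 1.0 means perfect scaling from the baseline. */
double efficiency(int p_base, double t_base, int p_run, double t_run) {
    return speedup(t_base, t_run) * (double)p_base / (double)p_run;
}

/* Print one row per measurement, using the first entry as the baseline. */
void print_scaling(const int *cpus, const double *secs, int n) {
    printf("%6s %10s %8s %10s\n", "CPUs", "time (s)", "speedup", "efficiency");
    for (int i = 0; i < n; ++i) {
        printf("%6d %10.1f %8.2f %10.2f\n", cpus[i], secs[i],
               speedup(secs[0], secs[i]),
               efficiency(cpus[0], secs[0], cpus[i], secs[i]));
    }
}
```

Calling `print_scaling` with the CPU counts {16, 32, 48} and the times {84, 44, 48} gives a speedup of about 1.91 at 32 CPUs (efficiency about 0.95) and 1.75 at 48 CPUs (efficiency about 0.58), which puts a number on the drop-off after 32 CPUs.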
I created an example file that can reproduce the issue. Download cl_solver_sym_sp_0_based_c.c from here:
https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0
Edit all occurrences of *.txt to point to the path where the files are on your system.
The ia, ja, a, and b data, as text files, are all here:
https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0
I'm curious what kind of performance improvement you get when running with MPI on 16, 32, 48, and 72 CPUs!
I'm moving this to the Intel® Math Kernel Library forum.
The forum admin can delete this topic, since I already posted the same topic here a few weeks ago.