My cluster has 16 CPUs per node. My matrix is symmetric positive definite, about 2 million by 2 million, with about 4 million nonzero entries. The factorization times are:

- 16 CPUs: 84 seconds
- 32 CPUs: 44 seconds
- 48 CPUs: 48 seconds (?!)

The factorization takes longer with 48 CPUs than with 32 CPUs.

I have tried a smaller matrix and get the same result: there is no speedup beyond 32 CPUs. Is this a known limitation of cluster_sparse_solver or a problem with my cluster? If it is a cluster problem, do you have any suggestions on how I can narrow down the bottleneck?


Ferris, could you check the scalability with a larger problem size?

Unfortunately, I do not have any larger matrices to test. The size I am testing is around the largest I would see in my area. Are there any public benchmark matrices I could download to test? If not, I can create example code that reads in my matrix for you to test on your cluster.

I created an example program that reproduces the issue. Download cl_solver_sym_sp_0_based_c.c from here:

https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0

Change every occurrence of *.txt in the source to the path where the files are located on your system.

The ia, ja, a, and b data, stored as text files, are all here:

https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0

I am curious what performance improvement you see when running with MPI on 16, 32, 48, and 72 CPUs!

Ferris, do you have access to a 64-core system? I currently do not. If you do, could you please try it and share the results? The scalability may be different if the number of nodes is a power of 2.

Gennady F. (Intel) wrote:

Ferris, do you have access to a 64-core system? I currently do not. If you do, could you please try it and share the results? The scalability may be different if the number of nodes is a power of 2.

Hi Gennady,

As requested, I solved my model on a larger 4-node, 60-core cluster with 15 cores per node. Below are the factorization times:

- 15 cores: 70 seconds
- 30 cores: 41 seconds
- 45 cores: 42 seconds
- 60 cores: 36 seconds

So there does seem to be some improvement with 4 nodes, but 3 nodes give essentially the same factorization time as 2 nodes. Does the number of nodes always have to be a power of 2, or could there be a problem with my cluster?
