Hi,
We want to use Cluster Pardiso for our finite element application. To get an estimate of performance, we used a simple code (attached file) that reads a matrix from the SuiteSparse Matrix Collection (Matrix Market format) and measures the execution time of each phase (11, 22 and 33).
The factorization phase (22) scales well with MPI and OpenMP parallelization, but the solve phase (33) does not scale nearly as well.
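For reference, here is a minimal sketch of how we time the phases (simplified; the 5x5 hard-coded matrix, mtype = 11 and the iparm values shown below are placeholders rather than our actual settings, which come from the Matrix Market file and the attached code):

#include <mpi.h>
#include <mkl.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Tiny 5x5 tridiagonal test matrix in 1-based CSR format
    // (placeholder for the matrix read from the .mtx file).
    MKL_INT n = 5;
    MKL_INT ia[6]  = {1, 3, 6, 9, 12, 14};
    MKL_INT ja[13] = {1,2, 1,2,3, 2,3,4, 3,4,5, 4,5};
    double  a[13]  = {4,-1, -1,4,-1, -1,4,-1, -1,4,-1, -1,4};
    double  b[5]   = {1, 1, 1, 1, 1}, x[5];

    void   *pt[64]    = {0};   // internal solver handle, must start zeroed
    MKL_INT iparm[64] = {0};
    iparm[0]  = 1;   // do not use solver defaults
    iparm[1]  = 2;   // METIS fill-in reordering
    iparm[7]  = 2;   // max iterative refinement steps
    iparm[9]  = 13;  // pivot perturbation 1e-13
    iparm[39] = 0;   // matrix and RHS provided on the master rank

    MKL_INT maxfct = 1, mnum = 1, mtype = 11, nrhs = 1, msglvl = 0, error = 0;
    MKL_INT idum = 0;
    int comm = MPI_Comm_c2f(MPI_COMM_WORLD);

    // Time the three phases separately: 11 = analysis,
    // 22 = numerical factorization, 33 = solve.
    MKL_INT phases[3] = {11, 22, 33};
    for (int p = 0; p < 3; ++p) {
        MKL_INT phase = phases[p];
        double t0 = MPI_Wtime();
        cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n,
                              a, ia, ja, &idum, &nrhs, iparm, &msglvl,
                              b, x, &comm, &error);
        double t1 = MPI_Wtime();
        if (rank == 0)
            std::printf("phase %2d: %.4f s (error=%d)\n",
                        (int)phase, t1 - t0, (int)error);
    }

    MKL_INT phase = -1;  // release internal solver memory
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n,
                          a, ia, ja, &idum, &nrhs, iparm, &msglvl,
                          b, x, &comm, &error);
    MPI_Finalize();
    return 0;
}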
For example, the table below shows running times (in seconds) for different combinations of MPI processes and OpenMP threads (per process).
Serena.mtx:
MPI / OMP Phase 22 (s) Phase 33 (s)
2 / 2 408.70 2.4668
2 / 4 249.52 1.3382
2 / 8 234.87 3.7524
2 / 16 93.879 1.3181
4 / 2 327.69 1.8661
4 / 4 162.16 1.9664
4 / 8 96.526 4.4899
4 / 16 58.619 1.3763
8 / 2 175.61 1.1638
8 / 4 90.975 1.1006
8 / 8 67.704 2.4264
8 / 16 39.654 0.9049
16 / 2 127.61 1.4321
16 / 4 62.155 0.9136
16 / 8 53.761 2.0407
16 / 16 26.957 0.7122
32 / 8 36.447 2.1856
32 / 16 24.977 0.3729
We can observe that the solve time does not always decrease with more MPI processes or OpenMP threads. We tested other matrices (e.g. RM07R) and observed the same behaviour. Is this normal or is it an issue? Is there a way to get better scaling?
Thanks a lot for any advice
Guillaume
Hi Guillaume,
This might be a scalability issue at the solve stage.
Which version of MKL are you running?
thanks, Gennady
Hi,
We used MKL 2017.0.4 for the Intel 64 architecture.
Have you had a chance to try version 2019 and check the scalability with that version of MKL? You may download the latest update for free.
If not, we will check these numbers on our side.
Our application runs on Graham, a cluster at Compute Canada (https://docs.computecanada.ca/wiki/Graham). At the moment, the most recent version available there is MKL 2018.0.3, and a request to install MKL 2019 could take a while to be processed.
It seems that switching to MKL 2018 does not solve the problem. Here are the results for Serena.mtx (with MKL 2018).
MPI / OMP Phase 22 (s) Phase 33 (s)
2 / 2 431.21 2.4546
2 / 4 266.05 1.5849
2 / 8 266.05 1.7487
2 / 16 83.262 1.3433
4 / 2 245.56 1.4342
4 / 4 134.69 1.3420
4 / 8 87.154 2.0223
4 / 16 54.010 2.1256
8 / 2 152.77 1.0543
8 / 4 89.457 1.0726
8 / 8 60.434 2.1468
8 / 16 38.522 1.0357
16 / 2 110.03 1.5032
16 / 4 55.077 0.5922
16 / 8 40.506 0.9916
16 / 16 28.883 0.7282
32 / 8 34.848 0.7771
32 / 16 27.193 0.4550
I would appreciate it if you could check these results.
Thank you!
Guillaume
PS: Our application is compiled with the following flags:
-O3 -qopenmp -mkl=parallel -std=c++11 -Wall
-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl -lmpi
Sure, we will try to check and get back to you.
Here is what I obtained with OpenMP threads only, due to a cluster access problem.
RM07R: the scalability is indeed quite limited.
MKL version 2019 Update 1.
OMP threads  Phase 22 (s)  Phase 33 (s)
 1           973.9         5.04
 2           567.57        3.01
 4           313.77        2.37
 8           192.01        1.94
16           124.3         1.76
32           107.69        1.72
