Dear all,
we are using MKL to solve transient water flow and solute transport in porous media. We apply both the conjugate gradient and dfgmres solvers because the matrices are symmetric for some problem categories and asymmetric for others. We use ILU(0) preconditioning in both cases. The codes run well, and we observe a significant speedup compared to the unparallelized solvers we used before. Great! There is, however, something that took us by surprise. The speedup observed on a dual-core machine was about 2, and the speedup on a quad-core machine was also about 2, although we expected it to be higher. In both cases, CPU usage is above 90%. We have also experimented with the environment variable MKL_NUM_THREADS. On the quad-core machine, setting it to 1 resulted in faster execution than leaving it unset and letting MKL choose the number of threads dynamically! CPU usage was still above 90%. We would be glad if someone could explain these results and suggest ways to get a further speedup on the quad-core machine.
Thank you very much in advance
Idefix
1 Reply
Hi Idefix,
As I understand it, your program is fairly complex and uses many different parts of MKL. Could you measure how much time your code spends computing the ILU(0) preconditioner, on matrix multiplication, and in the calls to the CG subroutines on the different machines? With these data we could understand the situation better.
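One possible way to collect such a per-phase breakdown is sketched below. It is written in Python only to keep it short; the same pattern (read a wall-clock timer before and after each phase and accumulate the difference) carries over to Fortran or C. The `PhaseTimer` class and the three stand-in functions are hypothetical placeholders for illustration, not MKL routines, and would be replaced by the actual ILU(0), matrix multiplication, and solver calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds and call counts per named phase."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the phase raises.
            self.seconds[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        total = sum(self.seconds.values()) or 1.0  # avoid division by zero
        rows = sorted(self.seconds.items(), key=lambda kv: kv[1], reverse=True)
        return "\n".join(
            f"{name:<16}{secs:10.6f} s {100.0 * secs / total:6.1f} % "
            f"{self.calls[name]:6d} calls"
            for name, secs in rows
        )


# Hypothetical stand-ins: replace with the real preconditioner,
# matrix multiplication, and solver calls of the application.
def build_preconditioner(diag):
    return [1.0 / d for d in diag]


def matvec(diag, x):
    return [d * xi for d, xi in zip(diag, x)]


def solver_step(inv_diag, r):
    return [m * ri for m, ri in zip(inv_diag, r)]


timer = PhaseTimer()
diag = [2.0 + (i % 3) for i in range(10_000)]
x = [1.0] * len(diag)

with timer.phase("preconditioner"):
    inv_diag = build_preconditioner(diag)

for _ in range(5):  # stands in for the solver iterations
    with timer.phase("matvec"):
        r = matvec(diag, x)
    with timer.phase("solver_step"):
        x = solver_step(inv_diag, r)

print(timer.report())
```

Running the instrumented program once per thread setting (for example with MKL_NUM_THREADS set to 1, 2, and 4) on each machine should show which phase stops scaling.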
With best regards,
Alexander