Hello sir,
I am getting the following results.
When compiling in Debug Win32 mode:
number1=omp_get_max_threads()
call mkl_set_num_threads( number2 )
Result:
number1=2400 (or some other number)
number2=16
When compiling in Release x64 mode:
number1=omp_get_max_threads()
call mkl_set_num_threads( number2 )
Result:
number1=0
number2=16
I need to set the stack reserve size to 21285000; otherwise I get a stack overflow error.
In Debug mode I get an error for large problems.
Please help me make the code run faster so that it uses all 16 processors.
When the code starts I can see 100% CPU usage (due to parallelization), but when it enters the PARDISO subroutine it shows 50% CPU usage.
Please help so that all 16 processors are working in the PARDISO subroutine.
Hello Ahmed,
Actually, we recommend using mkl_get_max_threads() instead of the omp_get_max_threads() you used, because
1) Intel MKL threading controls take precedence over the OpenMP techniques, and
2) you don't need to include the OpenMP header file (#include <omp.h>) in your application.
In any case, if everything is done properly, the results should be the same.
For example, please try something like the code below and see what you get:
--Gennady
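A minimal sketch of such a check, assuming the MKL service routines mkl_get_max_threads and mkl_set_num_threads are linked in; the program name check_threads and the variable nthr are only illustrative. Note that Fortran types an undeclared function by its first letter, so omp_get_max_threads used without an INTEGER declaration (or without the omp_lib module) is implicitly REAL, which may explain odd values such as 2400 or 0.

```fortran
      program check_threads
      implicit none
      ! Declare the MKL function as INTEGER explicitly so that the
      ! result is not misread through implicit typing.
      integer mkl_get_max_threads
      external mkl_get_max_threads, mkl_set_num_threads
      integer nthr

      ! Number of threads MKL would use by default
      nthr = mkl_get_max_threads()
      print *, 'MKL max threads (default):     ', nthr

      ! Ask for 16 threads, then query again
      call mkl_set_num_threads(16)
      nthr = mkl_get_max_threads()
      print *, 'MKL max threads (16 requested):', nthr
      end program check_threads
```

The value passed to mkl_set_num_threads is an input, so a call such as mkl_set_num_threads(number2) only makes sense after number2 has been assigned.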
Ahmed,
You shouldn't need to set the stack size explicitly as you did in this case.
What is your task size, i.e. the number of equations and the number of nonzeros (nnz)? How did you allocate the working arrays (a, ja, ia)?
--Gennady
My system is a Xeon E5520 at 2.27 GHz with eight cores and 24 GB RAM, running a 64-bit OS.
When I check the processors in Device Manager I can find 16 processors.
The arrays a, ja, ia are allocatable arrays initialized at the beginning of the code.
At present I am solving about 600,000 equations with 15,000,000 nonzeros.
I still need to increase the number of equations.
Each iteration takes 5 minutes, and I have to repeat it many times.
While the program is running I can see only 50% CPU usage, with only 8 slots running in Task Manager.
How can I make it 100%?
If I don't set the stack size I get the stack overflow error.
*********************** PARDISO subroutine used ***********************
      subroutine mklpardiso(a,ja,ia,b,nc,n)
      IMPLICIT NONE
      include 'mkl_pardiso.f77'
C.. Internal solver memory pointer
      INTEGER*8 pt(64)
C.. All other variables
      INTEGER maxfct, mnum,nc,mtype, phase, n, nrhs, error, msglvl
      INTEGER iparm(64)
      INTEGER ia(n+1)
      INTEGER ja(nc)
      REAL*8 a(nc)
      REAL*8 b(n)
      REAL*8 x(n)
      INTEGER i, idum
      REAL*8 waltime1, waltime2, ddum
C.. Constant solver parameters; pt and iparm must start at zero
      DATA nrhs /1/, maxfct /1/, mnum /1/
      do i = 1, 64
         pt(i) = 0
         iparm(i) = 0
      end do
      iparm(1) = 1 ! no solver default
      iparm(2) = 3 ! fill-in reordering: parallel (OpenMP) METIS
      iparm(3) = 16 ! number of processors (not used by MKL PARDISO)
      iparm(4) = 0 ! no iterative-direct algorithm
      iparm(5) = 0 ! no user fill-in reducing permutation
      iparm(6) = 0 ! =0 solution on the first n components of x
      iparm(7) = 0 ! not in use
      iparm(8) = 9 ! max number of iterative refinement steps
      iparm(9) = 0 ! not in use
      iparm(10) = 13 ! perturb the pivot elements with 1E-13
      iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
      iparm(12) = 0 ! not in use
      iparm(13) = 0 ! maximum weighted matching algorithm is off
      iparm(14) = 0 ! Output: number of perturbed pivots
      iparm(15) = 0 ! not in use
      iparm(16) = 0 ! not in use
      iparm(17) = 0 ! not in use
      iparm(18) = -1 ! Output: number of nonzeros in the factor LU
      iparm(19) = -1 ! Output: Mflops for LU factorization
      iparm(20) = 0 ! Output: number of CG iterations
      iparm(60) = 0 ! in-core mode
      error = 0 ! initialize error flag
      msglvl = 1 ! print statistical information
      mtype = 2 ! real symmetric positive definite (-2: indefinite)
      phase = 11 ! only reordering and symbolic factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,
     1 idum, nrhs, iparm, msglvl, ddum, ddum, error)
      phase = 22 ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,
     1 idum, nrhs, iparm, msglvl, ddum, ddum, error)
      iparm(8) = 2 ! max number of iterative refinement steps
      phase = 33 ! solve and iterative refinement
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,
     1 idum, nrhs, iparm, msglvl, b, x, error)
C.. Return the solution to the caller in b
      b=x
      phase = -1 ! release internal memory
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum,
     1 idum, nrhs, iparm, msglvl, ddum, ddum, error)
      return
      END
Ahmed,
Your system has 16 logical processors but only 8 physical cores, because of Hyper-Threading. MKL therefore decides that it is better to run the code with 8 threads, not 16. I think your program is already running under optimal conditions.
However, if you would like to set exactly 16 threads to compare performance, please set the environment variable MKL_NUM_THREADS=16, or call mkl_set_num_threads(16) in your program.
Please note: iparm(3) is not used in the current version of MKL PARDISO for setting the number of threads.
Best regards,
Konstantin
Ahmed,
The task that you are solving is pretty big (about 600,000 equations with 15,000,000 nonzeros), so static allocation is not a good idea.
Could you please try to allocate all working arrays dynamically instead of statically:
ALLOCATE( ja( nnonzeros ), a( nnonzeros ), b( n ), ia( n + 1 ), x( n ), r( n ))
--Gennady
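One thing that may be worth checking in the subroutine posted above: x is declared as REAL*8 x(n) but is not a dummy argument, so it is an automatic array, and Intel Fortran places automatic arrays on the stack by default (unless an option such as /heap-arrays is used). With n around 600,000 that is roughly 4.8 MB, which could account for the stack overflow even when a, ja, ia and b are allocated dynamically. A minimal sketch of the alternative follows; the names heap_demo and scale_rhs are hypothetical, and the multiplication is only a stand-in for the PARDISO solve.

```fortran
      program heap_demo
      implicit none
      integer n
      real*8, allocatable :: b(:)

      n = 600000
      allocate(b(n))
      b = 1.0d0
      call scale_rhs(b, n)
      print *, 'b(1) =', b(1)
      deallocate(b)
      end program heap_demo

      subroutine scale_rhs(b, n)
      implicit none
      integer n, istat
      real*8 b(n)
      ! Work array on the heap instead of an automatic "REAL*8 x(n)"
      real*8, allocatable :: x(:)

      allocate(x(n), stat=istat)
      if (istat .ne. 0) stop 'allocation of x failed'
      ! Stand-in for the solve phase, which would write into x
      x = 2.0d0 * b
      b = x
      deallocate(x)
      end subroutine scale_rhs
```

Passing b as both the right-hand side and the place for the result, as the original routine does through b=x, is unchanged here; only the storage of x differs.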
Hello sir,
I have used
call mkl_set_num_threads(16)
but I still get this statement in the output:
parallel direct factorization with processors: > 8
The arrays a, b, ia, ja, x are already dynamically allocated at the beginning of the code.
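A possible explanation, offered as an assumption rather than a confirmed diagnosis: MKL's dynamic thread adjustment (MKL_DYNAMIC) is enabled by default, and with it enabled MKL treats the requested count as an upper bound and can scale it down to the number of physical cores on a Hyper-Threading system. A minimal sketch that switches it off before requesting 16 threads, using the MKL service routines mkl_set_dynamic and mkl_get_dynamic (the program name force_threads is illustrative):

```fortran
      program force_threads
      implicit none
      integer mkl_get_max_threads, mkl_get_dynamic
      external mkl_get_max_threads, mkl_get_dynamic
      external mkl_set_num_threads, mkl_set_dynamic

      ! Turn off dynamic adjustment so the request is not reduced
      call mkl_set_dynamic(0)
      call mkl_set_num_threads(16)

      ! Report the settings MKL now holds
      print *, 'MKL dynamic flag:', mkl_get_dynamic()
      print *, 'MKL max threads: ', mkl_get_max_threads()
      end program force_threads
```

In the real code the two calls would go before the first call to pardiso. Since only 8 physical cores are available, 16 threads may turn out no faster than 8, so it is worth timing both settings.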
