Intel® oneAPI Math Kernel Library

pardiso PROCESSORS

ahmediiit
Beginner

Hello sir,

I am getting the following results.

When compiling in Debug Win32 mode:

number1 = omp_get_max_threads()
call mkl_set_num_threads( number2 )

Result:

number1 = 2400 (or some other number)
number2 = 16

When compiling in Release x64 mode:

number1 = omp_get_max_threads()
call mkl_set_num_threads( number2 )

Result:

number1 = 0
number2 = 16

I need to set the stack reserve size to 21285000; otherwise I get a stack overflow error.

In Debug mode I get an error for large problems.

Please help me make the code run faster so that it uses all 16 processors.

When the code starts I can see 100% CPU usage (due to parallelization), but when it enters the PARDISO subroutine it shows only 50% CPU usage.

Please help me get all 16 processors working in the PARDISO subroutine.

Gennady_F_Intel
Moderator

Hello Ahmed,

Actually, we recommend using mkl_get_max_threads() instead of the omp_get_max_threads() you used, because:

1) Intel MKL threading controls take precedence over the OpenMP techniques, and

2) you don't need to include the OpenMP header file (omp.h) in your application.

In any case, if everything is done properly, the results should be the same.

For example, please try something like the code below and see what you get:

#include "mkl.h"
#include <omp.h>    /* for omp_get_max_threads() */
#include <stdio.h>
int main( void ){
    int number_omp = omp_get_max_threads();
    printf("\n\t number_omp == %d \n", number_omp);
    int number_mkl = mkl_get_max_threads();
    printf("\n\t number_mkl == %d \n", number_mkl);
    return 0;
}
I have on my side (2-core system):
number_omp == 2
number_mkl == 2
Press any key to continue . . .

--Gennady

Gennady_F_Intel
Moderator

Ahmed,

You don't need to set the stack size explicitly as you did in this case.

What is your task size? I mean, the number of equations and nonzeros (nnz)? How did you allocate the working arrays (a, ja, ia)?

--Gennady

ahmediiit
Beginner

My system is a Xeon E5520 at 2.27 GHz with eight cores, 24 GB RAM, and a 64-bit OS.

When I check the processors in Device Manager I can find 16 processors.

The arrays a, ja, ia are allocatable arrays initialized at the beginning of the code.

Presently I am solving about 600,000 equations with 15,000,000 nonzeros, and I still need to increase the number of equations.

It takes 5 minutes per iteration, which I have to repeat many times.

While the program is running I can see only 50% CPU usage, with only 8 slots running in Task Manager. How do I make it 100%?

If I don't set the stack size I get the stack overflow error.


*********************** PARDISO subroutine used ***********************

      subroutine mklpardiso(a,ja,ia,b,nc,n)
      IMPLICIT NONE
      include 'mkl_pardiso.f77'
C.. Internal solver memory pointer; must be zeroed before the first call
      INTEGER*8 pt(64)
C.. All other variables
      INTEGER maxfct, mnum, nc, mtype, phase, n, nrhs, error, msglvl
      INTEGER iparm(64)
      INTEGER ia(n+1)
      INTEGER ja(nc)
      REAL*8 a(nc)
      REAL*8 b(n)
      REAL*8 x(n)
      INTEGER i, idum
      REAL*8 ddum
C.. Fill all arrays containing matrix data.
      DATA nrhs /1/, maxfct /1/, mnum /1/

      do i = 1, 64
        pt(i) = 0
        iparm(i) = 0
      end do
      iparm(1) = 1   ! no solver default
      iparm(2) = 3   ! fill-in reordering from METIS (parallel/OpenMP version)
      iparm(3) = 16  ! reserved; thread count comes from mkl_set_num_threads
      iparm(4) = 0   ! no iterative-direct algorithm
      iparm(5) = 0   ! no user fill-in reducing permutation
      iparm(6) = 0   ! =0 solution on the first n components of x
      iparm(7) = 0   ! not in use
      iparm(8) = 9   ! number of iterative refinement steps
      iparm(9) = 0   ! not in use
      iparm(10) = 13 ! perturb the pivot elements with 1E-13
      iparm(11) = 1  ! use nonsymmetric permutation and scaling MPS
      iparm(12) = 0  ! not in use
      iparm(13) = 0  ! maximum weighted matching algorithm is switched off
      iparm(14) = 0  ! Output: number of perturbed pivots
      iparm(15) = 0  ! not in use
      iparm(16) = 0  ! not in use
      iparm(17) = 0  ! not in use
      iparm(18) = -1 ! Output: number of nonzeros in the factor LU
      iparm(19) = -1 ! Output: Mflops for LU factorization
      iparm(20) = 0  ! Output: number of CG iterations
      iparm(60) = 0  ! in-core mode
      error = 0      ! initialize error flag
      msglvl = 1     ! print statistical information
      mtype = 2      ! real symmetric positive definite

      phase = 11     ! only reordering and symbolic factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,
     1 idum, nrhs, iparm, msglvl, ddum, ddum, error)

      phase = 22     ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,
     1 idum, nrhs, iparm, msglvl, ddum, ddum, error)

      iparm(8) = 2   ! max number of iterative refinement steps
      phase = 33     ! only solution (back substitution, iterative refinement)
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja,
     1 idum, nrhs, iparm, msglvl, b, x, error)

      b = x

      phase = -1     ! release internal memory
      CALL pardiso (pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum,
     1 idum, nrhs, iparm, msglvl, ddum, ddum, error)

      return
      END




Konstantin_A_Intel

Ahmed,

Your system has 16 logical processors but only 8 physical cores, due to Hyper-Threading. So MKL decides that it is more optimal to run the code with 8 threads, not 16. I think your program is already working under optimal conditions.

However, if you would like to set exactly 16 threads to compare performance, please set the environment variable MKL_NUM_THREADS=16, or call mkl_set_num_threads(16) in your program.

Please note: iparm(3) is not used in the current version of MKL PARDISO for setting the number of threads.

Best regards,

Konstantin

Gennady_F_Intel
Moderator

Ahmed,

The task that you are solving is pretty big (about 600,000 equations with 15,000,000 nonzeros), so using static allocation is not a good idea.

Could you please try to allocate all working arrays dynamically instead of statically:

ALLOCATE( ja( nnonzeros ), a( nnonzeros ), b( n ), ia( n + 1 ), x( n ), r( n ))

--Gennady

ahmediiit
Beginner

Hello sir,

I have used

call mkl_set_num_threads(16)

but I still get a statement in the result:

parallel direct factorization with processors: > 8

The arrays a, b, ia, ja, x are already dynamically allocated at the beginning of the code.

