Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Intel compiler on a Tesla machine

ahmediiit
Beginner
1,332 Views
Hello sir,
I want to ask whether the Intel Fortran compiler for Linux
can be installed on an NVIDIA Tesla machine.
10 Replies
Gennady_F_Intel
Moderator
Hello,
Please see the Intel Fortran Compiler Release Notes for the applicable System Requirements.
TimP
Honored Contributor III
As you must be aware, Fortran compilers for Tesla run on a host machine and support offloading CUDA library code to the Tesla through syntax resembling OpenMP. No compiler installs on the Tesla itself, and no decision has been made about an Intel compiler supporting Tesla. Intel Fortran could be installed on the host machine of a Tesla, but it would not use the Tesla unless you wrote your own interface to the CUDA host tools.
ahmediiit
Beginner
Hello sir,

So this means that if I install the Intel Fortran compiler on the host machine, it will not use
the multiple cores of the Tesla. Is there any tool that makes it compatible with CUDA, so that
MKL PARDISO can be used on the Tesla?

At present I am using the IVF compiler with MKL to solve linear equations (PARDISO).
My system is an Intel Xeon processor (E5520) with 8 cores.
I need to solve large sparse matrices, around 50,00000 in size, for many iterations.

The system is taking a lot of time.
Please give some suggestions on how to increase the speed, or on changing the processor.
Is there any processor on which PARDISO is more efficient?
Is there any other solver faster than PARDISO?
Can we attach one more processor to the present system?
Does PARDISO work on a cluster?
Gennady_F_Intel
Moderator
Hello Ahmed,
Quote: "I need to solve large sparsematrice around 50,00000 size matrice for many iteration."
Do you mean that the size of the input matrices is 5,000,000?
Which PARDISO mode (in-core, out-of-core, hybrid) are you using?
Which MKL version?
--Gennady
ahmediiit
Beginner
Hello sir,

50,00000 (5,000,000) is the size of the matrix, not the number of nonzero elements.
The number of nonzeros is around 1500,00000 (150,000,000).
MKL version: 10.2.3.029.
I am running in-core.
RAM is around 24 GB.

Gennady_F_Intel
Moderator
Yes, this is a very good size :).
Are you sure the calculation is not swapping? The input alone requires at least ~2 GB of RAM (nnz * sizeof(double) + ja * sizeof(int) ~ 2 GB). The factorization stage may then require more than 10 times the memory of the original matrix, in which case you will swap.
Can you check this with iparm(18)? The solver will report the number of non-zero elements in the factors.
--Gennady
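The estimate above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: it assumes the figures quoted in the thread mean n = 5,000,000 equations and nnz = 150,000,000 stored non-zeros, 8-byte double-precision values, and 4-byte integer indices in a three-array CSR layout (values, ja, ia). The factor of 10 is the rule of thumb from the post, not a measurement, and the function and constant names are made up for this example.

```python
# Rough memory estimate for a sparse matrix stored in three-array CSR form.
# All sizes are assumptions stated in the text above, not measured values.

N_EQUATIONS = 5_000_000   # matrix order quoted in the thread
NNZ = 150_000_000         # stored non-zeros quoted in the thread
SIZEOF_DOUBLE = 8         # bytes per value, assuming double precision
SIZEOF_INT = 4            # bytes per index, assuming 32-bit integers


def csr_input_bytes(n, nnz, value_size=SIZEOF_DOUBLE, index_size=SIZEOF_INT):
    """Bytes needed for the values, column-index (ja) and row-pointer (ia) arrays."""
    values = nnz * value_size
    ja = nnz * index_size
    ia = (n + 1) * index_size
    return values + ja + ia


input_bytes = csr_input_bytes(N_EQUATIONS, NNZ)

# Rule of thumb from the post: the factors may need 10x the input or more.
factor_bytes_guess = 10 * input_bytes

print(f"input matrix : {input_bytes / 1e9:.2f} GB")
print(f"factors (10x): {factor_bytes_guess / 1e9:.2f} GB or more")
```

The actual fill-in depends on the matrix structure and the reordering, so the number of non-zeros in the factors that the solver itself reports is the figure to trust.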
ahmediiit
Beginner
Hello sir,
I am sending the output for a matrix of size 12 lakh (1,200,000).
For size 50 lakh there is a problem with my code; we will look at that later.
For now, please tell me how to make this faster still.

At the start of the program the PF usage is around 3.5 GB,
and when it enters the PARDISO subroutine it increases to 22.5 GB.

nonzero= 15346680
solution start

== PARDISO is running in In-Core mode, because iparam(60)=0 ===


=============== PARDISO: solving a symmetric indef. system ===========


Summary PARDISO: ( reorder to reorder )
===============

Times:
=====
Time fulladj: 0.098160 s
Time reorder: 7.023050 s
Time symbfct: 8.933176 s
Time parlist: 0.275664 s
Time malloc : 0.825073 s
Time total : 19.331269 s total - sum: 2.176147 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % ... 98 % 99 % 100 %


=============== PARDISO: solving a symmetric indef. system ===========


Summary PARDISO: ( factorize to factorize )
===============

Times:
=====
Time A to LU: 0.000000 s
Time numfct : 776.535926 s
Time malloc : 0.048327 s
Time total : 776.586925 s total - sum: 0.002672 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
gflop for the numerical factorization: 22853.815471

gflop/s for the numerical factorization: 29.430468


=============== PARDISO: solving a symmetric indef. system ===========


Summary PARDISO: ( solve to solve )
===============

Times:
=====
Time solve : 9.663265 s
Time total : 29.857200 s total - sum: 20.193935 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
gflop for the numerical factorization: 22853.815471

gflop/s for the numerical factorization: 29.430468

solution end
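The figures in this log can be cross-checked against each other with simple arithmetic. The sketch below uses only numbers printed in the log; it assumes each factor entry is stored as an 8-byte double and ignores index and workspace arrays, so the memory figure is a lower bound. The variable names are made up for this example.

```python
# Cross-check of the PARDISO log for the 1,200,000-equation run.
# Assumption: factor entries are 8-byte doubles; index arrays are ignored.

NNZ_A = 15_346_680            # "#non-zeros in A"
NNZ_L = 2_403_511_054         # "number of nonzeros in L"
NNZ_L_PLUS_U = 2_403_511_055  # "number of nonzeros in L+U"
GFLOP = 22853.815471          # "gflop for the numerical factorization"
NUMFCT_SECONDS = 776.535926   # "Time numfct"
REPORTED_RATE = 29.430468     # "gflop/s for the numerical factorization"

# Memory needed just to hold the factor values.
factor_bytes = NNZ_L_PLUS_U * 8

# Fill-in: how many times more non-zeros L has than A.
fill_ratio = NNZ_L / NNZ_A

# Factorization rate recomputed from work and time.
rate = GFLOP / NUMFCT_SECONDS

print(f"factor values : {factor_bytes / 1e9:.1f} GB")
print(f"fill-in ratio : {fill_ratio:.1f}x")
print(f"rate          : {rate:.2f} gflop/s (log reports {REPORTED_RATE})")
```

Under these assumptions the factor values alone take about 19 GB, which is in line with the PF usage growing from 3.5 GB to 22.5 GB on entry to PARDISO, and leaves little headroom on a machine with 24 GB of RAM.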

Gennady_F_Intel
Moderator
Could you please check the scalability of the solution by linking your application with the serial (sequential) libraries and comparing the timings?
--Gennady
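One way to read the scalability check is to time the numerical factorization once with the sequential libraries and once with the threaded ones, then compare. The sketch below shows that comparison. The threaded time is the "Time numfct" value from the log above; the sequential time is a made-up placeholder, since no sequential run was posted in this thread, and the function names are hypothetical.

```python
# Speedup and parallel efficiency from two timings of the same factorization.

def speedup(serial_seconds, parallel_seconds):
    """How many times faster the threaded run is than the sequential run."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(serial_seconds, parallel_seconds, threads):
    """Speedup divided by thread count; 1.0 would be perfect scaling."""
    return speedup(serial_seconds, parallel_seconds) / threads


THREADS = 8                    # "#processors" in the log
PARALLEL_NUMFCT = 776.535926   # "Time numfct" from the threaded run, seconds

# PLACEHOLDER: replace with the numfct time measured with the serial libraries.
serial_numfct_placeholder = 4000.0

s = speedup(serial_numfct_placeholder, PARALLEL_NUMFCT)
e = parallel_efficiency(serial_numfct_placeholder, PARALLEL_NUMFCT, THREADS)
print(f"speedup    : {s:.2f}x on {THREADS} threads")
print(f"efficiency : {e:.0%}")
```

A low efficiency with a real sequential timing would suggest the run is limited by something other than core count, such as memory bandwidth or swapping.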
ahmediiit
Beginner
Hello sir,
How do I link with the serial libraries?
Gennady_F_Intel
Moderator
Please use the Link Line Advisor to get the appropriate link line.