topic intel compiler in tesla machine in Intel® oneAPI Math Kernel Library

intel compiler in tesla machine

ahmediiit — Thu, 15 Apr 2010 07:06:42 GMT

Hello sir,
I want to ask whether the Intel Fortran Compiler for Linux
can be installed on an NVIDIA Tesla machine.

Gennady_F_Intel — Thu, 15 Apr 2010 13:08:58 GMT

Hello,

Please look at the Intel Fortran Compiler Release Notes to find the appropriate System Requirements.

TimP — Thu, 15 Apr 2010 13:34:37 GMT

As you must be aware, Fortran compilers for Tesla run on a host machine and support offloading CUDA library code to run on Tesla, using syntax resembling OpenMP. No compiler installs on Tesla itself, and no decision has been made about an Intel compiler supporting Tesla. Intel Fortran could be installed on a host machine for Tesla, but it would not use Tesla unless you wrote your own interface to the CUDA host tools.

ahmediiit — Mon, 19 Apr 2010 03:16:14 GMT

Hello sir,

This means that if I install the Intel Fortran compiler on the host machine, it will not use
the multiple cores of the Tesla. Is there any tool to make it compatible with CUDA so that
MKL PARDISO can run on the Tesla?

At present I am using the IVF compiler with MKL to solve linear equations (PARDISO).
My system is an Intel Xeon processor (E5520) with 8 cores.
I need to solve large sparse matrices, of size around 50,00000, for many iterations.

The system is taking a lot of time.
Please suggest how to increase the speed, or whether to change the processor.
Is there any processor on which PARDISO is more efficient?
Is there any solver faster than PARDISO?
Can we attach one more processor to the present system?
Does PARDISO work on a cluster?

Gennady_F_Intel — Mon, 19 Apr 2010 05:21:27 GMT

Hello Ahmed,

Quote: "I need to solve large sparse matrices, of size around 50,00000, for many iterations."

Do you mean the input matrix size is 5 000 000?

What mode (in-core, out-of-core, hybrid) of PARDISO are you using?

What MKL version?

--Gennady

ahmediiit — Mon, 19 Apr 2010 06:56:08 GMT

Hello sir,

50,00000 is the size of the matrix, not the number of nonzero elements.
The number of nonzeros is around 1500,00000.
The MKL version is 10.2.3.029.
I am using in-core mode.
RAM is around 24 GB.

Gennady_F_Intel — Mon, 19 Apr 2010 13:02:34 GMT

Yes, this is a very good size :).

Are you sure the calculation is not swapping? The input task alone will require ~2 GB of RAM at least (nnz * sizeof(double) + ja * sizeof(int) ~ 2 GB). The factorization stage may then require more than 10 times the memory of the original matrix, in which case you will be swapping.

Can you check it with iparm(18)? The solver will report the number of nonzero elements in the factors.

--Gennady
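
The ~2 GB estimate can be checked with a few lines of arithmetic. The sketch below is in Python rather than Fortran, and it rests on assumptions not stated in the thread: 8-byte reals, 4-byte integers, and the usual three-array CSR layout (values, column indices ja, row pointers ia). The function name csr_input_bytes is made up for illustration. The sizes are the ones reported earlier in the thread, reading 50,00000 as 5,000,000 rows and 1500,00000 as 150,000,000 nonzeros.

```python
def csr_input_bytes(n, nnz, real_bytes=8, int_bytes=4):
    """Approximate bytes held by the three CSR arrays of an n x n sparse matrix.

    Assumes double-precision values and 32-bit indices (an assumption,
    not something the thread confirms).
    """
    values = nnz * real_bytes       # a:  one real per stored nonzero
    columns = nnz * int_bytes       # ja: one column index per stored nonzero
    row_ptr = (n + 1) * int_bytes   # ia: one row pointer per row, plus one
    return values + columns + row_ptr


n = 5_000_000        # matrix order reported in the thread
nnz = 150_000_000    # stored nonzeros reported in the thread

total = csr_input_bytes(n, nnz)
print(f"CSR input arrays: {total / 1e9:.2f} GB")  # prints 1.82 GB
```

This counts only the input arrays. The factors produced during numerical factorization are stored separately and are usually much larger.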

ahmediiit — Tue, 20 Apr 2010 03:40:32 GMT

Hello sir,
I am sending the output for a matrix of size 12 lakh (1,200,000).
For 50 lakh there is a problem with my code; we will look at it later.
For now, please tell me how to make this faster.

At the start of the program the PF usage is around 3.5 GB,
and when it enters the PARDISO subroutine it increases to 22.5 GB.

nonzero= 15346680
solution start

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

=============== PARDISO: solving a symmetric indef. system ===========

Summary PARDISO: ( reorder to reorder )
===============

Times:
=====
Time fulladj: 0.098160 s
Time reorder: 7.023050 s
Time symbfct: 8.933176 s
Time parlist: 0.275664 s
Time malloc : 0.825073 s
Time total : 19.331269 s total - sum: 2.176147 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

=============== PARDISO: solving a symmetric indef. system ===========

Summary PARDISO: ( factorize to factorize )
===============

Times:
=====
Time A to LU: 0.000000 s
Time numfct : 776.535926 s
Time malloc : 0.048327 s
Time total : 776.586925 s total - sum: 0.002672 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
gflop for the numerical factorization: 22853.815471

gflop/s for the numerical factorization: 29.430468

=============== PARDISO: solving a symmetric indef. system ===========

Summary PARDISO: ( solve to solve )
===============

Times:
=====
Time solve : 9.663265 s
Time total : 29.857200 s total - sum: 20.193935 s

Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L 2403511054
number of nonzeros in U 1
number of nonzeros in L+U 2403511055
gflop for the numerical factorization: 22853.815471

gflop/s for the numerical factorization: 29.430468

solution end
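
The figures in this log can be cross-checked with simple arithmetic. The Python sketch below takes the numbers printed by PARDISO above and derives the approximate factor storage, the fill-in ratio, and the factorization rate. It assumes 8 bytes per factor entry (real double precision) and ignores index and workspace overhead, so the memory figure is a rough lower bound rather than what the solver actually allocates.

```python
# Values copied from the PARDISO output above.
nnz_a = 15_346_680        # #non-zeros in A
nnz_l = 2_403_511_054     # number of nonzeros in L
gflop = 22853.815471      # gflop for the numerical factorization
t_numfct = 776.535926     # Time numfct, in seconds

# Assumption: each factor entry is one 8-byte double.
factor_bytes = nnz_l * 8

fill_ratio = nnz_l / nnz_a    # how many times denser L is than A
rate = gflop / t_numfct       # sustained GFLOP/s during factorization

print(f"factor storage: {factor_bytes / 1e9:.1f} GB")   # prints 19.2 GB
print(f"fill-in ratio : {fill_ratio:.1f}x")             # prints 156.6x
print(f"rate          : {rate:.2f} GFLOP/s")            # prints 29.43 GFLOP/s
```

Under these assumptions the factor alone needs about 19 GB, which is in line with the reported jump from 3.5 GB to 22.5 GB on a machine with 24 GB of RAM, and the computed rate matches the gflop/s line in the log.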

Gennady_F_Intel — Tue, 20 Apr 2010 06:06:31 GMT

Could you please check the scalability of the solution by linking your application with the serial libraries?

--Gennady

ahmediiit — Tue, 20 Apr 2010 06:49:07 GMT

Hello sir,
How do I link with the serial libraries?

Gennady_F_Intel — Tue, 20 Apr 2010 07:24:46 GMT

Please use the Link Line Advisor to get the appropriate link line.