Hello sir,
I want to ask whether the Intel Fortran compiler for Linux can be installed on an NVIDIA Tesla machine.
Hello,
Please look at the Intel Fortran Compiler Release Notes to find the system requirements.
As you may be aware, Fortran compilers for Tesla run on a host machine and support off-loading of CUDA library code to run on the Tesla, using syntax resembling OpenMP. There is no compiler that installs on the Tesla itself, nor has any decision been made about an Intel compiler supporting Tesla. Intel Fortran could be installed on a host machine for a Tesla, but it would not utilize the Tesla unless you wrote your own interface to the CUDA host tools.
Hello sir,
This means that if I install the Intel Fortran compiler on the host machine, it will not utilize the many cores of the Tesla. Is there any tool to make it compatible with CUDA, so that MKL PARDISO can run on the Tesla?
Presently I am using the IVF compiler with MKL for solving linear equations (PARDISO).
My system has an Intel Xeon processor (E5520) with 8 cores.
I need to solve large sparse matrices, around 5,000,000 in size, for many iterations, and the system is taking a lot of time.
Please give some suggestions on how to increase the speed, or on changing the processor:
- Is there any processor on which PARDISO is more efficient?
- Is there any other solver faster than PARDISO?
- Can we attach one more processor to the present system?
- Does PARDISO work on a cluster?
Hello Ahmed,
Quote: "I need to solve large sparse matrices, around 5,000,000 in size, for many iterations."
Do you mean the input matrix size is 5,000,000?
What mode (in-core, out-of-core, hybrid) of PARDISO are you using?
What MKL version?
--Gennady
Hello sir,
5,000,000 is the size of the matrix, not the number of nonzero elements.
The number of nonzeros is around 150,000,000.
MKL version: 10.2.3.029.
I am going for in-core.
RAM is around 24 GB.
Yes, this is a very good size :).
Are you sure you aren't swapping during the calculation? The input task alone will require ~2 GB of RAM at least (nnz * sizeof(double) + ja * sizeof(int) ~ 2 GB). The factorization stage may then require more than 10 times the memory of the original matrix, in which case you will swap.
You can check this with iparm(18): the solver will report the number of nonzero elements in the factors.
--Gennady
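Gennady's ~2 GB figure can be reproduced with a quick back-of-envelope sketch (assumptions: 8-byte double-precision values, 4-byte integers in the ja column-index array, and the ~150,000,000 nonzeros reported above):

```shell
# Sketch of the ~2 GB input-storage estimate (assumptions: 8-byte
# doubles for the values array, 4-byte ints for the ja index array,
# nnz ~ 150,000,000 as reported above).
nnz=150000000
bytes=$(( nnz * 8 + nnz * 4 ))            # values + column indices
echo "input matrix storage: ${bytes} bytes"   # 1800000000, i.e. ~1.8 GB
```

The factorization then typically needs several times this amount, which is why swapping is the first thing to rule out.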
Hello sir,
I am sending the output for a matrix of size 1,200,000 (12 lakh).
For 50 lakh there is a problem with my code; we will look at that later.
Now please tell me how to make this still faster.
At the start of the program the PF usage is around 3.5 GB, and when it enters the PARDISO subroutine it increases to 22.5 GB.
nonzero = 15346680
solution start
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
=============== PARDISO: solving a symmetric indef. system ===========
Summary PARDISO: ( reorder to reorder )
===============
Times:
=====
Time fulladj: 0.098160 s
Time reorder: 7.023050 s
Time symbfct: 8.933176 s
Time parlist: 0.275664 s
Time malloc : 0.825073 s
Time total : 19.331269 s total - sum: 2.176147 s
Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >
< Linear system Ax = b >
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066
#right-hand sides: 1
< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L: 2403511054
number of nonzeros in U: 1
number of nonzeros in L+U: 2403511055
Percentage of computed non-zeros for LL^T factorization: 0 % ... 100 %
(progress lines omitted)
=============== PARDISO: solving a symmetric indef. system ===========
Summary PARDISO: ( factorize to factorize )
===============
Times:
=====
Time A to LU: 0.000000 s
Time numfct : 776.535926 s
Time malloc : 0.048327 s
Time total : 776.586925 s total - sum: 0.002672 s
Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >
< Linear system Ax = b >
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066
#right-hand sides: 1
< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L: 2403511054
number of nonzeros in U: 1
number of nonzeros in L+U: 2403511055
gflop for the numerical factorization: 22853.815471
gflop/s for the numerical factorization: 29.430468
=============== PARDISO: solving a symmetric indef. system ===========
Summary PARDISO: ( solve to solve )
===============
Times:
=====
Time solve : 9.663265 s
Time total : 29.857200 s total - sum: 20.193935 s
Statistics:
==========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >
< Linear system Ax = b >
#equations: 1200000
#non-zeros in A: 15346680
non-zeros in A (%): 0.001066
#right-hand sides: 1
< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 483385
size of largest supernode: 12302
number of nonzeros in L: 2403511054
number of nonzeros in U: 1
number of nonzeros in L+U: 2403511055
gflop for the numerical factorization: 22853.815471
gflop/s for the numerical factorization: 29.430468
solution end
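The jump from 3.5 GB to 22.5 GB reported above is consistent with the factor sizes in the log. A rough sketch, assuming 8 bytes per double-precision factor entry (index arrays and workspace come on top):

```shell
# Rough accounting for the memory jump inside PARDISO (assumption:
# 8-byte double-precision factor entries; indices/workspace add more).
nnz_LU=2403511055   # "number of nonzeros in L+U" from the log above
echo "factor storage: $(( nnz_LU * 8 / 1024 / 1024 / 1024 )) GiB"
# Integer division prints 17, i.e. ~17.9 GiB for the factors alone.
```

Roughly 18 GiB for the factors, plus the ~3.5 GB already in use, accounts for the ~22.5 GB observed; with 24 GB of RAM this run is right at the edge of staying in-core.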
Could you please check the scalability of the solution by linking your application with the serial (sequential) libraries?
--Gennady
Hello sir,
How do I link with the serial libraries?
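A typical link line for MKL 10.2 with ifort on Linux (Intel 64, LP64) is sketched below. This is an assumption based on MKL's layered linking model, so verify the exact library names and path against the MKL Link Line Advisor for your install; the key change is swapping the threading layer for the sequential one:

```shell
# Hypothetical link lines for MKL 10.2 on Linux, Intel 64, LP64.
# Threaded (parallel) PARDISO:
#   ifort app.f90 -L$MKLROOT/lib/em64t \
#       -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
# Sequential (serial) PARDISO -- replace mkl_intel_thread/iomp5
# with mkl_sequential:
ifort app.f90 -L$MKLROOT/lib/em64t \
    -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
```

If the threaded build is not substantially faster than the sequential one, the bottleneck is scalability (memory bandwidth or swapping) rather than the solver itself.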