slavaua
Beginner
121 Views

Pardiso does not want to utilize dual core

Hello dear community members,

I am trying to get pardiso_unsym_c.c running on my dual-core machine under Ubuntu 9.10 so that it uses both cores.
I have tried export MKL_NUM_THREADS=2
and changed iparm[2] = 2;
and then ran make, which does:
icc -xK -w -I/opt/intel/Compiler/11.1/069/mkl/include source/pardiso_unsym_c.c -L"/opt/intel/Compiler/11.1/069/mkl/lib/32" "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_solver.a "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_intel.a -Wl,--start-group "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_intel_thread.a "/opt/intel/Compiler/11.1/069/mkl/lib/32"/libmkl_core.a -Wl,--end-group -L"/opt/intel/Compiler/11.1/069/mkl/lib/32" -liomp5 -lpthread -lm -o _results/intel_parallel_32_lib/pardiso_unsym_c.out
export LD_LIBRARY_PATH="/opt/intel/Compiler/11.1/069/mkl/lib/32":/opt/intel/Compiler/11.1/069/lib/ia32:/opt/intel/Compiler/11.1/069/ipp/ia32/sharedlib:/opt/intel/Compiler/11.1/069/mkl/lib/32:/opt/intel/Compiler/11.1/069/tbb/ia32/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.1/069/mkl/lib/32; _results/intel_parallel_32_lib/pardiso_unsym_c.out >_results/intel_parallel_32_lib/pardiso_unsym_c.res
But I am getting:
Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
Does anybody know why that may be happening and what I can do to fix it?
Here is the output of cat /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel Core2 Duo CPU T5550 @ 1.83GHz
stepping : 13
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm
bogomips : 3657.02
clflush size : 64
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel Core2 Duo CPU T5550 @ 1.83GHz
stepping : 13
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm
bogomips : 3657.51
clflush size : 64
power management:
Thanks in advance for any help.
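Before looking at the solver itself, it can help to confirm that the exported variable actually reaches the process that make launches. The following is a minimal sketch in plain C; the helper names parse_thread_count and report_thread_env are hypothetical and are not part of MKL or the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a positive thread count from a string.
 * Returns -1 if the string is NULL, empty, or not a positive integer. */
static int parse_thread_count(const char *value)
{
    char *end = NULL;
    long parsed;

    if (value == NULL || *value == '\0')
        return -1;
    parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed <= 0 || parsed > 1024)
        return -1;
    return (int)parsed;
}

/* Print what the running process sees for the two usual variables. */
static void report_thread_env(void)
{
    static const char *names[] = { "MKL_NUM_THREADS", "OMP_NUM_THREADS" };
    size_t k;

    for (k = 0; k < sizeof names / sizeof names[0]; k++) {
        int t = parse_thread_count(getenv(names[k]));
        if (t > 0)
            printf("%s = %d\n", names[k], t);
        else
            printf("%s is unset or invalid\n", names[k]);
    }
}
```

Calling report_thread_env() at the top of main() in the example shows whether the export survived the make invocation. It says nothing about how many threads PARDISO then decides to use.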
Gennady_F_Intel
Moderator
121 Views
Hello Slava,
Do you mean pardiso_unsym_c.c from the solver examples?
\examples\solver\source\
--Gennady
slavaua
Beginner
121 Views
Hi Gennady,

Yes, I am running it from there.
Gennady_F_Intel
Moderator
121 Views
Slava,
the input data of this example (5x5, nnz == 13) is far too small to show the multithreading advantages of PARDISO. For such small inputs, PARDISO will always run in serial mode.
--Gennady
slavaua
Beginner
121 Views
Thanks for your reply.

I have modified the example code to try at least a 50x50 or 400x400 matrix, but now I get a segmentation fault.
This is how I fill in the input matrix:

MKL_INT n = 50; /* 5 */
MKL_INT i, j, z;
double b[n], x[n], a[n*n], o;
z = 0;
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        o = (i+j)/0.89;
        if (i == j-1) o = 10;
        if (i == j+1) o = 2;
        a[z] = o;
        z++;
    }
}
for (i = 0; i < n; i++) b[i] = (i*21 + 101) / 1.3;
MKL_INT ia[n+1];
for (i = 0; i < n; i++) ia[i] = 1 + (n*i);
ia[n] = n*n + 1;
MKL_INT ja[n*n];

for(z=0;z<(n*n);z++) for(i=0;i=j+1;

Could it be because the matrix does not contain any zeros?
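For comparison, here is one way the same fully dense n x n matrix could be laid out in one-based CSR with heap allocation. This is a sketch under assumptions: plain int stands in for MKL_INT (which should match a 32-bit build), and the csr_matrix struct and build_dense_csr function are hypothetical names, not part of the MKL example. Heap allocation is used because large automatic arrays such as a[n*n] can exhaust the stack as n grows; the thread does not confirm that this caused the segmentation fault.

```c
#include <stdlib.h>

/* Hypothetical container; PARDISO itself takes the three raw arrays. */
typedef struct {
    int n;        /* matrix dimension */
    int nnz;      /* number of stored entries (n*n here) */
    double *a;    /* values, row by row */
    int *ia;      /* row pointers, length n+1, one-based */
    int *ja;      /* column indices, length nnz, one-based */
} csr_matrix;

/* Fill a fully dense n x n matrix in one-based CSR.
 * Returns 0 on success, -1 on allocation failure. */
static int build_dense_csr(int n, csr_matrix *m)
{
    int i, j, z = 0;

    m->n = n;
    m->nnz = n * n;
    m->a = malloc((size_t)m->nnz * sizeof *m->a);
    m->ia = malloc((size_t)(n + 1) * sizeof *m->ia);
    m->ja = malloc((size_t)m->nnz * sizeof *m->ja);
    if (!m->a || !m->ia || !m->ja) {
        free(m->a); free(m->ia); free(m->ja);
        return -1;
    }
    for (i = 0; i < n; i++) {
        m->ia[i] = 1 + n * i;            /* one-based position of row i's first entry */
        for (j = 0; j < n; j++) {
            double o = (i + j) / 0.89;   /* same values as in the post */
            if (i == j - 1) o = 10;
            if (i == j + 1) o = 2;
            m->a[z] = o;
            m->ja[z] = j + 1;            /* one-based column index */
            z++;
        }
    }
    m->ia[n] = n * n + 1;                /* one past the last entry */
    return 0;
}
```

The arrays m.a, m.ia and m.ja can then be passed where the example passes a, ia and ja, and freed after the solver's release phase.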
Gennady_F_Intel
Moderator
121 Views
First of all, please check that the sparse matrix is represented in valid CSR format.

iparm(27) (iparm[26] in C) switches on the matrix checker. Please refer to the MKL manual for details.

--Gennady
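A hand-written check along the same lines can catch the most common CSR mistakes before the solver is called. The sketch below is not MKL's matrix checker. It only tests the structural rules as I understand them for one-based input: the first row pointer is 1, row pointers never decrease, and column indices are in range and strictly increasing within each row. The function name check_csr_one_based is hypothetical, and plain int stands in for MKL_INT.

```c
/* Check a one-based CSR structure. Returns 0 if it looks valid,
 * otherwise a positive code identifying the first problem found. */
static int check_csr_one_based(int n, const int *ia, const int *ja)
{
    int i, k;

    if (n <= 0 || ia[0] != 1)
        return 1;                          /* bad size or first row pointer */
    for (i = 0; i < n; i++) {
        if (ia[i + 1] < ia[i])
            return 2;                      /* row pointers must not decrease */
        for (k = ia[i] - 1; k < ia[i + 1] - 1; k++) {
            if (ja[k] < 1 || ja[k] > n)
                return 3;                  /* column index out of range */
            if (k > ia[i] - 1 && ja[k] <= ja[k - 1])
                return 4;                  /* columns must increase within a row */
        }
    }
    return 0;
}
```

A nonzero return value points at the kind of inconsistency to look for in the code that fills ia and ja.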

slavaua
Beginner
121 Views
Thanks Gennady, it was an issue with input data.

Now I am running performance tests to compare the timings with one core and with two cores.
I must admit I am a little puzzled. I am generating a 4000 x 4000 system and this is how long it takes to solve it:
Times:
======
Time fulladj: 0.650926 s
Time reorder: 1.434606 s
Time symbfct: 0.380290 s
Time malloc : 202.740895 s
Time total : 206.280643 s total - sum: 1.073926 s
As you can see, Time malloc takes up almost all of the computing time. I wonder why?
And the only difference between 1 and 2 cores is in this part (first with 1 core):
Summary PARDISO: ( factorize to factorize )
================
Times:
======
Time A to LU: 0.000000 s
Time numfct : 9.179849 s
Time malloc : 0.000039 s
Time total : 9.179889 s total - sum: 0.000001 s
gflop/s for the numerical factorization: 4.647834
And with 2 cores involved it is:
Summary PARDISO: ( factorize to factorize )
================
Times:
======
Time A to LU: 0.000000 s
Time numfct : 5.419863 s
Time malloc : 0.000037 s
Time total : 5.419902 s total - sum: 0.000001 s
gflop/s for the numerical factorization: 7.872242
But the total time barely changes:
Times:
======
Time fulladj: 0.649804 s
Time reorder: 1.430529 s
Time symbfct: 0.333696 s
Time parlist: 0.000537 s
Time malloc : 202.318808 s
Time total : 205.811701 s total - sum: 1.078326 s
Does anybody know why?
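The figures above can be put into proportion with a little arithmetic. The sketch below uses only the times quoted in this post; the helper names speedup and share are hypothetical. It assumes the first timing block is the analysis (reorder) phase, which matches the field names but is not labeled in the output shown.

```c
#include <stdio.h>

/* Speedup of a phase: serial time divided by parallel time. */
static double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Fraction of a run spent in one phase. */
static double share(double t_phase, double t_total)
{
    return t_phase / t_total;
}

/* Print the ratios for the numbers quoted in the post. */
static void print_timing_summary(void)
{
    const double numfct_1 = 9.179849, numfct_2 = 5.419863;  /* factorization, 1 vs 2 cores */
    const double malloc_t = 202.740895, total_t = 206.280643;

    printf("numfct speedup on 2 cores: %.2fx\n", speedup(numfct_1, numfct_2));
    printf("malloc share of that phase: %.1f%%\n", 100.0 * share(malloc_t, total_t));
}
```

The factorization speeds up by roughly 1.7x on two cores, while the malloc line accounts for about 98% of the phase it appears in, so the second core cannot change the overall time much until that line shrinks.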
Gennady_F_Intel
Moderator
121 Views
Hi Slava,
I am a little puzzled as well by such results (Time malloc : 202.740895 s) -:).
Your input is pretty small for sparse solvers. As an example, in this thread Time malloc : 0.825073 s for an allocation of ~1.5*10^9 nnz.
I have no guesses yet as to why this happens; I need to do some experiments. Can you give more details about your system?
CPU type, RAM, 32 or 64 bit...
--Gennady
slavaua
Beginner
121 Views
Hi Gennady,

thanks for trying to help :)
The CPU is an Intel Core2 Duo CPU T5550 @ 1.83GHz; you can see this info earlier in the thread.
The installed system is Ubuntu 9.10, 32-bit version, and I am compiling against lib32.
This machine has 3 GB of RAM.
slavaua
Beginner
121 Views
I think it will help if I post the source code here. It is just a modified file from the examples folder.
Just change the variable n to set the matrix dimension.

Gennady_F_Intel
Moderator
121 Views
I don't understand this statement:
for(z=0;z<(n*n);z++) for(i=0;i=j+1;
slavaua
Beginner
121 Views
Sorry Gennady, I probably attached the wrong file. This line should look like:

for(z=0;z

I have re-uploaded the file.
Gennady_F_Intel
Moderator
122 Views
Slava,
you wrote:
Time malloc : 202.740895 s
Time total : 206.280643 s total - sum: 1.073926 s
Could you please switch off the matching mechanism (set iparm[12] = 0) and see the results?
--Gennady
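In the C example this amounts to overriding a few iparm entries after the example has initialized the array. The sketch below is only an illustration: plain int stands in for MKL_INT, the function name apply_thread_settings is hypothetical, and only entries mentioned in this thread are touched (zero-based C indices).

```c
/* Override the iparm entries discussed in the thread on an array
 * that the caller has already initialized. */
static void apply_thread_settings(int *iparm, int num_threads)
{
    iparm[2]  = num_threads;  /* thread count, as set by the original poster */
    iparm[12] = 0;            /* switch off weighted matching, per the advice above */
    iparm[26] = 1;            /* enable the matrix checker suggested earlier */
}
```

All other entries keep whatever values the example assigned, so the ordering and pivoting behavior stays as it was.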


slavaua
Beginner
121 Views
Gennady, this has made a huge difference. Thank you!

Now 5000 x 5000 runs a lot faster.
I have a question, however. These are the times:
Summary PARDISO: ( reorder to reorder )
================
Times:
======
Time fulladj: 0.999903 s
Time reorder: 2.324973 s
Time symbfct: 0.587419 s
Time parlist: 0.000738 s
Time malloc : 0.553460 s
Time total : 5.853783 s total - sum: 1.387291 s
Summary PARDISO: ( factorize to factorize )
================
Times:
======
Time A to LU: 0.000000 s
Time numfct : 9.328830 s
Time malloc : 0.000039 s
Time total : 9.328870 s total - sum: 0.000001 s
Summary PARDISO: ( solve to solve )
================
Times:
======
Time solve : 0.076216 s
Time total : 0.367927 s total - sum: 0.291711 s
So the total execution time is the sum of the total times above?
Gennady_F_Intel
Moderator
121 Views
Yes, that sum should be the total execution time.
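As a worked example, the three "Time total" lines from the post above can simply be added up. The helper name sum_phase_totals is hypothetical.

```c
/* Sum the "Time total" values of the individual phase summaries. */
static double sum_phase_totals(const double *totals, int count)
{
    double sum = 0.0;
    int k;

    for (k = 0; k < count; k++)
        sum += totals[k];
    return sum;
}
```

With the reorder, factorize and solve totals of 5.853783 s, 9.328870 s and 0.367927 s, the result is about 15.55 s for the 5000 x 5000 run.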
slavaua
Beginner
121 Views
Thank you Gennady, once again, your support is priceless.
I am back with another problem, however :) http://software.intel.com/en-us/forums/showthread.php?t=73238&p=2#117309