Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
7117 Discussions

Pardiso Timing Information vs. Manual Profiling

usjetou
Beginner
635 Views
Hello,
I am using Pardiso from MKL 10.2 on Linux Fedora 12 and I am trying to evaluate the performance of the solver on different sparse matrices.
To do so I simply use a "gettimeofday" function calls before and after the call to Pardiso (I am using "phase=13" so only one call to Pardiso is done to solve the system).
When I compare this wall-clock time to what is provided by the statistical output of Pardiso, I get a factor 2 difference: the gettimeofday function gives me 1.2 sec execution time, while Pardiso says it only took 0.6 sec for a 165,000 matrix with 559,000 non zero elements.
Does any one know where such a difference come from? What is Pardiso exactly measuring then?
I attached below the output from Pardiso if it can help....
Thanks for your help!
jc.
-------------------------------------------------------------------------------------------------------------------
Summary PARDISO: ( reorder to solve )
================
Times:
======
Time fulladj: 0.017774 s
Time reorder: 0.352131 s
Time symbfct: 0.041561 s
Time parlist: 0.002887 s
Time A to LU: 0.000000 s
Time numfct : 0.038386 s
Time solve : 0.011953 s
Time malloc : 0.108853 s
Time total : 0.626144 s total - sum: 0.052599 s
Statistics:
===========
< Parallel Direct Factorization with #processors: > 4
< Numerical Factorization with BLAS3 and O(n) synchronization >
< Linear system Ax = b>
#equations: 165984
#non-zeros in A: 559776
non-zeros in A (%): 0.002032
#right-hand sides: 1
< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 156156
size of largest supernode: 13
number of nonzeros in L 534508
number of nonzeros in U 329950
number of nonzeros in L+U 864458
gflop for the numerical factorization: 0.003095
gflop/s for the numerical factorization: 0.080629
0 Kudos
4 Replies
VipinKumar_E_Intel
635 Views

As you may know, clock() and gettimeofday() are not really the good ones to use to check the timing.

Details of those timing functions are explained in one of our artilces for the IPP library which is located at

http://software.intel.com/en-us/articles/best-timing-function-for-measuring-ipp-api-timing/

Snippets from the artilce below.

clock(): The granularity of clock() function is dependent on implementation by various compiler vendors. The C standard does not say anything about the granularity of clock() - a compiler can have it check time once a second and increment the variable by CLOCKS_PER_SEC. This means it is possible that, depending on different compiler implementation, you can get zero, CLOCKS_PER_SEC, CLOCKS_PER_SEC * 2 and so on, never getting any intermediate value. Don't use clock() if you need high granularity.

gettimeofday(): It returns time in milliseconds or the wall clock time. The precision of gettimeofday is also very bad, for example, for a 3 GHz machine that means precision == 3 million of cpu clocks only. If your application does only calculations, clock() and gettimeofday() would be fairly close. Any time, if the application starts waiting for something (for e.g: DISK I/O), clock() will lag behind compared to the gettimeofday(). clock() can also go faster than gettimeofday() if you have multiple threads running in the same process.

We recommend to use the MKL timing functions. Details of those functions are in "Support Functions" section of the reference manual. Online version can be found in

http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/win/mkl/refman/support/spt_Timing.html

0 Kudos
usjetou
Beginner
635 Views
Thanks a lot for your answer!

I therefore played around with the MKL timing functions. Using mkl_get_cpu_clocks I getconsistentresults with the ones from gettimeofday but I still have a mismatch with the timing information given by Pardiso.
Assuming that I use the MKL timing functions correctly, do you know what is Pardiso measuring exactly?
I attached below a snapshot of the code as well as its output if it can help.
Another thing is that the mkl_get_cpu_frequency function is telling me that the CPU is running at 20 GHz (cf. output below)... Do you have an idea why it is behaving like this?
Thanks a lot for your help!
jc.
-----------------------------------------------------------------
------------------------Code Snapshot------------------------
------------------------------------------------------------------
unsigned MKL_INT64 t1, t2;
MKL_Get_Cpu_Clocks(&t1);
gettimeofday(&startp, 0);
PARDISO (pt, &maxfct, &mnum, &mtype, &phase,&n, a, ia, ja, &idum, &nrhs,iparm, &msglvl, b, x, &error);
gettimeofday(&stopp, 0);
mkl_get_cpu_clocks(&t2);
printf("\Time by gettimeofday: \t%d usec\n", CPU_TIME2);
double cpu_freq = mkl_get_cpu_frequency();
printf("\nCPU frequency:%f GHz\n",cpu_freq);
printf("\nNumber of clocks by MKL timer: \t%d ticks\n", (t2-t1));
cpu_freq = 2666404000; //The real frequency of CPU: 2.66 GHz from /proc/cpuinfo
printf("\nTime by MKL timer: \t%f usec\n", (t2-t1)/cpu_freq);
--------------------------------------------------------------------------
---------------------------OutPut----------------------------------------
---------------------------------------------------------------------------
Summary PARDISO: ( reorder to solve )
================
Times:
======
Time fulladj: 0.000052 s
Time reorder: 0.001714 s
Time symbfct: 0.000153 s
Time parlist: 0.000014 s
Time A to LU: 0.000000 s
Time numfct : 0.000341 s
Time solve : 0.000048 s
Time malloc : 0.000348 s
Time total : 0.002875 s total - sum: 0.000205 s
---removed statistics part------
Time by gettimeofday: 757386 usec
CPU frequency:20.000000 GHz
Number of clocks by MKL timer: 2019500012 ticks
Time by MKL timer: 0.757387 usec
0 Kudos
Vitaly_Lukinov
Beginner
635 Views

We have investigated the described situation and found the root cause. The reason of the issue is in mkl function dsecnd() which is used inside PARDISO for time measurement. The function dsecnd() takes approximately 0.5 sec for initialization in the first call; and this amount of time is not counted at all. So first run of PARDISO takes 0.5 sec more time than following calls.

We can propose the following workaround for the issue:

in the beginning of the program please callDSECND for dry run to complete initialization process.

0 Kudos
Gennady_F_Intel
Moderator
635 Views
jc,
please check how it works with the latest 10.3 update 3 which was released yesterday and let us know if you will see the same behavior.
/gf
0 Kudos
Reply