Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- Pardiso Timing Information vs. Manual Profiling

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

usjetou

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-28-2010
08:47 AM

48 Views

Pardiso Timing Information vs. Manual Profiling

I am using Pardiso from MKL 10.2 on Linux Fedora 12 and I am trying to evaluate the performance of the solver on different sparse matrices.

To do so I simply use a "gettimeofday" function calls before and after the call to Pardiso (I am using "phase=13" so only one call to Pardiso is done to solve the system).

When I compare this wall-clock time to what is provided by the statistical output of Pardiso, I get a factor 2 difference: the gettimeofday function gives me 1.2 sec execution time, while Pardiso says it only took 0.6 sec for a 165,000 matrix with 559,000 non zero elements.

Does any one know where such a difference come from? What is Pardiso exactly measuring then?

I attached below the output from Pardiso if it can help....

Thanks for your help!

jc.

-------------------------------------------------------------------------------------------------------------------

Summary PARDISO: ( reorder to solve )

================

Times:

======

Time fulladj: 0.017774 s

Time reorder: 0.352131 s

Time symbfct: 0.041561 s

Time parlist: 0.002887 s

Time A to LU: 0.000000 s

Time numfct : 0.038386 s

Time solve : 0.011953 s

Time malloc : 0.108853 s

Time total : 0.626144 s total - sum: 0.052599 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 4

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 165984

#non-zeros in A: 559776

non-zeros in A (%): 0.002032

#right-hand sides: 1

< Factors L and U >

#columns for each panel: 96

#independent subgraphs: 0

#supernodes: 156156

size of largest supernode: 13

number of nonzeros in L 534508

number of nonzeros in U 329950

number of nonzeros in L+U 864458

gflop for the numerical factorization: 0.003095

gflop/s for the numerical factorization: 0.080629

Link Copied

4 Replies

VipinKumar_E_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-28-2010
08:59 PM

48 Views

As you may know, clock() and gettimeofday() are not really the good ones to use to check the timing.

Details of those timing functions are explained in one of our artilces for the IPP library which is located at

http://software.intel.com/en-us/articles/best-timing-function-for-measuring-ipp-api-timing/

Snippets from the artilce below.

clock(): The granularity of clock() function is dependent on implementation by various compiler vendors. The C standard does not say anything about the granularity of clock() - a compiler can have it check time once a second and increment the variable by CLOCKS_PER_SEC. This means it is possible that, depending on different compiler implementation, you can get zero, CLOCKS_PER_SEC, CLOCKS_PER_SEC * 2 and so on, never getting any intermediate value. Don't use clock() if you need high granularity.

gettimeofday(): It returns time in milliseconds or the wall clock time. The precision of gettimeofday is also very bad, for example, for a 3 GHz machine that means precision == 3 million of cpu clocks only. If your application does only calculations, clock() and gettimeofday() would be fairly close. Any time, if the application starts waiting for something (for e.g: DISK I/O), clock() will lag behind compared to the gettimeofday(). clock() can also go faster than gettimeofday() if you have multiple threads running in the same process.

We recommend to use the MKL timing functions. Details of those functions are in "Support Functions" section of the reference manual. Online version can be found in

http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/win/mkl/refman/supp...

usjetou

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-29-2010
07:54 AM

48 Views

I therefore played around with the MKL timing functions. Using *mkl_get_cpu_clocks* I getconsistentresults with the ones from g*ettimeofday* but I still have a mismatch with the timing information given by Pardiso.

Assuming that I use the MKL timing functions correctly, do you know what is Pardiso measuring exactly?

I attached below a snapshot of the code as well as its output if it can help.

Another thing is that the *mkl_get_cpu_frequency* function is telling me that the CPU is running at 20 GHz (cf. output below)... Do you have an idea why it is behaving like this?

Thanks a lot for your help!

jc.

-----------------------------------------------------------------

------------------------Code Snapshot------------------------

------------------------------------------------------------------

unsigned MKL_INT64 t1, t2;

MKL_Get_Cpu_Clocks(&t1);

gettimeofday(&startp, 0);

PARDISO (pt, &maxfct, &mnum, &mtype, &phase,&n, a, ia, ja, &idum, &nrhs,iparm, &msglvl, b, x, &error);

gettimeofday(&stopp, 0);

mkl_get_cpu_clocks(&t2);

printf("\Time by gettimeofday: \t%d usec\n", CPU_TIME2);

double cpu_freq = mkl_get_cpu_frequency();

printf("\nCPU frequency:%f GHz\n",cpu_freq);

printf("\nNumber of clocks by MKL timer: \t%d ticks\n", (t2-t1));

cpu_freq = 2666404000; //The real frequency of CPU: 2.66 GHz from /proc/cpuinfo

printf("\nTime by MKL timer: \t%f usec\n", (t2-t1)/cpu_freq);

--------------------------------------------------------------------------

---------------------------OutPut----------------------------------------

---------------------------------------------------------------------------

Summary PARDISO: ( reorder to solve )

================

Times:

======

Time fulladj: 0.000052 s

Time reorder: 0.001714 s

Time symbfct: 0.000153 s

Time parlist: 0.000014 s

Time A to LU: 0.000000 s

Time numfct : 0.000341 s

Time solve : 0.000048 s

Time malloc : 0.000348 s

Time total : 0.002875 s total - sum: 0.000205 s

---removed statistics part------

Time by gettimeofday: 757386 usec

CPU frequency:20.000000 GHz

Number of clocks by MKL timer: 2019500012 ticks

Time by MKL timer: 0.757387 usec

Vitaly_Lukinov

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

01-19-2011
01:20 AM

48 Views

We have investigated the described situation and found the root cause. The reason of the issue is in mkl function dsecnd() which is used inside PARDISO for time measurement. The function dsecnd() takes approximately 0.5 sec for initialization in the first call; and this amount of time is not counted at all. So first run of PARDISO takes 0.5 sec more time than following calls.

We can propose the following workaround for the issue:

in the beginning of the program please callDSECND for dry run to complete initialization process.

Gennady_F_Intel

Moderator

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-31-2011
02:29 AM

48 Views

please check how it works with the latest 10.3 update 3 which was released yesterday and let us know if you will see the same behavior.

/gf

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.