
I tested a random 5000-by-5000 matrix with Intel MKL dgeev and with MATLAB on the same machine (Intel(R) Core(TM) i3-4150 CPU @ 3.50GHz) and recorded the CPU time for the eigendecomposition step only. Compiled with icc .... -mkl:parallel, it takes 541 s; compiled with icc ... -mkl:sequential, it takes 232 s. MATLAB's eig, however, takes only 70 s.

Thus I have two questions:

1. Why is sequential much faster than parallel?

2. According to MATLAB, it uses Intel(R) Math Kernel Library Version 11.1.1 for eigendecomposition. Why is it so much faster than the dgeev call in my C++ code?

Can you give me any ideas? Any suggestions on how to make the eigendecomposition faster from C++?


That's unexpected behavior. For this problem size, threaded mode should be faster than sequential mode. What version of MKL do you use? Could you provide a reproducer so we can check the problem on our side?


Hi Wei W,

A parallel program may take less wall time at the expense of consuming more CPU time.

Also, your C++ program must be correct: an incorrect program may be very slow simply because it does more computation (or it may be very fast, too). As a simple test, give the program a diagonal matrix whose diagonal values are all different.
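A minimal sketch of that diagonal-matrix check, under stated assumptions: the helper names `make_diagonal_test_matrix` and `eigenvalues_match_diagonal` are hypothetical, the matrix is stored column-major, and `wr`/`wi` stand for the real and imaginary eigenvalue arrays that dgeev returns. The dgeev call itself is left out; the idea is only that a diagonal matrix with distinct entries has exactly those entries as eigenvalues, so the result is easy to verify.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds an n-by-n diagonal matrix (column-major) with the distinct
// diagonal entries 1, 2, ..., n. Its eigenvalues are exactly those entries.
std::vector<double> make_diagonal_test_matrix(std::size_t n) {
    std::vector<double> a(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        a[i * n + i] = static_cast<double>(i + 1);
    }
    return a;
}

// Compares computed eigenvalues against the expected values 1..n.
// wr holds real parts and wi imaginary parts (assumed dgeev-style output).
// The solver may return eigenvalues in any order, so wr is sorted first.
bool eigenvalues_match_diagonal(std::vector<double> wr,
                                const std::vector<double>& wi,
                                double tol) {
    assert(wr.size() == wi.size());
    for (double imag : wi) {
        if (std::fabs(imag) > tol) return false;  // expected purely real
    }
    std::sort(wr.begin(), wr.end());
    for (std::size_t i = 0; i < wr.size(); ++i) {
        if (std::fabs(wr[i] - static_cast<double>(i + 1)) > tol) return false;
    }
    return true;
}
```

One possible use: fill the input with `make_diagonal_test_matrix(n)`, run the existing eigendecomposition code on it, and pass the resulting eigenvalue arrays to `eigenvalues_match_diagonal` with a small tolerance.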

Thanks

Dima


It looks to me like you're measuring CPU time rather than "wall" time. In parallel operations, many timers report the sum of CPU times across threads.

Have you tried MKL second/dsecond?
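To make the CPU-time versus wall-time distinction concrete, here is a minimal sketch in standard C++ that measures one call with both kinds of clock. The names `Timing`, `time_workload`, and `idle_for_ms` are made up for illustration. It assumes a Linux/glibc-style `std::clock()`, which reports processor time summed over all threads of the process; on some platforms (Windows, for example) `std::clock()` follows wall time instead, so the comparison would not be meaningful there.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>
#include <thread>

// Elapsed times, in seconds, for one call of a workload.
struct Timing {
    double wall_seconds;  // real elapsed time
    double cpu_seconds;   // processor time charged to the whole process
};

// Runs `work` once and measures it with a wall clock and a CPU clock.
// Assumption: std::clock() sums CPU time over all threads (Linux/glibc).
Timing time_workload(const std::function<void()>& work) {
    assert(work && "work must be callable");
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

// Demo workload: waits without computing, so wall time grows while
// CPU time stays close to zero.
void idle_for_ms(int ms) {
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}
```

Wrapping the eigendecomposition call in the lambda passed to `time_workload` gives both numbers for the same run. With a threaded library, `cpu_seconds` can exceed `wall_seconds`, which is the effect described above.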


The MKL I use in my C++ code is **MKL 11.3**, included in Intel Parallel Studio XE 2016 Cluster Edition.

My C++ code is correct, since I have checked the results. I also re-ran the test measuring **wall time** for both MKL sequential and MKL parallel on the same 5000-by-5000 matrix. The results are as follows:

**sequential**: wall time 240.39s, cpu time 232.11s

**parallel**: wall time 248.45s, cpu time 467.78s
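As a rough way to read these four numbers, the sketch below computes two ratios: CPU time over wall time (the average number of cores kept busy) and sequential wall time over parallel wall time (the wall-clock speedup). The function names are hypothetical and the code is plain arithmetic on the figures reported above, not a diagnosis of the cause.

```cpp
#include <cassert>

// Average number of cores kept busy: CPU time divided by wall time.
double average_busy_cores(double cpu_seconds, double wall_seconds) {
    assert(wall_seconds > 0.0);
    return cpu_seconds / wall_seconds;
}

// Wall-clock speedup of the parallel run relative to the sequential run.
// A value below 1 means the parallel run was slower.
double wall_speedup(double sequential_wall, double parallel_wall) {
    assert(parallel_wall > 0.0);
    return sequential_wall / parallel_wall;
}
```

With the reported values, `average_busy_cores(467.78, 248.45)` is about 1.88 and `wall_speedup(240.39, 248.45)` is about 0.97: the parallel run consumed roughly twice the CPU time without finishing any sooner.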
