
Intel Community › Software › Software Development SDKs and Libraries › Intel® oneAPI Math Kernel Library


Fabian_K_1

Beginner


06-28-2016 04:08 AM


Complexity of functions ?potrs, ?potrf and cblas_dgemm

Dear MKL forum,

I'm using the function "?potrf" for the Cholesky factorization of a matrix and "?potrs" for solving the resulting linear system. Additionally, I need "cblas_dgemm" (matrix multiplication) for further calculations. These functions run in a distributed system with multiple servers, and for optimal load balancing I need the exact complexity of each of these algorithms (in the sense of big-O notation). I'd rather not use the complexities given in the common literature, because the MKL functions are optimized and may not follow them.

Can you help me out?

Best regards


2 Replies

Ying_H_Intel

Employee


07-01-2016 12:04 AM


Hi Fabian,

It is a good question. I think I understand what you are asking, but I'm afraid it leads to some ambiguity between algorithmic complexity and MKL optimization.

MKL functions such as cblas_dgemm do not change the algorithmic complexity. The principle behind optimizing MKL functions is to make maximum use of the hardware resources: the code is vectorized (making full use of SIMD instructions) and threaded (so that all cores are used).

?potrf and cblas_dgemm are vectorized, threaded, and multi-core ready. So if you use these functions in a distributed system with multiple servers, unless you write high-level parallelism yourself (for example, MPI processes) to distribute the work, in most cases you can call them directly and get good multi-core performance.

If you do want to distribute the work yourself, for example five 1000x1000 cblas_dgemm calls on one server and two 12500x12500 calls on another, and you are worried about imbalance, you can consider the relationship between algorithmic complexity and hardware resources. But bear in mind that this relationship is not linear, and there is no exact formula to describe it. So I would recommend using system profiling tools instead; for example, for an Intel MPI program you can use ITAC (Intel Trace Analyzer and Collector) to profile and adjust the workload.

Best Regards,

Ying

Fabian_K_1

Beginner


07-12-2016 01:57 AM


Thank you very much!


For more complete information about compiler optimizations, see our Optimization Notice.