Pouya_Z_

Beginner


09-04-2013
03:35 PM


What performance should I expect from the following code?

Consider the following two pieces of code:

/* Perform LU factorization and store in DSS_handle */
for(k = 0; k < N; k++){
    gettimeofday(&stTime, NULL);

    //DSS solver options
    MKL_INT solOpt = (MKL_DSS_DEFAULTS | MKL_DSS_REFINEMENT_OFF) | MKL_DSS_TRANSPOSE_SOLVE;
    MKL_INT nRhs = 3;

    /* Solve for all three right-hand sides in one call */
    dss_solve_real(DSS_handle, solOpt, bufferRHS, nRhs, bufferX3);
    dssSolCnt++;

    gettimeofday(&endTime, NULL);
    /* Accumulate the elapsed time in microseconds */
    dssSolTime += (double)(endTime.tv_sec*1000000 + endTime.tv_usec - stTime.tv_sec*1000000 - stTime.tv_usec);

    /* Do some other things */
}

For this code, dssSolTime, which represents the time required to perform the forward and backward solutions, is 19.87 sec for a 3408 x 3408 matrix.
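As an aside, the timing arithmetic used in the loop above could be factored into a small helper. This is only a sketch and not part of the original code: the name elapsed_seconds is hypothetical, and it assumes the same POSIX gettimeofday() clock used in the post. It subtracts the seconds and microseconds fields separately, so no large tv_sec*1000000 product is formed, and it returns seconds directly.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper (not from the original post): elapsed time between
   two gettimeofday() samples, returned in seconds. */
static double elapsed_seconds(const struct timeval *start,
                              const struct timeval *end)
{
    assert(start != NULL && end != NULL);

    /* Subtract each field separately; the microsecond difference may be
       negative, which the sum below handles correctly. */
    double sec  = (double)(end->tv_sec  - start->tv_sec);
    double usec = (double)(end->tv_usec - start->tv_usec);

    return sec + usec * 1e-6;
}
```

With such a helper, the accumulation in the loop would read dssSolTime += elapsed_seconds(&stTime, &endTime); and dssSolTime would then hold seconds rather than microseconds.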

Now, if I do the same calculations sequentially using the following code,

/* Perform LU factorization and store in DSS_handle */
for(k = 0; k < N; k++){
    gettimeofday(&stTime, NULL);

    //DSS solver options
    MKL_INT solOpt = (MKL_DSS_DEFAULTS | MKL_DSS_REFINEMENT_OFF) | MKL_DSS_TRANSPOSE_SOLVE;
    MKL_INT nRhs = 1;

    /* Solve for the three right-hand sides one at a time */
    dss_solve_real(DSS_handle, solOpt, bufferRHS, nRhs, bufferX3);
    dss_solve_real(DSS_handle, solOpt, bufferRHS+numOfEqs, nRhs, bufferX3+numOfEqs);
    dss_solve_real(DSS_handle, solOpt, bufferRHS+2*numOfEqs, nRhs, bufferX3+2*numOfEqs);
    dssSolCnt++;

    gettimeofday(&endTime, NULL);
    /* Accumulate the elapsed time in microseconds */
    dssSolTime += (double)(endTime.tv_sec*1000000 + endTime.tv_usec - stTime.tv_sec*1000000 - stTime.tv_usec);

    /* Do some other things */
}

it completes the computations much faster, and dssSolTime is 2.04 sec for the same matrix (almost 10 times faster than when I ask dss_solve_real to solve for all right-hand-side vectors in a single call).
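The sequential version relies on the right-hand sides being stored back to back in bufferRHS, each numOfEqs entries long, so that bufferRHS+k*numOfEqs points at the k-th vector. A minimal sketch of that addressing is below; the helper name rhs_column and its parameters are hypothetical and only illustrate the pointer arithmetic, they are not part of MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MKL function): pointer to the k-th vector
   in a buffer holding nRhs vectors of length n stored one after another. */
static double *rhs_column(double *buffer, size_t n, size_t k, size_t nRhs)
{
    assert(buffer != NULL && k < nRhs);

    /* Vector k starts k*n entries past the start of the buffer. */
    return buffer + k * n;
}
```

Under that assumption, the three calls in the loop are equivalent to passing rhs_column(bufferRHS, numOfEqs, 0, 3), rhs_column(bufferRHS, numOfEqs, 1, 3) and rhs_column(bufferRHS, numOfEqs, 2, 3) as the right-hand-side arguments, with the same offsets applied to bufferX3.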

I assumed that dss_solve_real is smart enough to create three threads and solve for all right-hand-side vectors simultaneously. Therefore, I expected the first code to be three times faster than the second. Instead, the first code is almost 10 times slower, which suggests that I may be missing something here. I would appreciate it if you could let me know whether dss_solve_real can solve for three right-hand-side vectors in parallel. Also, please let me know what I should logically expect from these two codes and which one should be faster.

Thanks


1 Reply

SergeyKostrov

Valued Contributor II


09-06-2013
06:14 AM
