Consider the following two pieces of code:
/* Perform LU factorization and store in DSS_handle */
for(k = 0; k < N; k++){
    gettimeofday(&stTime, NULL);
    //DSS solver options
    MKL_INT solOpt = (MKL_DSS_DEFAULTS | MKL_DSS_REFINEMENT_OFF) | MKL_DSS_TRANSPOSE_SOLVE;
    MKL_INT nRhs = 3;
    dss_solve_real(DSS_handle, solOpt, bufferRHS, nRhs, bufferX3);
    dssSolCnt++;
    gettimeofday(&endTime, NULL);
    dssSolTime += (double)(endTime.tv_sec*1000000 + endTime.tv_usec - stTime.tv_sec*1000000 - stTime.tv_usec);
    /* Do some other things */
}
For this code, dssSolTime, which represents the time required to perform the forward and backward solves, is 19.87 s for a 3408 × 3408 matrix.
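As a side note on the measurement itself, the elapsed time can be computed by subtracting the seconds and microseconds fields separately before scaling, which keeps the intermediate values small. The helper below is only a sketch; the name elapsed_usec is hypothetical and is not part of MKL or of my actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: elapsed microseconds between two gettimeofday() samples.
   The seconds are subtracted before scaling, so the intermediate values
   stay small even where tv_sec is a 32-bit type. */
static double elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec) * 1e6
         + (double)(end->tv_usec - start->tv_usec);
}
```

With this, the accumulation in the loop would read dssSolTime += elapsed_usec(&stTime, &endTime); the total is still in microseconds and has to be divided by 1e6 to get seconds.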
Now, if I do the same calculations sequentially, one right-hand side per call, using the following code,
/* Perform LU factorization and store in DSS_handle */
for(k = 0; k < N; k++){
    gettimeofday(&stTime, NULL);
    //DSS solver options
    MKL_INT solOpt = (MKL_DSS_DEFAULTS | MKL_DSS_REFINEMENT_OFF) | MKL_DSS_TRANSPOSE_SOLVE;
    MKL_INT nRhs = 1;
    dss_solve_real(DSS_handle, solOpt, bufferRHS, nRhs, bufferX3);
    dss_solve_real(DSS_handle, solOpt, bufferRHS+numOfEqs, nRhs, bufferX3+numOfEqs);
    dss_solve_real(DSS_handle, solOpt, bufferRHS+2*numOfEqs, nRhs, bufferX3+2*numOfEqs);
    dssSolCnt++;
    gettimeofday(&endTime, NULL);
    dssSolTime += (double)(endTime.tv_sec*1000000 + endTime.tv_usec - stTime.tv_sec*1000000 - stTime.tv_usec);
    /* Do some other things */
}
it completes the computations much faster: dssSolTime is 2.04 s for the same matrix, almost 10 times faster than when I ask dss_solve_real to solve for all three right-hand-side vectors in a single call.
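To make the buffer layout I am relying on explicit: the pointer offsets in the second code assume that the right-hand sides are stored one after another, each occupying numOfEqs contiguous entries. The sketch below illustrates that equivalence with a toy diagonal solver standing in for dss_solve_real; toy_solve and toy_solve_columnwise are hypothetical names used only for illustration, and nothing here calls MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a multi-RHS solve: solves D*x = b for a
   diagonal matrix D. Column j of rhs and x occupies entries [j*n, (j+1)*n). */
static void toy_solve(const double *diag, int n,
                      const double *rhs, int nrhs, double *x)
{
    assert(n > 0 && nrhs > 0);
    for (int j = 0; j < nrhs; j++)
        for (int i = 0; i < n; i++)
            x[(size_t)j * n + i] = rhs[(size_t)j * n + i] / diag[i];
}

/* Same solve, issued one column per call with pointer offsets,
   mirroring the bufferRHS + k*numOfEqs pattern in the second code. */
static void toy_solve_columnwise(const double *diag, int n,
                                 const double *rhs, int nrhs, double *x)
{
    for (int j = 0; j < nrhs; j++)
        toy_solve(diag, n, rhs + (size_t)j * n, 1, x + (size_t)j * n);
}
```

Under this layout both variants should produce identical solution buffers, so the two codes above differ only in how the work is handed to the solver, not in what is computed.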
I assumed that dss_solve_real is smart enough to create three threads and solve for all right-hand-side vectors simultaneously, so I expected the first code to be three times faster than the second. The huge performance degradation implies that I may be missing something. I would appreciate it if you could tell me whether dss_solve_real can solve for three right-hand-side vectors in parallel, what I should logically expect from these two codes, and which one should be faster.
Thanks