sstein
http://software.intel.com/sites/products/documentation/hpc/mkl/updates/10.3.5/mklman/lse/functn_stein.htm
But the MKL user guide does not indicate that sstein has a threaded version:
http://software.intel.com/sites/products/documentation/hpc/mkl/updates/10.3.5/mkl_userguide_lnx/mkl_userguide_lnx.pdf
page 44.
1 Reply
If you look at sstein.f (http://www.netlib.org/lapack/lapack-3.1.1/html/sstein.f.html), it appears that some parallelization is possible. I could not venture to guess the overall effect on your application. The listed subroutine is not itself parallel; however, some of the library functions it calls may be parallel in their MKL equivalents. Some of the statements in this routine can also be parallelized. I won't list the routine here, but here are some comments if you care to look at the listing:
On a cursory look, the DO 160 loop (the main loop) cannot be parallelized because it is an iterative, convergence-type loop.
The section of code that computes ONENRM could be lifted out of the DO 160 loop and used to build an array ONENRM(I) in a parallel DO I=1,IBLOCK(M) loop. Then, inside the DO 160 loop, use the new array element ONENRM(NBLK) instead of the scalar ONENRM.
Depending on BLKSIZ and the memory controller, the four calls to SLARNV, SCOPY, SCOPY, SCOPY could be performed in parallel (SECTIONS).
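As a sketch of the SECTIONS idea, the C fragment below runs the random fill and the three copies as four OpenMP sections. It is an illustration under assumptions: setup_block and fill_uniform are hypothetical names, fill_uniform is a simple stand-in generator and not SLARNV's algorithm, and the work array is assumed to be split into length-n slices the way sstein.f does with its INDRV offsets. The four sections write to disjoint parts of work, which is what makes it safe to run them concurrently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SLARNV (NOT its actual generator): fills x with values
   in [-1, 1) from a simple linear congruential sequence. */
void fill_uniform(uint32_t *seed, int n, float *x)
{
    for (int i = 0; i < n; ++i) {
        *seed = *seed * 1664525u + 1013904223u;
        x[i] = (float)(*seed >> 8) / 8388608.0f - 1.0f;
    }
}

/* Hypothetical helper: prepare the work slices for one block.
   b1 is the 0-based first row of the block; work has at least 4*n floats. */
void setup_block(int n, int b1, int blksiz,
                 const float *d, const float *e,
                 uint32_t *seed, float *work)
{
    assert(blksiz >= 1 && b1 >= 0 && b1 + blksiz <= n);

    float *rv1 = work;          /* random start vector          */
    float *rv2 = work + n;      /* copy of e, shifted by one    */
    float *rv3 = work + 2 * n;  /* copy of e                    */
    float *rv4 = work + 3 * n;  /* copy of d                    */
    size_t off = (size_t)(blksiz - 1) * sizeof(float);

    /* Each section touches a different slice of work. */
    #pragma omp parallel sections
    {
        #pragma omp section
        fill_uniform(seed, blksiz, rv1);

        #pragma omp section
        memcpy(rv4, d + b1, (size_t)blksiz * sizeof(float));

        #pragma omp section
        memcpy(rv2 + 1, e + b1, off);

        #pragma omp section
        memcpy(rv3, e + b1, off);
    }
}
```

For small BLKSIZ the cost of starting the sections will likely exceed the cost of the copies themselves, so this would need to be measured before being adopted.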
The DO 130 loop should be split into two loops to avoid zeroing Z(B1:B1+BLKSIZ-1).
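A minimal sketch of the split, again in C and with hypothetical names (store_block_vector, zcol, v). It assumes zcol points at one column of Z (contiguous, since Z is column-major in the Fortran routine) and that v holds the BLKSIZ computed entries that the listing copies into the block rows right after the zeroing loop.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: write one eigenvector column without zeroing
   the rows that are about to be overwritten.
   n      : length of the column
   b1     : 0-based first row of the block
   blksiz : number of rows in the block
   v      : the blksiz computed entries
   zcol   : the output column of Z */
void store_block_vector(int n, int b1, int blksiz,
                        const float *v, float *zcol)
{
    assert(b1 >= 0 && blksiz >= 0 && b1 + blksiz <= n);

    /* First loop: zero the rows above the block. */
    for (int i = 0; i < b1; ++i)
        zcol[i] = 0.0f;

    /* Block rows receive the computed entries directly. */
    memcpy(zcol + b1, v, (size_t)blksiz * sizeof(float));

    /* Second loop: zero the rows below the block. */
    for (int i = b1 + blksiz; i < n; ++i)
        zcol[i] = 0.0f;
}
```

The saving is one pass over BLKSIZ elements per eigenvector, so it matters most when the blocks are large relative to N.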
As to what is inside the MKL sstein I cannot say.
Jim Dempsey