It would be most appreciated if someone could provide detailed, novice-level instructions for linking against the MKL to build an optimised version of the BLAS for the open source R statistical project, preferably using Visual Studio or the default gcc (for Windows). Most R users would have no idea where to start with this.
The aim is to replace the default rblas.dll with the optimised one compiled using MKL.
Although this may seem like a trivial question to the MKL researchers, I believe that this would be very useful to the huge and growing community of R users, and hence the request for detailed guidance.
The R project (source code and binary) is downloadable from http://cran.r-project.org/
Thanks
Stochos
(i) R is covered by GPL2. MKL is proprietary, and exists only for systems built with x86, x86-64 and ia64 processors.
(ii) The creation of a shared library (DLL) for use with a software system that is used by thousands of users is not a task to be entrusted to novices.
(iii) R is an interpreted language oriented towards data manipulation, statistics and visualization. Therefore, it is doubtful that the use of non-optimized BLAS causes noticeable slowdown for the majority of users, or that using MKL in place of the non-optimized BLAS would provide a significant improvement in performance.
Thanks for the response.
I have a rebuttal to each of your three objections:
(i) I have not suggested violating any laws. This request is posted for those users who are working in the Windows universe and have access to the MKL.
(ii) Given the above, the compiled DLL is not meant to be distributed on the R website; it is only for private use. And why should novices not try anyway?
(iii) The performance improvement (in the Windows environment) has been significant, even when using only one core. This has been demonstrated by Revolution Analytics, a company that sells services around R. The comparison is between the standard distribution of R (available from http://cran.r-project.org/) and their version compiled against the Intel MKL.
| Calculation | Size | Command | R 2.9.2 | Revolution R (1 core) | Revolution R (4 cores) |
| --- | --- | --- | --- | --- | --- |
| Matrix Multiply A'*A | 10000x5000 | B <- crossprod(A) | 243 sec | 22 sec | 5.9 sec |
| Cholesky Factorization | 5000x5000 | C <- chol(B) | 23 sec | 3.8 sec | 1.1 sec |
| Singular Value Decomposition | 5000x5000 | S <- svd (A,nu=0,nv=0) | 62 sec | 13 sec | 4.9 sec |
| Principal Components Analysis | 10000x5000 | P <- prcomp(A) | 237 sec | 41 sec | 15.6 sec |
Source: http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html
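For anyone who wants to try the four commands from the table on their own installation, here is a minimal sketch. The matrix size (500x200) and the seed are my own assumptions, chosen so that it finishes in about a second; the table used 10000x5000 and 5000x5000 matrices, so the absolute timings will not be comparable.

```r
# Scaled-down version of the four benchmark commands from the table.
# The 500 x 200 size and the seed are assumptions, not the sizes used in the table.
set.seed(1)
n <- 500
p <- 200
A <- matrix(rnorm(n * p), n, p)

timings <- c(
  crossprod = system.time(B <- crossprod(A))[["elapsed"]],            # A'A
  chol      = system.time(C <- chol(B))[["elapsed"]],                 # Cholesky factor of A'A
  svd       = system.time(S <- svd(A, nu = 0, nv = 0))[["elapsed"]],  # singular values only
  prcomp    = system.time(P <- prcomp(A))[["elapsed"]]                # principal components
)
print(timings)
```

Running the same script before and after swapping the BLAS, with larger n and p, should give a like-for-like comparison on a given machine.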
Also, there are non-Intel optimised builds of rblas.dll available from http://cran.r-project.org/bin/windows/contrib/ATLAS/
Installing the rblas.dll that matches your processor results in a significant speed improvement. It appears that it would be faster still with Intel MKL.
So if there is anyone, perhaps from the Intel MKL team, who could provide the guidance required, please do!
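Since swapping in a replacement DLL means overwriting the one that ships with R, it seems sensible to keep a copy of the original first. A minimal sketch, assuming the DLL is named Rblas.dll and sits in the directory returned by R.home("bin"); `backup_rblas` is a hypothetical helper name of my own, not something provided by R.

```r
# Hypothetical helper (not part of R): keep a copy of the stock BLAS DLL
# before replacing it. The default location and file name are assumptions
# and may differ between R versions and architectures.
backup_rblas <- function(bin_dir = R.home("bin"), dll = "Rblas.dll") {
  src <- file.path(bin_dir, dll)
  if (!file.exists(src)) {
    stop("No ", dll, " found in ", bin_dir)
  }
  dest <- paste0(src, ".bak")
  if (file.exists(dest)) {
    # Never overwrite an earlier backup: it may be the only copy of the original.
    message("Backup already exists: ", dest)
    return(invisible(dest))
  }
  if (!file.copy(src, dest, overwrite = FALSE)) {
    stop("Could not copy ", src, " to ", dest)
  }
  invisible(dest)
}
```

With R closed afterwards, the replacement DLL can be copied over the original, and the .bak file copied back if anything goes wrong.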
------------------------------------------------------------------------------------------------------------------------
Here is an example to test:
require(Matrix)
set.seed(123)
X <- Matrix(rnorm(1e6), 1000)
print(system.time(for(i in 1:25) X%*%X))
print(system.time(for(i in 1:25) solve(X)))
print(system.time(for(i in 1:10) svd(X)))
Here is a test result on my machine (Intel Pentium M processor, 1.73 GHz, with 1 GB RAM).
Default Rblas.dll
> print(system.time(for(i in 1:25) X%*%X))
user system elapsed
114.19 0.38 121.04
> print(system.time(for(i in 1:25) solve(X)))
user system elapsed
87.03 0.28 89.31
> print(system.time(for(i in 1:10) svd(X)))
user system elapsed
232.29 1.44 242.64
New Rblas.dll
> print(system.time(for(i in 1:25) X%*%X))
user system elapsed
37.18 0.36 37.89
> print(system.time(for(i in 1:25) solve(X)))
user system elapsed
30.62 0.56 31.78
> print(system.time(for(i in 1:10) svd(X)))
user system elapsed
102.89 2.17 107.17
------------------------------------------------------------------------------------------------------------------------
Source: http://www.stat.columbia.edu/~cook/movabletype/archives/2008/06/a_trick_to_spee.html
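To put a number on the difference, the speedup can be worked out from the elapsed times quoted above. This is just arithmetic on the reported figures; the variable names are my own.

```r
# Elapsed times (seconds) copied from the results quoted above.
default_elapsed <- c(matmul = 121.04, solve = 89.31, svd = 242.64)
new_elapsed     <- c(matmul = 37.89,  solve = 31.78, svd = 107.17)

# Speedup factor = time with the default Rblas.dll / time with the new Rblas.dll.
speedup <- round(default_elapsed / new_elapsed, 2)
print(speedup)
# matmul  solve    svd
#   3.19   2.81   2.26
```

So on that machine the replacement Rblas.dll is roughly two to three times faster on these three operations.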
Hope this helps make the case. The question then is to what extent Intel MKL is better than the open source alternatives. Also, we are told that the MKL has been optimised for multithreading, so there might be a further speed improvement from that as well.