Hi Petros,

Petros_M_ · ‎09-13-2016

Hi,

Tried to use this function to obtain a correlation matrix from a symmetric one (in some sense of proximity).

I regret to say that the documentation is leaving a lot to be desired.

I don't know what the expression : "If an input parameter is NULL, the corresponding parameter in the Summary Statistics task descriptor remains unchanged." and I guess it is safe to assume that, whoever wrote it did not care to be understood anyway.

To make matters worse, when one goes to "Storage formats of a variance-covariance/correlation/cross-product matrix " they read, facing obvious and total disregard for their time:

"A symmetric variance-covariance/correlation/cross-product matrix with elements c(i,j) is packed as a one-dimensional array cp(i + (2n - j)*(j - 1)/2) for j ≤ i. The size of the array is p*(p+ 1)/2."

What is "n" ?? Trying very hard to keep it civil here !

So let us assume it is the same as p.

In short, what this dyslexic piece of "wording" says is (probably) :

VSL_SS_MATRIX_STORAGE_FULL is a row-major matrix with all the entries filled (only God knows why, but at least this works, if you want/like to write everything twice - who doesn't ! ).

VSL_SS_MATRIX_STORAGE_L_PACKED : a Fortran column-major matrix (again only deity will comprehend why we switched storage layout),

since consecutive i's occupy adjacent memory locations.

The "funny" thing is that this does not work.

Try it with boost's ublas symmetrix_matrix< double, lower, column_major> (in the appropriate namespace, i.e. boost::numeric::ublas) and you will see what you get.

I am sorry for the tone, but have wasted lots of hours to track down the root of the problem, have spent many hours with whimsical storage preferences and sloppy documentation and I am really fed up with this situation, where there is complete lack of consistency across the library (you also ship blas and lapack which are industry standards, no?), and not even an effort for documentation common language.

PS: the extraordinary thing its that from every topic in summary statistics you point to the same awful piece of "documentation" for symmetric matrices acceptable layouts !

Andrey_N_Intel · ‎09-13-2016

Hi Petros,

thanks for the feedback. We will analyze your comments and improve documentation accordingly. Will also investigate use of Boost classes together with Summary Statistics component of the library.

Andrey

Petros_M_ · ‎09-14-2016

Hi Andrey,

I do not think I made myself clear: What I meant to say is that it seems that mkl is providing the wrong answer !

I attach a c++ code snippet and the results it produces.

A couple of explanations, to make the reading easier/quicker:

the get_stg_fmt template provides with the mkl int required for editing the task and a string to identify the layout.
the test is (template-) parametrized by the symmetric matrix types involved.
the array_type, for each symmetric type, is the storage of the matrix elements.

The code is pretty straightforward to follow. The main aims to be thorough, on the combinations of allowed symmetric types.

Here is the output :

As you notice (just look at the first 3 outputs), depending on the storage type, one obtains different outputs !!!

( Symm=full, Correl=full ) :
===========================
Input/Symm storage layout: 1 0.25 0.6 0.25 1 -0.8 0.6 -0.8 1
Output/correl storage layout: 1 0.20686 0.53587 0.20686 1 -0.71519 0.53587 -0.71519 1

( Symm=packed_l, Correl=full ) :
===============================
Input/Symm storage layout: 1 0.25 0.6 1 -0.8 1
Output/correl storage layout: 1 0.12515 0.73058 0.12515 1 -0.58603 0.73058 -0.58603 1

( Symm=packed_u, Correl=full ) :
===============================
Input/Symm storage layout: 1 0.25 1 0.6 -0.8 1
Output/correl storage layout: 1 0.10374 0.70577 0.10374 1 -0.6314 0.70577 -0.6314 1

( Symm=full, Correl=packed_l ) :
===============================
Input/Symm storage layout: 1 0.25 0.6 0.25 1 -0.8 0.6 -0.8 1
Output/correl storage layout: 1 0.20686 1 0.53587 -0.71519 1

( Symm=packed_l, Correl=packed_l ) :
===================================
Input/Symm storage layout: 1 0.25 0.6 1 -0.8 1
Output/correl storage layout: 1 0.12515 1 0.73058 -0.58603 1

( Symm=packed_u, Correl=packed_l ) :
===================================
Input/Symm storage layout: 1 0.25 1 0.6 -0.8 1
Output/correl storage layout: 1 0.10374 1 0.70577 -0.6314 1

( Symm=full, Correl=packed_u ) :
===============================
Input/Symm storage layout: 1 0.25 0.6 0.25 1 -0.8 0.6 -0.8 1
Output/correl storage layout: 1 0.20686 0.53587 1 -0.71519 1

( Symm=packed_l, Correl=packed_u ) :
===================================
Input/Symm storage layout: 1 0.25 0.6 1 -0.8 1
Output/correl storage layout: 1 0.12515 0.73058 1 -0.58603 1

( Symm=packed_u, Correl=packed_u ) :
===================================
Input/Symm storage layout: 1 0.25 1 0.6 -0.8 1
Output/correl storage layout: 1 0.10374 0.70577 1 -0.6314 1

This has nothing to do with boost or ublas containers, which are here only used to facilitate the logistics, can be thought of as syntactic sugar, if you wish, but have been forced to print their contents, which one can consider as the starting point - just an unnecessarily painful way to work ;-).

If I have not done a booboo somewhere, there is something really wrong in the mkl binary.

Here is the code (sorry could not find how to attach) (run in vs2010, win7, debug, 32 bit with mkl 11.3) :

NOTE:

looking at the locations of 1's in the output packed-upper storage, it seems plausible that the results are output in row-major order, in direct conflict with the (multiply) sited documentation snippet !
looking at the locations of 1's in the packed-lower storage, it seems that they are arranged as column-major UPPER storage ! (this is getting precious)

Of course, this does not explain why the numbers themselves do not agree in the first 3 cases !

The code:

#include <string>
#include <iostream>
#include <iomanip>
// boost
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/vector_proxy.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
#include <boost/numeric/ublas/operation.hpp>
#include <boost/numeric/ublas/symmetric.hpp>
#include <boost/numeric/ublas/triangular.hpp>
#include <boost/numeric/ublas/banded.hpp>
// mkl
#include "mkl.h"

namespace ublas = boost::numeric::ublas ;
typedef ublas::vector<double>            VectorNumeric  ;
typedef ublas::matrix<double>            symm_full_stg  ;
typedef ublas::symmetric_matrix<double, ublas::lower, ublas::column_major> symm_packed_l_stg ;
typedef ublas::symmetric_matrix<double, ublas::upper, ublas::column_major> symm_packed_u_stg ;

// defining a type->mkl stg specifier mapper :
template <typename _S = symm_full_stg >
struct get_stg_fmt {
 static const MKL_INT value = VSL_SS_MATRIX_STORAGE_FULL ;
 static std::string str() { return std::string( "full" ) ; }
 };
template <>
struct get_stg_fmt< symm_packed_l_stg > {
 static const MKL_INT value = VSL_SS_MATRIX_STORAGE_L_PACKED ;
 static std::string str() { return std::string( "packed_l" ) ; }
 };
template <>
struct get_stg_fmt< symm_packed_u_stg > {
 static const MKL_INT value = VSL_SS_MATRIX_STORAGE_U_PACKED ;
 static std::string str() { return std::string( "packed_u" ) ; }
 } ;


template < typename _Correl, typename _Symm >
void
test_correl_parametrization() {

 std::string banner( "( Symm=" ) ;
 banner += get_stg_fmt< _Symm >::str() + ", Correl=" + get_stg_fmt< _Correl >::str() + " ) : " ;
 std::cout << banner << std::endl ;
 std::cout << std::string( banner.size() - 2, '=' ) << std::endl ;
 // initialize symm :
 const size_t n = 3 ;
 _Symm symm( n, n ) ;
 const double one = double( 1 ) ;
 // lower part :
 symm( 0, 0 ) = one ;
 symm( 1, 0 ) = double( .25 ) ; symm( 1, 1 ) = one ;
 symm( 2, 0 ) = double( 0.6 ) ; symm( 2, 1 ) = double( -0.8 ) ; symm( 2, 2 ) = one ;
 // upper part;
 symm( 0, 1 ) = symm( 1, 0 ) ;
 symm( 0, 2 ) = symm( 2, 0 ) ;
 symm( 1, 2 ) = symm( 2, 1 );

 std::cout << std::setw( 30 ) << std::left << "Input/Symm storage layout: " ;
 for ( typename _Symm::array_type::const_iterator it = symm.data().begin() ; it != symm.data().end() ; ++it )
  std::cout << std::setw( 10 ) << std::right << std::right << std::setprecision( 3 ) << *it ;
 std::cout << std::endl ;
 // resize correl result:
 _Correl correl( n, n ) ;
 // mkl calls :
 MKL_INT     nF = n ;
 MKL_INT     stgfmtSymm = get_stg_fmt< _Symm >::value ;
 double const * const pSymmData = &symm( 0, 0 ) ;
 MKL_INT     stgfmtCorrel = get_stg_fmt< _Correl >::value ;
 double * const   pCorrelData = &correl( 0, 0 ) ;

 void * task ;
 MKL_INT status ;
 status = vsldSSNewTask( &task, &nF, 0, NULL, NULL, NULL, NULL ) ;
 status = vsldSSEditCorParameterization( task, pSymmData, &stgfmtSymm, pCorrelData, &stgfmtCorrel ) ;
 status = vsldSSCompute( task, VSL_SS_PARAMTR_COR, VSL_SS_METHOD_SD ) ;
 status = vslSSDeleteTask( &task ) ;
 MKL_Free_Buffers() ;
 // results :
 std::cout << std::setw( 30 ) << std::left << "Output/correl storage layout: " ;
 for ( typename _Correl::array_type::const_iterator it = correl.data().begin() ; it != correl.data().end() ; ++it )
  std::cout << std::setw( 10 ) << std::right << std::setprecision( 5 ) << *it ;
 std::cout << std::endl ;
 std::cout << std::endl ;
 }

int main() {

 test_correl_parametrization<symm_full_stg, symm_full_stg  >() ;
 test_correl_parametrization<symm_full_stg, symm_packed_l_stg >() ;
 test_correl_parametrization<symm_full_stg, symm_packed_u_stg >() ;
 test_correl_parametrization<symm_packed_l_stg, symm_full_stg  >() ;
 test_correl_parametrization<symm_packed_l_stg, symm_packed_l_stg >() ;
 test_correl_parametrization<symm_packed_l_stg, symm_packed_u_stg >() ;
 test_correl_parametrization<symm_packed_u_stg, symm_full_stg  >() ;
 test_correl_parametrization<symm_packed_u_stg, symm_packed_l_stg >() ;
 test_correl_parametrization<symm_packed_u_stg, symm_packed_u_stg >() ;

 return 1 ;
 }

Andrey_N_Intel · ‎09-14-2016

Thanks, Petros, for the test case. It will help run the analysis. Andrey

Petros_M_ · ‎09-15-2016

I tried the same test with :

typedef ublas::symmetric_matrix<double, ublas::lower, ublas::row_major>	symm_packed_l_stg	;
typedef ublas::symmetric_matrix<double, ublas::upper, ublas::row_major>	symm_packed_u_stg	;

i.e. replacing column major by row major layout.

The numbers now agree among themselves :

( Symm=full, Correl=full ) :
===========================
Input/Symm storage layout: 1.000 0.250 0.600 0.250 1.000 -0.800 0.600 -0.800 1.000
Output/correl storage layout: 1.000 0.207 0.536 0.207 1.000 -0.715 0.536 -0.715 1.000

( Symm=packed_l, Correl=full ) :
===============================
Input/Symm storage layout: 1.000 0.250 1.000 0.600 -0.800 1.000
Output/correl storage layout: 1.000 0.207 0.536 0.207 1.000 -0.715 0.536 -0.715 1.000

( Symm=packed_u, Correl=full ) :
===============================
Input/Symm storage layout: 1.000 0.250 0.600 1.000 -0.800 1.000
Output/correl storage layout: 1.000 0.207 0.536 0.207 1.000 -0.715 0.536 -0.715 1.000

( Symm=full, Correl=packed_l ) :
===============================
Input/Symm storage layout: 1.000 0.250 0.600 0.250 1.000 -0.800 0.600 -0.800 1.000
Output/correl storage layout: 1.000 0.207 1.000 0.536 -0.715 1.000

( Symm=packed_l, Correl=packed_l ) :
===================================
Input/Symm storage layout: 1.000 0.250 1.000 0.600 -0.800 1.000
Output/correl storage layout: 1.000 0.207 1.000 0.536 -0.715 1.000

( Symm=packed_u, Correl=packed_l ) :
===================================
Input/Symm storage layout: 1.000 0.250 0.600 1.000 -0.800 1.000
Output/correl storage layout: 1.000 0.207 1.000 0.536 -0.715 1.000

( Symm=full, Correl=packed_u ) :
===============================
Input/Symm storage layout: 1.000 0.250 0.600 0.250 1.000 -0.800 0.600 -0.800 1.000
Output/correl storage layout: 1.000 0.207 0.536 1.000 -0.715 1.000

( Symm=packed_l, Correl=packed_u ) :
===================================
Input/Symm storage layout: 1.000 0.250 1.000 0.600 -0.800 1.000
Output/correl storage layout: 1.000 0.207 0.536 1.000 -0.715 1.000

( Symm=packed_u, Correl=packed_u ) :
===================================
Input/Symm storage layout: 1.000 0.250 0.600 1.000 -0.800 1.000
Output/correl storage layout: 1.000 0.207 0.536 1.000 -0.715 1.000

So, it seems that the documentation is w-r-o-n-g !

Description

The vslSSEditCovCor routine replaces pointers to the array of means, covariance/correlation arrays, and their storage format with values passed as corresponding parameters of the routine. See Table "Storage formats of a variance-covariance/correlation/cross-product matrix" for possible values of the cov_storage and cor_storage parameters. If an input parameter is NULL, the old value of the parameter remains unchanged in the Summary Statistics task descriptor.

Storage formats of a variance-covariance/correlation/cross-product matrix
Parameter	Description
VSL_SS_MATRIX_STORAGE_FULL	A symmetric variance-covariance/correlation/cross-product matrix is a one-dimensional array with elements `c(i,j)` stored as `cp(ip + j)`. The size of the array is `p``p`.
VSL_SS_MATRIX_STORAGE_L_PACKED	A symmetric variance-covariance/correlation/cross-product matrix with elements `c(i,j)` is packed as a one-dimensional array ~~`cp(i + (2n - j)(j - 1)/2)` for `j` ≤ `i`. The size of the array is `p`(`p`+ 1)/2.~~
VSL_SS_MATRIX_STORAGE_U_PACKED	A symmetric variance-covariance/correlation/cross-product matrix with elements `c(i,j)` is packed as a one-dimensional array ~~`cp(i + j(j -`~~ ~~`1)/2)` for `i` ≤ `j`. The size of the array is `p`(`p`+ 1)/2.~~

Now the question is, is it the wrong only for this editor or for all ?

Can someone pls respond? This is important and I cannot test the entire vsl on my own (nor should I be expected to)

Andrey_N_Intel · ‎09-16-2016

Hi Petros,

Yes, the documentation for Summary Stats component of the library provides incorrect description of the packed formats for variance-covariance, correlation, cross-product, partial covariance, and correlation parametrization, and should be fixed on our side. To form upper or lower format for symmetric matrix in row-major layout the library packs its upper or lower triangle by rows into 1d array, respectively.

Pls let me know, if it addresses your question on the documentation topic.

Thanks,

Andrey

Petros_M_ · ‎09-16-2016

Hi Andrey,

Thank you for the response.

Just to be clear, this extends to all editors in summary stats. Correct ?

Thx again, P-

PS: A couple of suggestions:

for the full case, the library should offer to provide-with/require only the lower or the upper part to be filled (same happens in LAPACK)
maybe you want to consider a leading dimension input (again as in LAPACK) so people avoid copying from and to existing structures

Andrey_N_Intel · ‎09-17-2016

Hi Petros,

Yes, this extends to all editors that support data layout parameter.

Thanks for the suggestions. Will have a look.

Andrey

vsldSSEditCorParameterization problem

Description