jian_l_1
Beginner
136 Views

Creating 8 VSLStreamStatePtr streams affects the performance of MKL dtrsm (test code included, issue still open)


At first I wanted to generate random numbers in multiple threads with the following code:

#define nstreams 8
VSLStreamStatePtr stream[nstreams];

int k;
for ( k=0; k< nstreams; k++ )
{
vslNewStream( &stream[k], VSL_BRNG_MT2203+k, seed );
}

But I found that if I create 8 VSLStreamStatePtr streams, the performance of other MKL functions is affected (about 5 times slower than normal). The affected functions are:

dtrsm("Right", "Upper", "No transpose", "Nunit", ...);

 

 

 

8 Replies
Zhen_Z_Intel
Employee
136 Views

Hi jian,

Here are some questions about your issue:

1. How did you test the performance? Did you enable MKL_VERBOSE to check, or write a program that measures clock time?
2. What are your problem sizes for trsv, gemv and trsm, and what seed do you use for random data generation? Could you please provide a reproducer (just a sample case) that we can investigate?

Thanks.

Best regards,
Fiona

jian_l_1
Beginner
136 Views

Hi, Fiona

Thanks for your response.

I need to amend the issue: ONLY dtrsm is affected by creating 8 new streams with vslNewStream.

Here is the test code and the result on my machine:

result:

Before new 8 VSLStreamStatePtr time: 1

After new 8 VSLStreamStatePtr time: 12

Code (c++):
 

#include <cstring>
#include <ctime>
#include <iostream>
#include "mkl_vsl_functions.h"
#include "mkl_vsl_defines.h"
#include "mkl_blas.h"
#include "mkl_service.h"

int MmatrixARows=26;
int NmatrixBColumns=3;
double alpha=1;
int ldm=29;

double matrixA[87]={0.00311007,-1.12899e-05,-0.000141499,-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06};

double matrixB_ori[87]={-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06,-1.87786e-13,-2.49161e-13,-0.00079269};

double matrixB[87];

int sweepCount = 1e6;
time_t time1, time2, time3, time4;

// Time the dtrsm loop before any VSL stream exists.
time(&time1);
for(int count = 0;  count < sweepCount; ++count) {
    memcpy(matrixB, matrixB_ori, sizeof(double)*87);
    dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time2);
std::cout<<" Before new 8 VSLStreamStatePtr time: "<<difftime(time2, time1)<<std::endl;

VSLStreamStatePtr                   ptr_[8];
for(int i = 0; i < 8; ++i) {
    // Pass the address of each element, not of the whole array.
    vslNewStream(&ptr_[i], VSL_BRNG_MT2203 + i, 1);
}

// Time the same dtrsm loop after the 8 streams have been created.
time(&time3);
for(int count = 0;  count < sweepCount; ++count) {
    memcpy(matrixB, matrixB_ori, sizeof(double)*87);
    dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time4);
std::cout<<"After new 8 VSLStreamStatePtr time: "<<difftime(time4, time3)<<std::endl;

 

Thanks
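A note on the measurement: `time()` typically has one-second resolution, so the reported values of 1 and 12 are coarse. A minimal sketch of a finer-grained timer using `std::chrono::steady_clock` follows. The helper name `time_loop_ms` is hypothetical and not part of the original test; the loop body passed to it would be the `memcpy` + `dtrsm` pair from the code above.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the original post): calls `body`
// `iterations` times and returns the elapsed wall-clock time in
// milliseconds, measured with a monotonic clock.
template <typename F>
double time_loop_ms(std::size_t iterations, F&& body) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be called once before and once after the `vslNewStream` loop, for example `time_loop_ms(sweepCount, [&] { /* memcpy + dtrsm */ })`, and the two results compared.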

 
jian_l_1
Beginner
136 Views

Hi, Fiona

Can you reproduce the issue? Or do you need more details, such as the library version and CPU info?

 

Thanks. 

 


Zhen_Z_Intel
Employee
136 Views

Hi Jian,

I can reproduce your problem. We are investigating, and I will get back to you soon.

Best regards,
Fiona

Gennady_F_Intel
Moderator
137 Views

The root cause analysis points to a problem with the internal mkl_serv_allocate() routine. The issue has been escalated. We will keep you updated on its status!

 


jian_l_1
Beginner
136 Views

Hi, Gennady

I am glad to hear that. Thanks for your help.


 

Gennady_F_Intel
Moderator
136 Views

To mitigate the problem, we recommend setting MKL_DISABLE_FAST_MM=1 to disable MKL's memory buffering. For more details, please refer to the "Managing Performance and Memory" section of the MKL User's Guide.

 

jian_l_1
Beginner
136 Views

Hi, Gennady

Thanks for your help.

I tried setting MKL_DISABLE_FAST_MM=1, but it makes the dtrsm loop that runs before creating the 8 VSLStreamStatePtr streams as slow as the loop that runs after.

My code is linked with Google's tcmalloc, which can be found in gperftools-gperftools-2.5.

Adding an unused map before the code helps reproduce the issue; in the original code the map is a static map.

#include <map>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include "mkl_vsl_functions.h"
#include "mkl_vsl_defines.h"
#include "mkl_blas.h"
#include "mkl_service.h"

class aaaValue
{
public:
    ~aaaValue()                                          { aaa(); }

private:
    // Frees the string payload, if any.
    void aaa() {
        if (val_.sval)
            free(val_.sval);
        val_.sval = 0;
    }

private:
    union { int ival; double dval; char* sval; }    val_;
};

int main(int argc, const char* argv[])
{
    // Unused map; its presence helps reproduce the issue.
    std::map<int, aaaValue> Map;

    int MmatrixARows=26;
    int NmatrixBColumns=3;
    double alpha=1;
    int ldm=29;

    double matrixA[87]={0.00311007,-1.12899e-05,-0.000141499,-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06};

    double matrixB_ori[87]={-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06,-1.87786e-13,-2.49161e-13,-0.00079269};

    double matrixB[87];

    int sweepCount = 1e5;
    time_t time1, time2, time3, time4;

    // Time the dtrsm loop before any VSL stream exists.
    time(&time1);
    for(int count = 0;  count < sweepCount; ++count) {
        memcpy(matrixB, matrixB_ori, sizeof(double)*87);
        dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
    }
    time(&time2);
    std::cout<<" Before new 8 VSLStreamStatePtr time: "<<difftime(time2, time1)<<std::endl;

    VSLStreamStatePtr                   ptr_[8];
    for(int i = 0; i < 8; ++i) {
        // Pass the address of each element, not of the whole array.
        vslNewStream(&ptr_[i], VSL_BRNG_MT2203 + i, 1);
    }

    // Time the same dtrsm loop after the 8 streams have been created.
    time(&time3);
    for(int count = 0;  count < sweepCount; ++count) {
        memcpy(matrixB, matrixB_ori, sizeof(double)*87);
        dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
    }
    time(&time4);
    std::cout<<"After new 8 VSLStreamStatePtr time: "<<difftime(time4, time3)<<std::endl;

}

 

 

 

 
