Initially, I wanted to generate random numbers in multiple threads with the following code:
#define nstreams 8
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k< nstreams; k++ )
{
vslNewStream( &stream[k], VSL_BRNG_MT2203 + k, 1 );
}
However, I found that after creating these 8 streams (VSLStreamStatePtr), the performance of other MKL functions is affected (they run about 5 times slower than normal). The affected functions are:
dtrsm("Right", "Upper", "No transpose", "Nunit", ...);
Hi Jian,
Here are some questions about your issue:
1. How did you measure the performance? Did you enable MKL_VERBOSE, or did you write a program that measures the clock time?
2. What are the problem sizes for trsv, gemv and trsm, and what seed do you use for random number generation? Could you please provide a reproducer (just a sample case) that we can investigate?
Thanks.
Best regards,
Fiona
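For the "clock time" option in question 1, note that time() only counts whole seconds. A minimal sketch of a finer-grained measurement, assuming C++11 std::chrono (the helper name time_loop_seconds is hypothetical):

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `body` `repetitions` times and returns the elapsed wall-clock time in
// seconds. std::chrono::steady_clock is monotonic and typically has much finer
// than one-second resolution, unlike time().
double time_loop_seconds(int repetitions, const std::function<void()>& body)
{
    assert(repetitions >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, wrapping the memcpy + dtrsm loop body in a lambda and calling time_loop_seconds(sweepCount, body) before and after creating the streams gives fractional seconds instead of whole ones.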
Hi, Fiona
Thanks for your response.
I need to correct the issue description: ONLY dtrsm is affected by creating the 8 streams with vslNewStream.
Here is the test code and the result on my machine:
Result (seconds, measured with time()):
Before new 8 VSLStreamStatePtr time: 1
After new 8 VSLStreamStatePtr time: 12
#include <cstring>
#include <ctime>
#include <iostream>
#include "mkl_vsl_functions.h"
#include "mkl_vsl_defines.h"
#include "mkl_blas.h"
#include "mkl_service.h"
int main()
{
// B is 26 x 3 and A is 3 x 3 upper triangular; both are column-major with leading dimension 29.
int MmatrixARows=26;
int NmatrixBColumns=3;
double alpha=1;
int ldm=29;
double matrixA[87]={0.00311007,-1.12899e-05,-0.000141499,-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06};
double matrixB_ori[87]={-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06,-1.87786e-13,-2.49161e-13,-0.00079269};
double matrixB[87];
// time() has one-second resolution, so the loops must run long enough to be measurable.
int sweepCount = 1e6;
time_t time1, time2, time3, time4;
// Baseline: time dtrsm before any VSL stream exists.
time(&time1);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time2);
std::cout<<" Before new 8 VSLStreamStatePtr time: "<<difftime(time2, time1)<<std::endl;
// Create 8 streams from the MT2203 family.
VSLStreamStatePtr ptr_[8];
for(int i = 0; i < 8; ++i) {
vslNewStream(&ptr_[i], VSL_BRNG_MT2203 + i, 1);
}
// Same dtrsm loop, now with the streams alive.
time(&time3);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time4);
std::cout<<"After new 8 VSLStreamStatePtr time: "<<difftime(time4, time3)<<std::endl;
// Release the streams after the measurement.
for(int i = 0; i < 8; ++i) {
vslDeleteStream(&ptr_[i]);
}
return 0;
}
Thanks
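The dtrsm call above, with side "Right", uplo "Upper", transa "No transpose" and diag "Nunit", solves X * A = alpha * B for X, where A is n-by-n upper triangular and B is m-by-n (both column-major), and overwrites B with X. A plain-loop reference like the following sketch (the function name is hypothetical, and it is not MKL's implementation) could be used to check that the results agree before and after the streams are created, so that only the timing differs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference for
//   dtrsm("Right", "Upper", "No transpose", "Nunit", &m, &n, &alpha, a, &lda, b, &ldb)
// Solves X * A = alpha * B, where A is n-by-n upper triangular with a non-unit
// diagonal and B is m-by-n; both are column-major. X overwrites B.
void trsm_right_upper_notrans_nonunit(int m, int n, double alpha,
                                      const double* a, int lda,
                                      double* b, int ldb)
{
    assert(lda >= n && ldb >= m);
    for (int j = 0; j < n; ++j) {
        double* bj = b + static_cast<std::ptrdiff_t>(j) * ldb;  // column j of B
        for (int i = 0; i < m; ++i) {
            bj[i] *= alpha;
        }
        // Subtract the already-solved columns: X(:,k) * A(k,j) for k < j.
        for (int k = 0; k < j; ++k) {
            const double akj = a[k + static_cast<std::ptrdiff_t>(j) * lda];
            const double* xk = b + static_cast<std::ptrdiff_t>(k) * ldb;
            for (int i = 0; i < m; ++i) {
                bj[i] -= akj * xk[i];
            }
        }
        // Divide by the non-unit diagonal element A(j,j).
        const double ajj = a[j + static_cast<std::ptrdiff_t>(j) * lda];
        for (int i = 0; i < m; ++i) {
            bj[i] /= ajj;
        }
    }
}
```

In the reproducer this would be called with m = 26, n = 3 and lda = ldb = 29 on a copy of matrixB_ori, and its output compared with the dtrsm output using a small relative tolerance.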
Hi, Fiona
Can you reproduce the issue, or do you need more details about the library version and the CPU?
Thanks.
Hi Jian,
I can reproduce your problem. We are investigating and will get back to you soon.
Best regards,
Fiona
Gennady F. (Intel):
Root cause analysis points to a problem in the internal mkl_serv_allocate() routine. The issue has been escalated, and we will keep you updated on its status.
Hi, Gennady
I am glad to hear that. Thanks for your help.
Gennady F. (Intel):
To mitigate the problem, we recommend setting MKL_DISABLE_FAST_MM=1, which disables MKL's memory buffering. For more details, refer to the section "Managing Performance and Memory" in the MKL User's Guide.
Hi, Gennady
Thanks for your help.
I tried setting MKL_DISABLE_FAST_MM=1, but then the dtrsm calls made before creating the 8 streams become as slow as the ones made after.
My code is linked with Google's tcmalloc from gperftools (gperftools-gperftools-2.5).
Adding an unused map before the benchmark code helps reproduce the issue; in the original code, this map is a static map.
#include <map>
#include <iostream>
#include <cstring>
#include <cstdlib>
#include <ctime>
#include "mkl_vsl_functions.h"
#include "mkl_vsl_defines.h"
#include "mkl_blas.h"
#include "mkl_service.h"
class aaaValue
{
public:
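aaaValue() { val_.sval = 0; } // initialize the union so the destructor never frees an indeterminate pointer
~aaaValue() { aaa(); }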
private:
void aaa() {
if (val_.sval)
free(val_.sval);
val_.sval = 0;
}
private:
union { int ival; double dval; char* sval; } val_;
};
int main(int argc, const char* argv[])
{
std::map<int, aaaValue> Map;
int MmatrixARows=26;
int NmatrixBColumns=3;
double alpha=1;
int ldm=29;
double matrixA[87]={0.00311007,-1.12899e-05,-0.000141499,-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06};
double matrixB_ori[87]={-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06,-1.87786e-13,-2.49161e-13,-0.00079269};
double matrixB[87];
int sweepCount = 1e5;
time_t time1, time2, time3, time4;
time(&time1);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time2);
std::cout<<" Before new 8 VSLStreamStatePtr time: "<<difftime(time2, time1)<<std::endl;
VSLStreamStatePtr ptr_[8];
for(int i = 0; i < 8; ++i) {
vslNewStream(&ptr_[i], VSL_BRNG_MT2203 + i, 1);
}
time(&time3);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time4);
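std::cout<<"After new 8 VSLStreamStatePtr time: "<<difftime(time4, time3)<<std::endl;
// Release the streams after the measurement.
for(int i = 0; i < 8; ++i) {
vslDeleteStream(&ptr_[i]);
}
return 0;
}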