- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
At first I want to generate random in multythreads in the following code:
#define nstreams 8
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k< nstreams; k++ )
{
vslNewStream( &stream
}
But I found, If I generate 8 VSLStreamStatePtr , other MKL functions performance will be affected(5 times slower then normal), these affected funtions are:
dtrsm("Right", "Upper", "No transpose", "Nunit", ...);
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
The root cause analysis shows the problem with internal mkl_serv_allocate() routine. The issue is escalated. We will keep you updated with the status of this issue!
Link copiado
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Hi jian,
Here's some question about your issue:
1. How did you test the performance? If you enable MKL_VERBOSE to check, or write program to get clock time?
2. How about your problem size for trsv, gemv and trsm? and what about your seed for random data generation? Could you please provide a reproducer (just a sample case) that we can investigate?
Thanks.
Best regards,
Fiona
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Hi, Fiona
Thanks for your response.
I need modify the issue: ONLY dtrsm is affected by new 8 vslNewStream.
Here is test code and result in my machine:
result:
Before new 8 VSLStreamStatePtr time: 1
After new 8 VSLStreamStatePtr time: 12
#include "mkl_vsl_functions.h"
#include "mkl_vsl_defines.h"
#include "mkl_blas.h"
#include "mkl_service.h"
int MmatrixARows=26;
int NmatrixBColumns=3;
double alpha=1;
int ldm=29;
double matrixA[87]={0.00311007,-1.12899e-05,-0.000141499,-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06};
double matrixB_ori[87]={-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06,-1.87786e-13,-2.49161e-13,-0.00079269};
double matrixB[87];
int sweepCount = 1e6;
time_t time1, time2, time3, time4;
time(&time1);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time2);
std::cout<<" Before new 8 VSLStreamStatePtr time: "<<difftime(time2, time1)<<std::endl;
VSLStreamStatePtr ptr_[8];
for(int i = 0; i < 8; ++i) {
vslNewStream(&ptr_, VSL_BRNG_MT2203 + i, 1);
}
time(&time3);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time4);
std::cout<<"After new 8 VSLStreamStatePtr time: "<<difftime(time4, time3)<<std::endl;
Thanks
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Hi, Fiona
Can you repeat the issue? Or need more detail info about lib verstion and cpu info?
Thanks.
Fiona Z. (Intel) wrote:
Hi jian,
Here's some question about your issue:
1. How did you test the performance? If you enable MKL_VERBOSE to check, or write program to get clock time?
2. How about your problem size for trsv, gemv and trsm? and what about your seed for random data generation? Could you please provide a reproducer (just a sample case) that we can investigate?Thanks.
Best regards,
Fiona
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Hi Jian,
I can reproduce your problem, we are investigating, I will give your response soon.
Best regards,
Fiona
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
The root cause analysis shows the problem with internal mkl_serv_allocate() routine. The issue is escalated. We will keep you updated with the status of this issue!
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Hi,Gennay
I am glad to hear that.Thanks for your help.
Gennady F. (Intel) wrote:
The root cause analysis shows the problem with internal mkl_serv_allocate() routine. The issue is escalated. We will keep you updated with the status of this issue!
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
To mitigate the problem, we may recommend set MKL_DISABLE_FAST_MM=1 to disable our memory buffering. Please refer more details into MKL's User's Guide - Managing Performance and Memory
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Hi, Gennady
Thanks for your help.
I tried set MKL_DISABLE_FAST_MM=1 , But it make dtrsm which before Create 8 VSLStreamStatePtr become as slow as after them.
My code is linked with google's tcmalloc, which can be found in gperftools-gperftools-2.5.
And add a unused map before the code can help repeat the issue, the map is original code is a static map.
#include <map>
#include <iostream>
#include <cstring>
#include "mkl_vsl_functions.h"
#include "mkl_vsl_defines.h"
#include "mkl_blas.h"
#include "mkl_service.h"
class aaaValue
{
public:
~aaaValue() { aaa(); }
private:
void aaa() {
if (val_.sval)
free(val_.sval);
val_.sval = 0;
}
private:
union { int ival; double dval; char* sval; } val_;
};
int main(int argc, const char* argv[])
{
std::map<int, aaaValue> Map;
int MmatrixARows=26;
int NmatrixBColumns=3;
double alpha=1;
int ldm=29;
double matrixA[87]={0.00311007,-1.12899e-05,-0.000141499,-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06};
double matrixB_ori[87]={-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06,-1.87786e-13,-2.49161e-13,-0.00079269};
double matrixB[87];
int sweepCount = 1e5;
time_t time1, time2, time3, time4;
time(&time1);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time2);
std::cout<<" Before new 8 VSLStreamStatePtr time: "<<difftime(time2, time1)<<std::endl;
VSLStreamStatePtr ptr_[8];
for(int i = 0; i < 8; ++i) {
vslNewStream(&ptr_, VSL_BRNG_MT2203 + i, 1);
}
time(&time3);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time4);
std::cout<<"After new 8 VSLStreamStatePtr time: "<<difftime(time4, time3)<<std::endl;
}
- Subscrever fonte RSS
- Marcar tópico como novo
- Marcar tópico como lido
- Flutuar este Tópico para o utilizador atual
- Marcador
- Subscrever
- Página amigável para impressora