- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have to refile an issue since nobody follow up on my old post
dtrsm is affected by new 8 vslNewStream.
Here is test code and result in my machine:
result:
Before new 8 VSLStreamStatePtr time: 1
After new 8 VSLStreamStatePtr time: 12
Code (c++):
#include "mkl_vsl_functions.h"
#include "mkl_vsl_defines.h"
#include "mkl_blas.h"
#include "mkl_service.h"
int MmatrixARows=26;
int NmatrixBColumns=3;
double alpha=1;
int ldm=29;
double matrixA[87]={0.00311007,-1.12899e-05,-0.000141499,-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06};
double matrixB_ori[87]={-1.82698e-14,-0.000785694,-1.98974e-14,-0.000778519,-2.71811e-14,-2.29056e-14,-2.7844e-14,-2.24393e-14,-3.12059e-14,-1.26095e-14,-4.47909e-10,-7.97785e-19,-1.74566e-07,-4.15789e-10,-2.17286e-29,-1.56053e-10,-1.34911e-12,-2.19906e-27,-3.5138e-09,-1.0398e-07,-5.29274e-06,-4.9252e-07,-8.93104e-05,-3.71938e-05,-1.28896e-09,-1.17735e-07,-3.26114e-08,0.00289051,-0.000149547,-2.34128e-13,-1.92531e-13,-0.000706043,-2.17513e-13,-3.48327e-13,-0.000670723,-3.56823e-13,-2.8756e-13,-3.99905e-13,-1.61591e-13,-5.73997e-09,-2.17241e-19,-1.42301e-08,-5.32835e-09,-5.47231e-30,-8.81321e-10,-3.39771e-13,-5.53829e-28,-8.70999e-10,-2.05336e-08,-5.30965e-06,-4.83782e-07,-8.84461e-05,-3.73614e-05,-1.5481e-11,-1.15641e-07,-4.02283e-07,-4.25622e-07,0.00303418,-0.00079269,-2.05185e-13,-2.71747e-13,-2.3181e-13,-0.000797297,-3.1283e-13,-3.80276e-13,-3.06461e-13,-4.2619e-13,-1.72212e-13,-6.11725e-09,-1.91907e-19,-1.90223e-07,-5.67858e-09,-4.75132e-30,-1.02734e-09,-2.95006e-13,-4.80861e-28,-5.38704e-10,-1.5342e-08,-1.98894e-06,-6.03159e-06,-8.35693e-05,-4.29327e-05,-1.62871e-10,-1.44174e-06,-1.87786e-13,-2.49161e-13,-0.00079269};
double matrixB[87];
int sweepCount = 1e6;
time_t time1, time2, time3, time4;
time(&time1);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time2);
std::cout<<" Before new 8 VSLStreamStatePtr time: "<<difftime(time2, time1)<<std::endl;
VSLStreamStatePtr ptr_[8];
for(int i = 0; i < 8; ++i) {
vslNewStream(&ptr_, VSL_BRNG_MT2203 + i, 1);
}
time(&time3);
for(int count = 0; count < sweepCount; ++count) {
memcpy(matrixB, matrixB_ori, sizeof(double)*87);
dtrsm("Right", "Upper", "No transpose", "Nunit", &MmatrixARows, &NmatrixBColumns, &alpha, matrixA, &ldm, matrixB, &ldm);
}
time(&time4);
std::cout<<"After new 8 VSLStreamStatePtr time: "<<difftime(time4, time3)<<std::endl;
Thanks
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jian,
We can reproduce the problem, will investigate later.
Yes, if possible, please provide us your OS and CPU information. or if it is privacy, please submit your question to software.intel.com/en-us/support/online-service-center
Best Regards,
Ying
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, Ying
Thanks for your response.
My machine OS is:
CentOS release 6.8 (Final)
Kernel \r on an \m
CupInfo:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
stepping : 2
microcode : 19
cpu MHz : 2925.814
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_t
sc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lah
f_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 5851.80
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
…
processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
stepping : 2
microcode : 19
cpu MHz : 2925.814
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 10
cpu cores : 6
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 5851.80
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I do believe the problem with how you measure this exec time. Could you somehow remove memcpy from your measurement cycle.
otherwise, you measure not only trsm but memcpy performance too.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, Gennady F
Because dtrsm overwrite matrixB as result, so I use memcpy to initialize matrixB before call dtrsm. I don't think memcpy cause the performance issue.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jian,
We already escalated this problem and start to fix the internal memory allocation problem. Cause you submitted two tickets with same issue, I will close this thread and update info in the other issue. Please watch ticket with no. #748385. Thank you.
Best regards,
Fiona
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page