int a = 169*64;
int b = 64*1024;
const int c = 5;
float* A = new float[169*64];
float* B = new float[64*1024];
float* C = new float[169*1024];
srand(time(NULL));
for (int i=0;i<a;i++)
{
    A[i] = rand()%1000/100.0;
    // every c-th element is overwritten with a denormal value
    if (i%c==0)
    {
        A[i] = -4.204e-045;
    }
}
for (int j=0;j<b;j++)
{
    B[j] = rand()%10000/1000.0;
}
while (true)
{
double t0 = cvGetTickCount();
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 169, 1024, 64, 1.0, A, 64, B, 1024, .0, C, 1024);
double t1 = cvGetTickCount()-t0;
cout<<"consume time:"<<t1/cvGetTickFrequency()/1000.0<<endl;
}
Execute the code above and change the constant c: the elapsed time changes. I guess the run is slower when the matrix contains denormalized values. Why?
Floating-point operations on denormals are slower than on normalized operands because denormal operands and results are usually handled through a software assist mechanism rather than directly in hardware. This software processing causes Intel MKL functions that consume denormals to run slower than with normalized floating-point numbers.
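To make the term concrete, here is a minimal sketch that checks whether a float is a denormal (subnormal) using the standard `std::fpclassify`. The helper name `is_denormal` is hypothetical and used only for illustration; it is not part of Intel MKL.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true when x is a denormal (subnormal) float, i.e.
// nonzero but smaller in magnitude than the smallest normalized float,
// std::numeric_limits<float>::min().
bool is_denormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

Under this check, the value -4.204e-45 from the question is a denormal, while 0.0f and `std::numeric_limits<float>::min()` are not. The check assumes default floating-point settings; builds with fast-math or flush-to-zero options may classify values differently.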
Hi,
Calculations on denormal numbers are slow. You can build with the Intel C/C++ compiler using the /Qftz option (flush to zero), which should improve the performance of MKL sgemm. Alternatively, you can modify your source code to replace every denormal with a normal number, such as numeric_limits<float>::min().
Best regards,
Fiona
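The source-code approach suggested above could look roughly like the sketch below. The helper `replace_denormals` is hypothetical (it is not an MKL function), and the default replacement of zero is an assumption; pass `std::numeric_limits<float>::min()` instead to follow the suggestion literally.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of MKL): overwrite every denormal element
// of data[0..n) with `replacement` and return how many elements changed.
std::size_t replace_denormals(float* data, std::size_t n,
                              float replacement = 0.0f) {
    std::size_t changed = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fpclassify(data[i]) == FP_SUBNORMAL) {
            data[i] = replacement;
            ++changed;
        }
    }
    return changed;
}
```

In the program from the question, this would be called as `replace_denormals(A, 169*64)` and `replace_denormals(B, 64*1024)` before entering the timing loop. It only cleans the inputs; denormal intermediate results inside the multiplication are not affected. On x86, the FTZ and DAZ bits can also be set at run time with `_MM_SET_FLUSH_ZERO_MODE` and `_MM_SET_DENORMALS_ZERO_MODE`, but those settings are per thread, and whether they reach MKL's internal threads is not confirmed in this thread.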