
TIA.


Hello. You are comparing a naive implementation of complex double-precision multiplication with the VML HA version (which is slow but accurate). VML also provides a fast naive implementation, as the EP version.

To use the VML EP version of complex double multiplication instead of the HA version, you can change the default VML mode to EP by calling vmlsetmode(VML_EP) before the call to vzmul, or just replace

vzmul(&size,(MKL_Complex16*)buf1,(MKL_Complex16*)buf2,(MKL_Complex16*)buf4);

by

vmzmul(&size,(MKL_Complex16*)buf1,(MKL_Complex16*)buf2,(MKL_Complex16*)buf4, VML_EP);



VML doesn't do anything magic which you couldn't accomplish with OpenMP and a vectorizing compiler.
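To make the "naive" multiplication discussed above concrete, here is a minimal sketch of an element-wise complex multiply written as a plain loop that a vectorizing compiler can work on. The function name naive_zmul is made up for this illustration and is not part of MKL; the `#pragma omp simd` line is only a hint and has no effect unless OpenMP SIMD support is enabled at compile time. The loop applies the textbook formula (a+bi)(c+di) = (ac-bd) + (ad+bc)i with no special handling of overflow, Inf or NaN.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical helper (not an MKL function): out[k] = a[k] * b[k] for k in [0, n).
// Textbook formula only; no special treatment of overflow, Inf or NaN inputs.
void naive_zmul(std::size_t n,
                const std::complex<double>* a,
                const std::complex<double>* b,
                std::complex<double>* out)
{
    // Vectorization hint; ignored when the compiler is not asked to honor OpenMP SIMD.
    #pragma omp simd
    for (std::size_t k = 0; k < n; ++k) {
        const double ar = a[k].real(), ai = a[k].imag();
        const double br = b[k].real(), bi = b[k].imag();
        out[k] = std::complex<double>(ar * br - ai * bi,   // real part: ac - bd
                                      ar * bi + ai * br);  // imaginary part: ad + bc
    }
}
```

Whether a loop like this matches or beats a library call depends on the compiler, the flags and the vector length, so it is worth measuring both on the target machine.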


Could you give a bit more detail: which version of MKL do you use, is your OS Windows, is your application 32-bit or 64-bit, and what processor do you have (Core i7 2960XM or another one)?

Thanks,

Eugeny.


I was able to reproduce the issue. It will be fixed in a new MKL release.

Thanks for finding it,

Eugeny.


Hello Eugeny,

I also ran into the problem when multiplying two complex vectors with the vmzMul function.

Is the problem fixed in MKL version 11.3.2?

Thank you!


Hi Hao,

The original issue was fixed in an early version, around 11.0.x, so it should be fixed in MKL 11.3.2 too.

Did you run the same test with the VML EP version on a machine with MKL 11.3.2? Could you please let us know the OS and processor you are testing on?

Best

Ying


Thanks, Ying.

The OS is Linux.

I use the left code to multiply two 16*8 matrices element by element, and the right code to test whether the vmzMul function could run faster than the left one.

Left code (direct loop):

Complex ** ppIn1, **ppIn2, **ppOut;
....
double seconds_s1 = dsecnd();
for(int i=0; i<16; i++)
{
    for(int j=0; j<8; j++)
    {
        ppOut[i][j] = ppIn1[i][j] * ppIn2[i][j];
    }
}
double seconds_e1 = dsecnd() - seconds_s1;
cout << seconds_e1 << endl;

Right code (vmzMul):

MKL_Complex16 * pIn1, *pIn2, *pOut;
len = 16*8;
...
double seconds_s2 = dsecnd();
vmzMul(len, pIn1, pIn2, pOut, VML_EP);
double seconds_e2 = dsecnd() - seconds_s2;
cout << seconds_e2 << endl;

The result is seconds_e1 = 8.19564e-07 and seconds_e2 = 6.13928e-06, so the vmzMul version is slower. I wonder what I am doing wrong to get this kind of result.

Thank you!


Hi Hao,

Have you tried the latest version, for example, MKL 2017?

I did a quick test. It shows that MKL is far faster than the direct loop.

Intel(R) Math Kernel Library Version 2017.0.0 Beta Update 1 Build 20160513 for Intel(R) 64 architecture applications

direct : 0.157639

mkl vmzMul : 0.00693191

Press any key to continue . . .

As the test matrix size seems small, I added a dummy loop around the main computation (LOOP iterations in the code), just to make it run longer.

Here is my test code. Would you please try it and let us know the result?

#include "stdafx.h"
// TODO: reference any additional headers you need in STDAFX.H
// and not in this file
#include <iostream>
#include <random>
#include <ctime>
#include <new>
#include <tuple>
#include <complex>
#include <mkl.h>

#define LOOP 10000

typedef std::complex<double> Complex;
const MKL_INT Arows = 16, Acols = 8;
using namespace std;

// Direct element-by-element multiplication with std::complex.
void Comon_vml(){
    Complex ppIn1[Arows][Acols], ppIn2[Arows][Acols], ppOut[Arows][Acols];
    for (int i = 0; i < Arows; i++){
        for(int j = 0; j < Acols; j++){
            ppIn1[i][j] = Complex(i+1,j+1);
            ppIn2[i][j] = Complex(i+1,j+1);
        }
    }

    double seconds_s1 = dsecnd();
    for (int iter=0; iter<LOOP; iter++){
        for(int i=0; i<Arows; i++)
        {
            for(int j=0; j<Acols; j++)
            {
                ppOut[i][j] = ppIn1[i][j] * ppIn2[i][j];
            }
        }
    }
    double seconds_e1 = dsecnd() - seconds_s1;
    cout << "direct : " << seconds_e1 << endl;

    /*
    std::cout << "From direct" << std::endl;
    for (int i = 0; i < Arows; i++){
        for(int j = 0; j < Acols; j++){
            cout << "[" << i << ", " << j << "]" << ppOut[i][j] << "\t";
        }
        std::cout << std::endl;
    }
    */
}

// The same multiplication through vmzMul in EP mode.
void mkl_vml(){
    /*MKL_Complex16 * pIn1, *pIn2, *pOut;
    pIn1 = new MKL_Complex16[len]();
    pIn2 = new MKL_Complex16[len]();
    pOut = new MKL_Complex16[len]();
    */
    MKL_Complex16 ppIn1[Arows][Acols], ppIn2[Arows][Acols], ppOut[Arows][Acols];
    for (int i = 0; i < Arows; i++){
        for(int j = 0; j < Acols; j++){
            ppIn1[i][j].real = i+1;
            ppIn1[i][j].imag = j+1;
            ppIn2[i][j].real = i+1;
            ppIn2[i][j].imag = j+1;
        }
    }

    MKL_INT len = Arows*Acols;
    double seconds_s2 = dsecnd();
    for (int i=0; i<LOOP; i++)
        vmzMul(len, &ppIn1[0][0], &ppIn2[0][0], &ppOut[0][0], VML_EP);
    double seconds_e2 = dsecnd() - seconds_s2;
    cout << "mkl vmzMul : " << seconds_e2 << endl;

    /*
    std::cout << "From vmzMul" << std::endl;
    for (int i = 0; i < Arows; i++){
        for(int j = 0; j < Acols; j++){
            cout << "[" << i << ", " << j << "]" << "(" << ppOut[i][j].real << "," << ppOut[i][j].imag << ")" << "\t";
        }
        std::cout << std::endl;
    }
    */
}

int main(void)
{
    int len=198;
    char buf[198];

    mkl_get_version_string(buf, len);
    cout << buf << endl;

    Comon_vml();
    mkl_vml();
    return 0;
}

Best Regards,

Ying
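The idea of repeating the computation so that it runs long enough to measure can be sketched without MKL. This is only an illustration under stated assumptions: seconds_per_call and time_direct_multiply are hypothetical names, std::chrono::steady_clock stands in for dsecnd(), and the untimed warm-up call is an extra step not present in the test code above. A single timed call of a few microseconds can easily be dominated by one-time costs, which is one possible reason for surprising numbers on very small inputs.

```cpp
#include <chrono>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: average wall-clock seconds per call of `kernel`,
// measured over `repeats` calls after one untimed warm-up call.
template <typename Kernel>
double seconds_per_call(Kernel&& kernel, int repeats)
{
    kernel();  // warm-up, so one-time costs (lazy initialization, cold caches) are not timed
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        kernel();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}

// Stand-in workload: element-wise product of two complex vectors of length n (n > 0).
double time_direct_multiply(std::size_t n, int repeats)
{
    std::vector<std::complex<double>> a(n, std::complex<double>(1.0, 2.0));
    std::vector<std::complex<double>> b(n, std::complex<double>(3.0, 4.0));
    std::vector<std::complex<double>> out(n);
    volatile double sink = 0.0;  // discourages the compiler from removing the loop
    return seconds_per_call([&] {
        for (std::size_t k = 0; k < n; ++k) {
            out[k] = a[k] * b[k];
        }
        sink = out[n - 1].real();
    }, repeats);
}
```

For the 16*8 case in this thread, something like time_direct_multiply(128, 10000) gives a per-call average; the same wrapper could be put around a library call to compare the two under identical conditions.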
