Parallelizing FFT not seeing 100% CPU

Marshall__Michael_B · ‎01-07-2010

I am new to using the MKL.

I have the following example:

numElements = 1 << 23;

DftiCreateDescriptor(&complexDescriptor, DFTI_DOUBLE, DFTI_COMPLEX, 1, numElements);
DftiSetValue(complexDescriptor, DFTI_BACKWARD_SCALE, (double) 1 / numElements);
DftiCommitDescriptor(complexDescriptor);
DftiComputeForward(complexDescriptor, compDataArray);
DftiFreeDescriptor(&complexDescriptor);

I have run this on both a Core 2 Quad and a Core 2 Duo machine running Win XP Pro SP 3. On neither do I see 100% CPU usage. On the quad, only 2 cores seem to show increased activity.

I inherited this project and I believe I am working with MKL 5.1.

Any help would be useful (including how to determine the exact version of the library)

Thanks

Dmitry_B_Intel · ‎01-07-2010

Hi,

If it works, you must be working with an MKL later than 5.1, because that dated version of MKL didn't provide DFTI functions. The function mkl_get_version_string may be used to determine the exact version if the MKL release is not too old.

Is your OS 32-bit?

Thanks
Dima

Gennady_F_Intel · ‎01-07-2010

mimarsh2,
1) The current version of MKL is 10.2 Update 3; MKL 5.1 reached end of life some years ago.

2) Starting with Intel MKL 10.0, the OpenMP* software determines the default number of threads. For Intel OpenMP* libraries, the default number of threads is equal to the number of logical processors in your system.
All MKL versions before 10.0 (namely 9.1, 8.1, 7.2, etc.) use a different default threading model: the default number of threads is 1.

3) How to determine the exact version of the library?
There are at least two ways:
3.1: Look at the doc\mklsupport.txt file. There you can find the package ID (for example, in my installation: Package ID: w_mkl_p_10.2.3.029),
which means version 10.2 Update 3, build 029.
3.2: Use the runtime routine
mkl_get_version( MKLVersion* pVersion ); see the description in the manual or in the header files.

One note:
In obsolete versions there is the MKLGetVersion() routine. This is an obsolete name for the mkl_get_version function.

4) Please look at "Intel MKL Threaded 1D FFT". It may be useful for you.
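
To make points 2 and 3.1 concrete, here is a minimal C sketch that splits a package ID such as w_mkl_p_10.2.3.029 into its parts and then applies the default-thread rule described in point 2. It does not call MKL at all. The names PackageVersion, parse_mkl_package_id and default_mkl_threads are hypothetical helpers made up for illustration, the package ID layout is assumed from the single example above, and treating an unparsable OMP_NUM_THREADS value as "not set" is an assumption as well.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical type, not part of MKL: the fields of a package ID. */
typedef struct {
    int major;
    int minor;
    int update;
    int build;
} PackageVersion;

/*
 * Parses an ID of the assumed form "w_mkl_p_<major>.<minor>.<update>.<build>".
 * Returns 1 on success, 0 if the string does not match that form exactly.
 */
static int parse_mkl_package_id(const char *id, PackageVersion *out)
{
    int consumed = 0;
    int fields = sscanf(id, "w_mkl_p_%d.%d.%d.%d%n",
                        &out->major, &out->minor,
                        &out->update, &out->build, &consumed);

    /* All four numbers must be present and nothing may follow them. */
    return fields == 4 && id[consumed] == '\0';
}

/*
 * Thread count to expect, following point 2:
 *   - a usable OMP_NUM_THREADS value wins;
 *   - otherwise MKL 10.0 and later default to the number of logical
 *     processors, and earlier versions default to 1.
 * omp_num_threads is the raw environment string (NULL if not set).
 */
static int default_mkl_threads(const PackageVersion *v,
                               const char *omp_num_threads,
                               int logical_processors)
{
    if (omp_num_threads != NULL) {
        char *end;
        long n = strtol(omp_num_threads, &end, 10);

        /* Assumption: values that do not parse are treated as not set. */
        if (end != omp_num_threads && n >= 1 && n <= INT_MAX)
            return (int)n;
    }
    return v->major >= 10 ? logical_processors : 1;
}
```

A caller would typically pass getenv("OMP_NUM_THREADS") as the second argument. With a 9.1 library and the variable not set, the helper returns 1, which matches the single busy core described in the question.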

--Gennady

Marshall__Michael_B · ‎01-08-2010

I found a more recent version in our company's archives.

I am now using 9.1.026

I am using the exact same code; headers still matched.

The new library does have better performance, but I still don't see both processors pegged.

I am running 32-bit Win XP Pro SP 3.

I found the following snippet in the new documentation.

Intel MKL is threaded in a number of places: direct sparse solver, LAPACK (*GETRF,
*POTRF, *GBTRF, *GEQRF, *ORMQR, *STEQR, *BDSQR routines), all Level 3 BLAS,
Sparse BLAS matrix-vector and matrix-matrix multiply routines for the compressed sparse
row and diagonal formats, and all FFTs (except 1D transformations when
DFTI_NUMBER_OF_TRANSFORMS=1 and sizes are not power of two).

NOTE. For power-of-two data in 1D FFTs, Intel MKL provides parallelism
only for processors based on IA-64 (Itanium processor family) or
Intel 64 architecture. In the latter case, the parallelism is provided for
out-of-place FFTs only.

I am running on a Core 2 Duo, which I think is Intel 64 architecture. Do I need to run 64-bit Windows and use the ia64 libraries to parallelize my 1D FFTs?

Thanks again for the help

Gennady_F_Intel · ‎01-08-2010

Yes.
- The build line will look like: ifort test.f mkl_em64t.lib libguide40.lib
- Please remember the default behavior for this version of MKL:
The library uses OpenMP* threading software, which responds to the environment variable OMP_NUM_THREADS that sets the number of threads to use. If the variable OMP_NUM_THREADS is not set, Intel MKL software will run on one thread. It is recommended that you always set OMP_NUM_THREADS to the number of processors you wish to use in your application.

- Another tip, from the coding techniques for FFT functions:

There are additional conditions to gain performance of the FFT functions.

Applications based on IA-32 or Intel 64 architecture: the addresses of the first elements of arrays and the leading dimension values, in bytes (n*element_size), of two-dimensional arrays should be divisible by the cache line size, which equals:

32 bytes for Pentium III processors
64 bytes for Pentium 4 processors
128 bytes for processors using Intel 64 architecture
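
The alignment tip above can be sketched in plain C without any MKL calls. The block below over-allocates with malloc and rounds the address up to the requested boundary, and it also computes a padded row length whose size in bytes is a multiple of the cache line. All four function names are hypothetical helpers invented for this illustration. The line size (32, 64 or 128) is whatever the list above gives for your processor.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if ptr is a multiple of `alignment` bytes, 0 otherwise. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/*
 * Allocates `size` bytes whose start address is a multiple of `alignment`
 * (which must be a power of two). The pointer returned by malloc is stored
 * just in front of the aligned block so free_aligned can recover it.
 * Returns NULL on failure or if alignment is not a power of two.
 */
static void *alloc_aligned(size_t size, size_t alignment)
{
    void *raw;
    uintptr_t addr;
    void **aligned;

    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Extra room for the rounding and for the saved malloc pointer. */
    raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip past the slot for the saved pointer, then round up. */
    addr = (uintptr_t)raw + sizeof(void *);
    addr = (addr + alignment - 1) & ~(uintptr_t)(alignment - 1);

    aligned = (void **)addr;
    aligned[-1] = raw;
    return aligned;
}

/* Releases a block obtained from alloc_aligned. NULL is ignored. */
static void free_aligned(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}

/*
 * Smallest row length (in elements) that is >= n and whose size in bytes,
 * n * element_size, is divisible by line_size. Use the result as the
 * leading dimension of a two-dimensional array.
 */
static size_t padded_row_elems(size_t n, size_t element_size, size_t line_size)
{
    while ((n * element_size) % line_size != 0)
        n++;
    return n;
}
```

For double-precision complex data (16 bytes per element) and a 64-byte line, a row of 1001 elements would be padded to 1004, and the buffer would come from alloc_aligned(rows * 1004 * 16, 64).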

--Gennady