Intel® oneAPI Math Kernel Library

parallel MKL DFTs

OP1

I am wondering what the proper settings are for MKL DFT subroutines called from within a parallel region.

Assume that I have an array made of N_DATA series of data, each series having length N_TIME (not including the padding needed for the DFTs). At any given point in my program, DFTs must be computed for a subset of these N_DATA series. For instance, in the first iteration of my main loop I may need the DFTs of series 1 to 10; in the second iteration, the DFTs of series 1, 23, 899, etc. In other words, the number of DFTs to perform in each iteration of my main loop is not known in advance.

Within each iteration, I still want the DFTs to run in parallel on all the CPUs of my system. I don't want to initialize the DFT descriptor in every iteration (it's too costly), so it must be created and committed outside the loop, and it is therefore shared by all the DFT calls.

My question is: what value should I use for DFTI_NUMBER_OF_USER_THREADS when I create my descriptor? Do I need to set it to a large value and hope that the actual number of threads in the parallel region never exceeds it?

Here is pseudocode illustrating what I am doing.

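USE MKL_DFTI
...
TYPE(DFTI_DESCRIPTOR), POINTER :: SHARED_DESCRIPTOR
INTEGER :: STATUS, I, J
...
! Create the descriptor once: a single 1D real double-precision transform of length N_TIME
STATUS = DFTICREATEDESCRIPTOR(SHARED_DESCRIPTOR,DFTI_DOUBLE,DFTI_REAL,1,N_TIME)
STATUS = DFTISETVALUE(SHARED_DESCRIPTOR,DFTI_NUMBER_OF_TRANSFORMS,1)
...
STATUS = DFTICOMMITDESCRIPTOR(SHARED_DESCRIPTOR)
DO J=1,1000
! N_LOOPS(J) series are transformed in iteration J.
! STATUS is PRIVATE: every thread writes it, so sharing it would be a data race.
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I,STATUS)
!$OMP DO
DO I=1,N_LOOPS(J)
! In-place forward DFT of one padded series, using the shared committed descriptor
STATUS=DFTICOMPUTEFORWARD(SHARED_DESCRIPTOR,X(I_START(I,J):I_END(I,J)))
ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
ENDDO
...
STATUS=DFTIFREEDESCRIPTOR(SHARED_DESCRIPTOR)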

Thanks in advance for your help!

Olivier
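The pseudocode leaves I_START and I_END undefined. Below is a minimal sketch of one way to fill one column J of them, under an assumed layout that the question does not state: all N_DATA series are stored back to back in one flat array X, each padded to 2*(N_TIME/2+1) reals, which leaves room for the N_TIME/2+1 complex outputs of an in-place real-to-complex transform. The module and routine names are hypothetical.

```fortran
module series_layout
  implicit none
contains
  ! Padded length of one series (assumed layout): room for the
  ! n_time/2+1 complex outputs of an in-place real-to-complex DFT.
  pure integer function padded_length(n_time)
    integer, intent(in) :: n_time
    padded_length = 2*(n_time/2 + 1)
  end function padded_length

  ! For the series numbers listed in sel, return the first and last
  ! element of each series in the flat array, assuming series k
  ! occupies elements (k-1)*ld+1 .. k*ld.
  pure subroutine series_bounds(sel, n_time, i_start, i_end)
    integer, intent(in)  :: sel(:), n_time
    integer, intent(out) :: i_start(size(sel)), i_end(size(sel))
    integer :: ld
    ld = padded_length(n_time)
    i_start = (sel - 1)*ld + 1
    i_end   = sel*ld
  end subroutine series_bounds
end module series_layout
```

For example, with N_TIME = 8 the padded length is 10, so series 23 occupies X(221:230). Because the selection is just an integer array, a different subset can be passed in every iteration without touching the descriptor.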

Dmitry_B_Intel
Employee

Olivier,

If you know how many parallel threads are going to use the shared descriptor, you need to set DFTI_NUMBER_OF_USER_THREADS to that value before committing the descriptor. Presumably, this number is the value returned by omp_get_num_procs().

Regards
Dima
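A minimal sketch of the reply's advice, assuming the padded in-place layout of the question's pseudocode; the module and routine names are hypothetical. The descriptor is created and committed once, DFTI_NUMBER_OF_USER_THREADS is set before the commit, and each thread keeps its own STATUS. The sketch uses omp_get_max_threads() rather than omp_get_num_procs(): it returns the upper bound on the team size of a following parallel region that has no NUM_THREADS clause, which by default usually equals the number of processors. Whether DFTI_NUMBER_OF_USER_THREADS is still needed, or is treated as deprecated, may depend on the MKL version, so check the DFTI configuration reference for yours.

```fortran
module shared_dfti_sketch
  use mkl_dfti
  use omp_lib
  implicit none
contains
  ! Create and commit one descriptor, once, before the main loop.
  ! DFTI_NUMBER_OF_USER_THREADS is set before the commit to the largest
  ! team size the later parallel regions can have.
  subroutine setup_descriptor(desc, n_time, status)
    type(dfti_descriptor), pointer :: desc
    integer, intent(in)  :: n_time
    integer, intent(out) :: status
    status = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_REAL, 1, n_time)
    if (status /= 0) return
    status = DftiSetValue(desc, DFTI_NUMBER_OF_USER_THREADS, omp_get_max_threads())
    if (status /= 0) return
    status = DftiCommitDescriptor(desc)
  end subroutine setup_descriptor

  ! Transform the selected padded series of x in place. The committed
  ! descriptor and x are shared; i and status are private to each thread.
  ! nfail counts calls that did not return 0 (DFTI_NO_ERROR).
  subroutine forward_selected(desc, x, i_start, i_end, n_loops, nfail)
    type(dfti_descriptor), pointer :: desc
    real(8), intent(inout) :: x(:)
    integer, intent(in)    :: i_start(:), i_end(:), n_loops
    integer, intent(out)   :: nfail
    integer :: i, status
    nfail = 0
    !$omp parallel do default(shared) private(i, status) reduction(+:nfail)
    do i = 1, n_loops
       status = DftiComputeForward(desc, x(i_start(i):i_end(i)))
       if (status /= 0) nfail = nfail + 1
    end do
    !$omp end parallel do
  end subroutine forward_selected
end module shared_dfti_sketch
```

In the main program, setup_descriptor would be called once before DO J=1,1000, forward_selected once per iteration with that iteration's bounds, and DftiFreeDescriptor once after the loop. The code needs OpenMP enabled at compile time and must be linked against MKL.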
