MKL DCT using fortran: scaling prefactors and complex-type arrays

Thomas_G_7 · ‎10-24-2015

Hi,

I'm currently using the MKL to calculate some DCT on complex-type fortran arrays. My fortran module has a first initialization routine that looks like

   subroutine initialize(this,n, ni, nd)
     class(costf_odd_t) :: this
     !-- Input variables 
     integer, intent(in) :: n 
     integer, intent(in) :: ni 
     integer, intent(in) :: nd 
     !-- Local variables: 
     integer :: stat 
     real(cp) :: fac 
     real(cp) :: work(n) 
     this%nRad = n 
     allocate(this%i_costf_init(128)) 
     allocate(this%d_costf_init(1:5*(n-1)/2+2)) 
     call d_init_trig_transform(n-1,MKL_COSINE_TRANSFORM,this%i_costf_init,this%d_costf_init,stat)
     call d_commit_trig_transform(work,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat)
     !fac = sqrt(half*real(this%nRad-1,cp)) 
     !stat = DftiSetValue(this%r2c_handle, DFTI_FORWARD_SCALE, fac) 
     !stat = DftiCommitDescriptor(this%r2c_handle) 
  end subroutine initialize

The actual DCT is then computed later on as follows

   subroutine costf1_complex(this,f,n_f_max,n_f_start,n_f_stop) 
     class(costf_odd_t) :: this
     !-- Input variables: 
     integer,  intent(in) :: n_f_start,n_f_stop ! columns to be transformed 
     integer,  intent(in) :: n_f_max 
     complex(cp), intent(inout) :: f(n_f_max,this%nRad)  
     !-- Local variables: 
     real(cp) :: work_real(this%nRad) 
     real(cp) :: work_imag(this%nRad) 
     integer :: stat, n_f 

     do n_f=n_f_start,n_f_stop 
        work_real(:) = real(f(n_f,:)) 
        work_imag(:) = aimag(f(n_f,:)) 
        call d_forward_trig_transform(work_real,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat)
        call d_forward_trig_transform(work_imag,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat)
        f(n_f,:)=sqrt(half*real(this%nRad-1,cp))*cmplx(work_real,work_imag,kind=cp) 
     end do 
  end subroutine costf1_complex

It basically does what I want but I have several performance limitations since: (i) there is a pre-factor multiplication which is computed for each DCT, (ii) there is a memory copy due to the unsupported complex-type input arrays in the MKL trig transforms.

I tried fixing the first issue using the 3 last commented lines in the initialize subroutine above, but it didn't work, any idea here? Concerning the second issue, is there a possible way to avoid the memory allocation of the work arrays in the costf1_complex subroutine (maybe using the C-type pointers)?

Thanks!

Ying_H_Intel · ‎10-25-2015

Hi Thomas,

(i) there is a pre-factor multiplication which is computed for each DCT, (ii) there is a memory copy due to the unsupported complex-type input arrays in the MKL trig transforms

I) you mean the factor is fac = sqrt(half*real(this%nRad-1,cp)) and

II)there are

	
				work_real(:) = real(f(n_f,:))

		
				14
				      work_imag(:) = aimag(f(n_f,:))
			

	 
Right?  
Because of the below reason,  
1) trig function ask continous input array
2) it don't factor option, you can't pin the pre-factor to internal FFT Descriptor. 
Your current code looks the right solution that MKL trig transforms can support so far. 
or other consideration, 
1) like create feature request ask complex DCT support 
2) your algorithm,   for example, 
if there is the possible to change the input data, so no copy needed? 
the possible for use MKL FFT to caculate your DCT?
it seems do the row by row DCT. The pre-defined is fixed between each row, you may put 
f(n_f,:)=sqrt(half*real(this%nRad-1,cp))*cmplx(work_real,work_imag,kind=cp)
out of the do loop and call some function like cblas_zscal. to improve the speed. 
Best Regards,

		Ying