- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi,

I'm currently using the MKL to calculate some DCT on complex-type fortran arrays. My fortran module has a first initialization routine that looks like

subroutine initialize(this,n, ni, nd) class(costf_odd_t) :: this !-- Input variables integer, intent(in) :: n integer, intent(in) :: ni integer, intent(in) :: nd !-- Local variables: integer :: stat real(cp) :: fac real(cp) :: work(n) this%nRad = n allocate(this%i_costf_init(128)) allocate(this%d_costf_init(1:5*(n-1)/2+2)) call d_init_trig_transform(n-1,MKL_COSINE_TRANSFORM,this%i_costf_init,this%d_costf_init,stat) call d_commit_trig_transform(work,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat) !fac = sqrt(half*real(this%nRad-1,cp)) !stat = DftiSetValue(this%r2c_handle, DFTI_FORWARD_SCALE, fac) !stat = DftiCommitDescriptor(this%r2c_handle) end subroutine initialize

The actual DCT is then computed later on as follows

subroutine costf1_complex(this,f,n_f_max,n_f_start,n_f_stop) class(costf_odd_t) :: this !-- Input variables: integer, intent(in) :: n_f_start,n_f_stop ! columns to be transformed integer, intent(in) :: n_f_max complex(cp), intent(inout) :: f(n_f_max,this%nRad) !-- Local variables: real(cp) :: work_real(this%nRad) real(cp) :: work_imag(this%nRad) integer :: stat, n_f do n_f=n_f_start,n_f_stop work_real(:) = real(f(n_f,:)) work_imag(:) = aimag(f(n_f,:)) call d_forward_trig_transform(work_real,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat) call d_forward_trig_transform(work_imag,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat) f(n_f,:)=sqrt(half*real(this%nRad-1,cp))*cmplx(work_real,work_imag,kind=cp) end do end subroutine costf1_complex

It basically does what I want but I have several performance limitations since: (i) there is a pre-factor multiplication which is computed for each DCT, (ii) there is a memory copy due to the unsupported complex-type input arrays in the MKL trig transforms.

I tried fixing the first issue using the 3 last commented lines in the initialize subroutine above, but it didn't work, any idea here? Concerning the second issue, is there a possible way to avoid the memory allocation of the work arrays in the costf1_complex subroutine (maybe using the C-type pointers)?

Thanks!

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi Thomas,

(i) there is a pre-factor multiplication which is computed for each DCT, (ii) there is a memory copy due to the unsupported complex-type input arrays in the MKL trig transforms

`I) you mean the factor is fa`

`c = sqrt(half*real(this%nRad-1,cp)) and `

`II)there are `

`work_real(:) = ` `real` `(f(n_f,:))` |

`14` |
` ` `work_imag(:) = ` `aimag` `(f(n_f,:))` |

Right?

Because of the below reason,

1) trig function ask continous input array

2) it don't factor option, you can't pin the pre-factor to internal FFT Descriptor.

Your current code looks the right solution that MKL trig transforms can support so far.

or other consideration,

1) like create feature request ask complex DCT support

2) your algorithm, for example,

if there is the possible to change the input data, so no copy needed?

the possible for use MKL FFT to caculate your DCT?

it seems do the row by row DCT. The pre-defined is fixed between each row, you may put

`f(n_f,:)=`

`sqrt`

`(half*`

`real`

`(this%nRad-1,cp))*`

`cmplx`

`(work_real,work_imag,`

`kind`

`=cp)`

out of the do loop and call some function like cblas_zscal. to improve the speed.

Best Regards,

Ying

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page