- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The manual says, I can set DFTI_REAL as the forward domain in DftiCreateDescriptorDM(). However, the page for DftiComputeForwardDM() says that both the input and output are complex arrays.
Also, there are only complex-to-complex transforms among the examples.
Is the real-to-complex transform implemented for cluster FFT? If so, how do I obtain/calculate the local size of the forward domain?
Thanks,
Jozsef
Also, there are only complex-to-complex transforms among the examples.
Is the real-to-complex transform implemented for cluster FFT? If so, how do I obtain/calculate the local size of the forward domain?
Thanks,
Jozsef
Link Copied
11 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jozsef,
Cluster FFT supports real-to-complex transforms.
Please use the following pseudo-code as a general guideline:
MKL_LONG n[]={10,20};
DftiCreateDescriptorDM(comm, &handle, DFTI_DOUBLE, DFTI_REAL, 2, n);
DftiCommitDescriptorDM(handle);
MKL_LONG size;
DftiGetValueDM(handle, CDFT_LOCAL_SIZE, &size);
double *inout = (double *)malloc(size*sizeof(double complex));
MKL_LONG nx;
DftiGetValueDM(handle,CDFT_LOCAL_NX,&nx);
MKL_LONG x_start;
DftiGetValueDM(handle, CDFT_LOCAL_X_START, &x_start);
// initialize input data as array of nx x n[1] doubles placed (with possible padding) in array of nx x (n[1]/2+1) double complexes
I hope this helps.
Best regards,
-Vladimir
Cluster FFT supports real-to-complex transforms.
Please use the following pseudo-code as a general guideline:
MKL_LONG n[]={10,20};
DftiCreateDescriptorDM(comm, &handle, DFTI_DOUBLE, DFTI_REAL, 2, n);
DftiCommitDescriptorDM(handle);
MKL_LONG size;
DftiGetValueDM(handle, CDFT_LOCAL_SIZE, &size);
double *inout = (double *)malloc(size*sizeof(double complex));
MKL_LONG nx;
DftiGetValueDM(handle,CDFT_LOCAL_NX,&nx);
MKL_LONG x_start;
DftiGetValueDM(handle, CDFT_LOCAL_X_START, &x_start);
// initialize input data as array of nx x n[1] doubles placed (with possible padding) in array of nx x (n[1]/2+1) double complexes
I hope this helps.
Best regards,
-Vladimir
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Vladimir,
The above seems to work, i.e. it compiles and runs. However, the following simple test fails to give a reasonably small error on a single cpu. What am I missing?
Thanks,
Jozsef
The above seems to work, i.e. it compiles and runs. However, the following simple test fails to give a reasonably small error on a single cpu. What am I missing?
Thanks,
Jozsef
[cpp] const MKL_LONG N = 12; const MKL_LONG LENGTHS[3] = {N,N,N}; const MKL_LONG PO = N*N*N; double err, maxerr; double complex *U, *V, *W; DFTI_DESCRIPTOR_DM_HANDLE fft; MKL_LONG n, size, local_nx, local_x_start; MPI_Init(&argc,&argv); DftiCreateDescriptorDM( MPI_COMM_WORLD, &fft, DFTI_DOUBLE, DFTI_REAL, 3, LENGTHS); DftiGetValueDM(fft, CDFT_LOCAL_SIZE, &size); DftiGetValueDM(fft, CDFT_LOCAL_NX, &local_nx); DftiGetValueDM(fft, CDFT_LOCAL_X_START, &local_x_start); DftiSetValueDM(fft, DFTI_PLACEMENT, DFTI_INPLACE); DftiSetValueDM(fft, DFTI_FORWARD_SCALE, 1.0); DftiSetValueDM(fft, DFTI_BACKWARD_SCALE, 1.0/(double)PO); U = (double complex*)malloc(size*sizeof(double complex)); V = (double complex*)malloc(size*sizeof(double complex)); W = (double complex*)malloc(size*sizeof(double complex)); DftiSetValueDM(fft, CDFT_WORKSPACE, W); DftiCommitDescriptorDM(fft); double *u = (double*)U; double *v = (double*)V; for ( n = 0; n < PO; n++ ) u= v = (double)n; DftiComputeForwardDM(fft, U); DftiComputeBackwardDM(fft, U); maxerr = 0.0; for ( n = 0; n < PO; n++ ) { err = fabs(u -v ); if (err > maxerr) maxerr = err; } printf("maxerr = %gn",maxerr); free( U ); free( V ); free( W ); DftiFreeDescriptorDM(&fft); MPI_Finalize();[/cpp]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Interestingly, the above example works for a complex forward domain (and the loops going until 2*PO).
J
J
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jozsef,
I cannot run your code right now, but there is a couple of mistakes. Please find my comments/corrections below below:
Best regards,
-Vladimir
I cannot run your code right now, but there is a couple of mistakes. Please find my comments/corrections below below:
- constMKL_LONGN=12;
- constMKL_LONGLENGTHS[3]={N,N,N}; // comment: this are global sizes - not local
- constMKL_LONGPO=N*N*N; // comment: hence this is also global
- doubleerr,maxerr;
- doublecomplex*U,*V,*W;
- DFTI_DESCRIPTOR_DM_HANDLEfft;
- MKL_LONGn,size,local_nx,local_x_start;
- MPI_Init(&argc,&argv);
- DftiCreateDescriptorDM(MPI_COMM_WORLD,&fft,DFTI_DOUBLE,DFTI_REAL,3,LENGTHS);
- DftiGetValueDM(fft,CDFT_LOCAL_SIZE,&size); // comment: the size returned here is local to this process
- DftiGetValueDM(fft,CDFT_LOCAL_NX,&local_nx); // comment: this is how many DPs along the most strided dimension this process will own
- DftiGetValueDM(fft,CDFT_LOCAL_X_START,&local_x_start); // comment: this is at which DP the process's DPs along the most strided dimension start
- DftiSetValueDM(fft,DFTI_PLACEMENT,DFTI_INPLACE);
- DftiSetValueDM(fft,DFTI_FORWARD_SCALE,1.0);
- DftiSetValueDM(fft,DFTI_BACKWARD_SCALE,1.0/(double)PO);
- U=(doublecomplex*)malloc(size*sizeof(doublecomplex));
- V=(doublecomplex*)malloc(size*sizeof(doublecomplex));
- W=(doublecomplex*)malloc(size*sizeof(doublecomplex));
- DftiSetValueDM(fft,CDFT_WORKSPACE,W);
- DftiCommitDescriptorDM(fft);
- double*u=(double*)U;
- double*v=(double*)V;
- for(n=0;n
- u
=v =(double)n; // correction: the size obtained in line 14 is not neccessarily equal to P0 (please look at my previous post). You need to make it a doubly nested loop going over an array of local_nx x (LENGTHES[1] * (LENGTHES[2]/2 + 1)) reals - DftiComputeForwardDM(fft,U);
- DftiComputeBackwardDM(fft,U);
- maxerr=0.0;
- for(n=0;n
- {
- err=fabs(u
-v ); - if(err>maxerr)
- maxerr=err;
- }
- printf("maxerr=%g\n",maxerr);
- free(U);
- free(V);
- free(W);
- DftiFreeDescriptorDM(&fft);
- MPI_Finalize();
Best regards,
-Vladimir
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes. For more than one process, I would compute the end of the loops as "size / (N/2+1) * N", which is the number of doubles the given process owns -- as I understand.
This is the same as PO for a single process, since in that case local_nx = N. Correct?
With the above in mind, the behavior is the same.
J
This is the same as PO for a single process, since in that case local_nx = N. Correct?
With the above in mind, the behavior is the same.
J
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
First, you correctly assume that for a single MPI proc (mpiexec -np 1 ...) local_nx will be equal to global N.
Next, size >= local_nx * N * (N/2 + 1) - for some transforms a larger buffer is required, hence the equality not necessarily holds.
Lastly, your data have to be layed out correctly (so called CCE layout) and that is done as follows:
double *input;
double complex *output = input;
int n2_padded = 2*(N/2 + 1);
int i0, i1, i2;
// init input
for (i0=0; i0 for (i1=0; i1 for (i2=0; i2 input[(i0*N + i1)*n2_padded + i2] = get_data_from_input_global_array_at(local_start_x + i0, i1, i2);
}
}
}
// do fft
// use the computed data
for (i0=0; i0
for (i1=0; i1
for (i2=0; i2
put_computed_data_into_output_global_array_at(local_start_x_out + i0, i1, i2) = output[(i0*N + i1)*(N/2 + 1) + i2] = ;
}
}
}
best regards,
-Vladimir
Next, size >= local_nx * N * (N/2 + 1) - for some transforms a larger buffer is required, hence the equality not necessarily holds.
Lastly, your data have to be layed out correctly (so called CCE layout) and that is done as follows:
double *input;
double complex *output = input;
int n2_padded = 2*(N/2 + 1);
int i0, i1, i2;
// init input
for (i0=0; i0
}
}
}
// do fft
// use the computed data
for (i0=0; i0
}
}
}
best regards,
-Vladimir
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is fine, but I'm trying to get something simpler right first: on a single process (no assembly) and I'm simply trying to get the initial data back after a forward and a backward fft.
For example:
(1) Generate a bunch of random doubles into the NxNxN array U and make a copy in V.
(2) Do an in-place forward and backward FFT on U.
(3) Compare U and V, which should be the same (modulo floating point errors), irrespective of the (real, i.e. not complex) data storage between the transforms.
I tried this for non-cluster fft and it works for both real and complex forward domains. However, for cluster fft it only works for a complex domain and doesn't for a real one.
J
For example:
(1) Generate a bunch of random doubles into the NxNxN array U and make a copy in V.
(2) Do an in-place forward and backward FFT on U.
(3) Compare U and V, which should be the same (modulo floating point errors), irrespective of the (real, i.e. not complex) data storage between the transforms.
I tried this for non-cluster fft and it works for both real and complex forward domains. However, for cluster fft it only works for a complex domain and doesn't for a real one.
J
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jozsef,
First of all, Cluster FFT is not to be used on a single MPI process. It's the DFTI's (non-cluster FFT's) realm.
Second of all, it might be that this case does not even work with CFFT. I will double check and report back soon.
Best regards,
-Vladimir
First of all, Cluster FFT is not to be used on a single MPI process. It's the DFTI's (non-cluster FFT's) realm.
Second of all, it might be that this case does not even work with CFFT. I will double check and report back soon.
Best regards,
-Vladimir
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok. The problem was not Cluster FFT-related. Rather, I didn't correctly test the array in CCE format.
In particular, the loop
Instead of the above, I need something like
Thanks,
J
In particular, the loop
[cpp] maxerr = 0.0; for ( n = 0; n < PO; n++ ) { err = fabs(uis only fine for testing contiguous data, such as result from complex-to-complex transforms. The CCE format has some unused blocks, at the end of each row (2 doubles), which should not be compared.-v ); if (err > maxerr) maxerr = err; } printf("maxerr = %gn",maxerr);[/cpp]
Instead of the above, I need something like
[cpp] maxerr = 0.0; for ( n = 0; n < N*N; n++ ) for ( m = 0; m < N; m++ ) { err = fabs(u[n*2*(N/2+1)+m] - v[n*2*(N/2+1)+m]); if (err > maxerr) maxerr = err; } printf("maxerr = %gn",maxerr);[/cpp]which gives the correct small error. Now all this works for dfti, next is to get it right for cdft, as well.
Thanks,
J
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As I suspected, the issue was the same for cluster fft.
J
J
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jozsef,
I'm glad to hear that the Good vanquished the evil (once again).
Best regards,
-Vladimir
I'm glad to hear that the Good vanquished the evil (once again).
Best regards,
-Vladimir
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page