- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
The attached code is an FFT implementation. It should work on any input vector length
I converted each for loop into a clik_for loop.
When I did it in line "for (i = 1; i < n; i += 2)" the results of the FFT were wrong.
There is also a probem with the while loops.
I tried to convert: "while (n > mmax)" to "cilk_for (;n>mmax;)" but I got a syntax error: missing control variable declaration and initialization.
What is the right way to convert a regular FFT code to a clik code ?
Thanks,
Zvika
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I don't see any cilk_for in the attachment. Did you hope for "clik_for" to be so interpreted? How could it compile successfully?
As you saw, cilk_for is strict about requiring all the fields to be filled in in such a way that the loop is countable, and none of the parameters are modified elsewhere in the loop. It's a parallelization construct, so need not accept anything which is not parallelizable.
Are you certain that you want nested parallelism as your first step?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
One of your loops has a race --- each iteration of the following loop modifies the same j.
j = 1;
cilk_for (int i = 1; i < n; i += 2) {
if (j > i) {
tempr = data
data
data = tempr;
tempr = data[j+1];
data[j+1] = data[i+1];
data[i+1] = tempr; }
m = n >> 1;
while (m >= 2 && j > m) {
j -= m;
m >>= 1;
}
j += m;
}
There could be other similar errors in your program --- I didn't check the rest of the program too carefully.
Using Cilk screen to race-detect your program may help eliminate some of these errors.
More generally, I would also question whether implementing your own FFT is actually the most efficient way to solve your problem. I would think that using an existing optimized library would likely be more efficient in terms of both performance and development effort.
Cheers,
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The code has races on m, j,wr, wi, tempr, tempi. The example demonstrates a common difficulty with parallelizing serial code: the serial code makes heavy use of induction variables, that is variables whose value is incrementally updated on each iteration based on the variable's value in the previous iteration. For example, wr/wi are being computed based on their values in the previous iteration of the m-loop. You'll need to change the calculation to compute their value's from scratch. Typically production-strength FFTs precompute a lookup table for these values (often called "twiddle factors"). Likewise the calculations of m and j need to be done in a way that does not depend on their previous values.
Some general advice: always declare variables in the innermost scope possible. For example, tempr should be declared inside the two blocks that it is used. That will eliminate the race on tempr. The race on tempi can be eliminated in a similar way.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Members,
I will search a simpler C code and try to "cilk" it.
Thanks,
Zvika
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page