- Отметить как новое
- Закладка
- Подписаться
- Отключить
- Подписка на RSS-канал
- Выделить
- Печать
- Сообщить о недопустимом содержимом
Hello,
The attached code is an FFT implementation. It should work on any input vector length
I converted each for loop into a clik_for loop.
When I did it in line "for (i = 1; i < n; i += 2)" the results of the FFT were wrong.
There is also a probem with the while loops.
I tried to convert: "while (n > mmax)" to "cilk_for (;n>mmax;)" but I got a syntax error: missing control variable declaration and initialization.
What is the right way to convert a regular FFT code to a clik code ?
Thanks,
Zvika
Ссылка скопирована
- Отметить как новое
- Закладка
- Подписаться
- Отключить
- Подписка на RSS-канал
- Выделить
- Печать
- Сообщить о недопустимом содержимом
I don't see any cilk_for in the attachment. Did you hope for "clik_for" to be so interpreted? How could it compile successfully?
As you saw, cilk_for is strict about requiring all the fields to be filled in in such a way that the loop is countable, and none of the parameters are modified elsewhere in the loop. It's a parallelization construct, so need not accept anything which is not parallelizable.
Are you certain that you want nested parallelism as your first step?
- Отметить как новое
- Закладка
- Подписаться
- Отключить
- Подписка на RSS-канал
- Выделить
- Печать
- Сообщить о недопустимом содержимом
One of your loops has a race --- each iteration of the following loop modifies the same j.
j = 1;
cilk_for (int i = 1; i < n; i += 2) {
if (j > i) {
tempr = data
data
data = tempr;
tempr = data[j+1];
data[j+1] = data[i+1];
data[i+1] = tempr; }
m = n >> 1;
while (m >= 2 && j > m) {
j -= m;
m >>= 1;
}
j += m;
}
There could be other similar errors in your program --- I didn't check the rest of the program too carefully.
Using Cilk screen to race-detect your program may help eliminate some of these errors.
More generally, I would also question whether implementing your own FFT is actually the most efficient way to solve your problem. I would think that using an existing optimized library would likely be more efficient in terms of both performance and development effort.
Cheers,
Jim
- Отметить как новое
- Закладка
- Подписаться
- Отключить
- Подписка на RSS-канал
- Выделить
- Печать
- Сообщить о недопустимом содержимом
The code has races on m, j,wr, wi, tempr, tempi. The example demonstrates a common difficulty with parallelizing serial code: the serial code makes heavy use of induction variables, that is variables whose value is incrementally updated on each iteration based on the variable's value in the previous iteration. For example, wr/wi are being computed based on their values in the previous iteration of the m-loop. You'll need to change the calculation to compute their value's from scratch. Typically production-strength FFTs precompute a lookup table for these values (often called "twiddle factors"). Likewise the calculations of m and j need to be done in a way that does not depend on their previous values.
Some general advice: always declare variables in the innermost scope possible. For example, tempr should be declared inside the two blocks that it is used. That will eliminate the race on tempr. The race on tempi can be eliminated in a similar way.
- Отметить как новое
- Закладка
- Подписаться
- Отключить
- Подписка на RSS-канал
- Выделить
- Печать
- Сообщить о недопустимом содержимом
Dear Members,
I will search a simpler C code and try to "cilk" it.
Thanks,
Zvika
- Отметить как новое
- Закладка
- Подписаться
- Отключить
- Подписка на RSS-канал
- Выделить
- Печать
- Сообщить о недопустимом содержимом

- Подписка на RSS-канал
- Отметить тему как новую
- Отметить тему как прочитанную
- Выполнить отслеживание данной Тема для текущего пользователя
- Закладка
- Подписаться
- Страница в формате печати