Hello Andrei,
The Intel Cluster Math Kernel Library 8.1 can do distributed-memory, parallel FFTs. The following website has more information: http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/266852.htm. You can also download Cluster MKL and get a 30-day license from this site.
Please share your performance results if you try Cluster MKL.
Best regards,
Henry
I looked at the performance graphs in the link in your reply.
How can one get 5 GFLOPS performance on a 1.5 GHz processor (I mean the 1D FFT graph)?
Andrei
Message Edited by andrei@cox.net on 04-12-2006 07:53 AM
Hi Andrei,
The Itanium 2 processor can do four floating-point operations per clock cycle. Therefore, the theoretical peak of a 1.5 GHz Itanium 2 is 6 GFLOPS.
Henry
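
The arithmetic behind this figure can be sketched in a few lines. The four operations per cycle and the 1.5 GHz clock come from the post above, and the 5 GFLOPS value is Andrei's reading of the 1D FFT graph; the function name is only illustrative.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: clock rate in GHz times flops per cycle."""
    return clock_ghz * flops_per_cycle

peak = peak_gflops(1.5, 4)       # 1.5 GHz Itanium 2, 4 flops per cycle
measured = 5.0                   # GFLOPS, as read from the 1D FFT graph
efficiency = measured / peak     # fraction of theoretical peak reached

print(f"peak = {peak:.1f} GFLOPS, measured/peak = {efficiency:.0%}")
```

So 5 GFLOPS is below the 6 GFLOPS ceiling, at roughly five sixths of peak.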
I think that a different kind of graph would be useful for estimating cluster performance -- acceleration factor versus number of processors used for, say, a 512x512 FFT. In fact, such graphs are common in the literature.
It is easy to see from the existing graphs that this dependence is close to linear up to 4 processors. It is also clear that it will start to deviate from linear at some number of processors. The question is at how many processors, and how quickly?
Is there any place to take a look at such graphs?
Thanks, Andrei
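
Absent published graphs, one way to reason about where such a curve bends is a toy cost model in which compute time divides by the number of processors while a communication term does not. This is a generic illustration, not a model of Cluster MKL, and the timings `t_compute` and `t_comm` below are hypothetical placeholders that would have to be replaced by measurements.

```python
def speedup(p: int, t_compute: float, t_comm: float) -> float:
    """Acceleration factor T(1)/T(p) for the toy model
    T(1) = t_compute and T(p) = t_compute / p + t_comm for p > 1."""
    if p == 1:
        return 1.0  # a serial run pays no communication cost
    return t_compute / (t_compute / p + t_comm)

# Hypothetical numbers: 1.0 s of serial compute, 0.05 s of communication.
for p in (1, 2, 4, 8, 16, 32):
    print(p, round(speedup(p, 1.0, 0.05), 2))
```

In this model the acceleration factor stays close to linear while `t_compute / p` dominates and can never exceed `t_compute / t_comm`, so the point of deviation depends on how the two terms compare for a given grid size.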
Hi Andrei,
I'm not aware of any other graphs or published benchmarks of MKL DFT performance. I can't estimate the scalability of your calculation. However, a 512x512 FFT is considered a small calculation on a good workstation or server. I recommend that you measure the serial MKL performance before investing any effort in a distributed-memory, parallel solution. Depending on your system, MKL can probably compute a 512x512 transform in less than a second. If so, a distributed-memory, parallel solution will be slower because of the communication overhead.
Henry
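
A rough estimate is consistent with the "less than a second" expectation. The sketch below assumes the nominal 5 N log2(N) operation count that is commonly used when quoting rates for complex FFTs (the linked graphs may count operations differently) and reuses the 5 GFLOPS rate mentioned earlier in the thread, which was read from a 1D graph and may not carry over to a 2D transform or to another system.

```python
import math

def fft_nominal_flops(n_points: int) -> float:
    """Nominal flop count 5 * N * log2(N) for a complex FFT of N points (assumed convention)."""
    return 5.0 * n_points * math.log2(n_points)

n = 512 * 512                      # total points in a 512x512 transform
flops = fft_nominal_flops(n)       # log2(512 * 512) = 18
rate = 5.0e9                       # assumed sustained rate: 5 GFLOPS
seconds = flops / rate

print(f"{flops:.3e} flops, about {seconds * 1e3:.1f} ms at 5 GFLOPS")
```

At a few milliseconds per transform, even a small fixed communication cost per transform would be a large fraction of the total, which is why measuring the serial case first is worthwhile.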
Thank you for your answers. In fact, a 512x512 grid is just the entry level
for my problem, and I hope that the relative overhead will be significantly
lower for larger grid dimensions.
Appreciate your cooperation, Andrei