IPP purchasing, integration and performance

meldaproduction · 09-22-2010

Hi folks,

I was looking for an optimized FFT implementation, and so far IPP seems to be the fastest one. I tested it and integrated it in a 32-bit environment, but I have the following points:

1) PURCHASING

I need it for Win32, Win64 and Mac OS X, and I'm quite lost in the purchasing options. First, I'm not sure whether buying it for Windows covers both 32-bit and 64-bit. Next, there seems to be no option to purchase it separately for Mac OS X. Is it possible to get just the FFT? Or is there a package covering both platforms?

2) PERFORMANCE

Besides the FFT I also checked a few other routines, particularly scalar and complex multiplication, and unfortunately I found that my own SSE-based routines are MUCH faster. I believe the reason is that I tested on data that was often not 16-byte aligned. My implementation first processes elements with scalar code until it reaches a 16-byte boundary, and only then switches to vector processing. It seems that Intel simply doesn't do that, which is pretty sad, because, for example, the multiplication is 4x slower than mine!
It makes me wonder whether all the Intel routines are that shoddy. As a result I would definitely use only the FFT, so it is rather disappointing to buy and integrate the whole package just for that.
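The "scalar until aligned, then vector" approach described in point 2 can be sketched in C with SSE intrinsics as below. This is a minimal sketch under stated assumptions, not the poster's actual code: the name mul_f32 is hypothetical, only the destination is brought onto a 16-byte boundary (the sources use unaligned loads because they may be misaligned differently), and the function falls back to plain scalar code when SSE is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* Hypothetical helper: dst[i] = a[i] * b[i] for i in [0, n). */
void mul_f32(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE
    /* 1) Scalar prologue: process single elements until dst + i
          sits on a 16-byte boundary (at most 3 iterations). */
    while (i < n && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i] = a[i] * b[i];
        ++i;
    }
    /* 2) Vector body: 4 floats per iteration. The store is aligned;
          the loads are unaligned because a and b may be offset
          differently from dst. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_store_ps(dst + i, _mm_mul_ps(va, vb));
    }
#endif
    /* 3) Scalar epilogue for the remaining 0..3 elements
          (or the whole loop when SSE is unavailable). */
    for (; i < n; ++i)
        dst[i] = a[i] * b[i];
}
```

A single index can align only one of three arbitrarily offset pointers, which is why the sketch aligns the store; when all buffers share the same misalignment, the loads could use _mm_load_ps as well.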

3) INTEGRATION

I use MSVC on Win32/64. So far integration on Win32 seems fine, so I guess Win64 should be fine as well. The question is Mac: there I use GCC, and I really don't want to spend a week just integrating the new stuff, because, if I understand correctly, I'd need the Intel compiler as well. So how does the Intel compiler handle Mac libraries? I need just the basics: the Unix basics (standard library, pthreads and the like) and some Mac OS frameworks (Carbon, File Services...). Also, is it compatible with the GCC linker (just in case I have to go for the stupid Cocoa someday)? Anything else to check?

4) INTEL C++ COMPILER PERFORMANCE

Are there any measurements of the generated code's speed? I'm asking because I have already checked a few forums and the results were not very promising.

Regards,
Vojtech

Naveen_G_Intel · 09-22-2010

Hi,

Here is my answer to the purchasing question:

1) PURCHASING

IPP for Windows works on both 32-bit and 64-bit. IPP for Mac OS X is part of the Intel Compiler Professional Edition; refer to this article: http://software.intel.com/en-us/articles/performance-tools-for-software-developers-how-can-i-download-the-intel-ipp-and-intel-mkl-for-mac-os-x/

It's not possible to get only the FFT functions; you have to purchase the complete IPP package, either for Windows/Linux or for Mac OS X (with Compiler Professional Edition).

Thanks,

Naveen Gv

Naveen_G_Intel · 09-22-2010

3) INTEGRATION

Integration with 64-bit MSVC is fine as well; additional information is provided in the Knowledge Base article.

Also, it is compatible with GCC; refer to the article on how to compile and link IPP for Mac OS X.

meldaproduction · 09-23-2010

OK, thanks. So I tried the Win32 Intel compiler package just to see what would happen. First, it reported several warnings even though MSVC and GCC have no problems with the code. Anyway, I took a project in MSVC 2005 and converted it to the Intel compiler. First, it is VERY slow. Second, in debug mode the linker probably ended up in some kind of infinite loop: mcpcom.exe kept running in the background, allocating more and more memory, and after 10 minutes, when it had exceeded (I think) 500 MB, I stopped it.
Then I tried release mode. It was slow as well, and this time it ended with this message:

>main.obj : error LNK2019: unresolved external symbol ___intel_sse2_strlen referenced in function "public: bool __thiscall Steinberg::FUID::fromString(char const *)" (?fromString@FUID@Steinberg@@QAE_NPBD@Z)
1>mlibrary.obj : error LNK2001: unresolved external symbol ___intel_sse2_strlen

I tried adding the library manually and running the environment-variables batch file, but nothing helped...

What should I do?

Naveen_G_Intel · 09-23-2010

Hi,

Sorry, I may not be able to help you much with Intel compiler issues. There is an Intel C++ Compiler forum (Windows, Linux and Mac OS X) dedicated to compiler-related issues; could you post it there?
http://software.intel.com/en-us/forums/intel-c-compiler/

Regards,

Naveen Gv

Vladimir_Dudnik · 09-23-2010

Hello,

I would suspect some mistake in your project settings. We've worked with the Intel Compiler for years, and frankly speaking, I personally would say it is the best optimizing compiler in the industry. Try the latest version, v12.0, and you will see the difference. By the way, when you work in a Windows environment, I would recommend at least evaluating Intel Parallel Studio, especially the latest version, 2011 XE Beta. Besides the improved compiler, it also supports MSVC 2010 and contains the latest versions of the performance libraries, TBB, a parallel debugger, Inspector and Advisor: a lot of stuff that I think is useful for development, collected together to bring the most value to developers.

Regards,
Vladimir

meldaproduction · 09-23-2010

Hi,

Well, I finally managed to resolve the problem: it was a collision between the compiler's libraries and those of the IPP I had installed previously. That is odd, though, because both are the latest versions.

Anyway, I'm now doing some tests. First, ICC is still EXTREMELY slow, especially the linker. Another big issue is that when I stop a build in VC2005, the process doesn't stop, and the only way out is to kill the compiler. Regarding the generated code: MSVC's SSE(2) code generation is basically useless; I tried it, and although it seemed to work correctly, it didn't help at all. In ICC I enabled SSE3 optimizations and many others, just to test it, and to be honest the results are quite mixed: some algorithms are faster, some slower, and in all cases the difference is within ±10% compared to MSVC.

All in all, the FFT implementation in IPP is great. So far, however, I haven't found anything else in the C++ compiler package that makes a difference. Well, I'll keep checking...

Cheers!

igorastakhov · 09-27-2010

Hi,

I can't believe that your own implementation of complex vector multiplication is 4x faster than IPP. I guess something is wrong with the IPP library initialization in your project: for example, static linking is used but the ippInit function is not called, so you see the performance of the non-optimized generic code. Could you provide a reproducible, buildable code example for your complex-multiplication performance measurements, or at least the performance numbers you obtained?
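The failure mode described above can be illustrated with a generic CPU-dispatch pattern: a function pointer starts out pointing at portable code and is switched to a faster variant only by an explicit initialization call, so skipping that call silently leaves the slow path in place. This is a hypothetical sketch of the pattern, not IPP's actual internals; lib_init, lib_mul and the cpu_has_simd flag are invented for illustration (real code would detect the CPU itself). In IPP with static linking, the corresponding call is ippInit().

```c
#include <stddef.h>

typedef void (*mul_fn)(const float *, const float *, float *, size_t);

/* Portable fallback: what runs if nobody selects a better variant. */
static void mul_generic(const float *a, const float *b, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] * b[i];
}

/* Stand-in for a CPU-specific variant (e.g. an SSE version);
   unrolled scalar code here only to keep the sketch portable. */
static void mul_unrolled(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = a[i]     * b[i];
        dst[i + 1] = a[i + 1] * b[i + 1];
        dst[i + 2] = a[i + 2] * b[i + 2];
        dst[i + 3] = a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)
        dst[i] = a[i] * b[i];
}

/* Dispatch pointer: defaults to the generic code until init runs. */
static mul_fn g_mul = mul_generic;

/* Hypothetical init, similar in spirit to ippInit(): selects the
   variant based on the (here caller-supplied) CPU capability. */
void lib_init(int cpu_has_simd)
{
    g_mul = cpu_has_simd ? mul_unrolled : mul_generic;
}

/* Public entry point: always goes through the dispatch pointer. */
void lib_mul(const float *a, const float *b, float *dst, size_t n)
{
    g_mul(a, b, dst, n);
}

/* Diagnostic: nonzero while the generic fallback is still selected. */
int lib_uses_generic(void)
{
    return g_mul == mul_generic;
}
```

Results are identical either way; only the speed differs, which is why a missing init call shows up as a performance problem rather than as a bug.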

meldaproduction · 09-27-2010

No, no: complex multiplications are actually a little bit faster in IPP. More primitive operations, such as simple element-wise multiplication of two vectors, are sometimes much slower. I believe this is caused by the data not being aligned. Such a situation can be handled pretty easily, though; it's just tedious work.
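The alignment problem can also be avoided from the caller's side: if the buffers passed to a library (or to one's own SSE routine) are allocated on a 16-byte boundary, any alignment-sensitive code path sees aligned data from the first element. A minimal sketch, assuming a C11 C library (aligned_alloc) or MSVC's _aligned_malloc; the helper names are hypothetical. IPP also ships its own allocators (for example ippsMalloc_32f with ippsFree) for this purpose.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

/* Hypothetical helper: allocate n floats on an `align`-byte boundary.
   `align` must be a power of two (e.g. 16 for SSE). */
float *alloc_floats_aligned(size_t n, size_t align)
{
    size_t bytes = n * sizeof(float);
    if (bytes == 0)
        bytes = align;                         /* avoid a zero-size request */
    /* C11 aligned_alloc expects size to be a multiple of the alignment. */
    bytes = (bytes + align - 1) & ~(align - 1);
#if defined(_MSC_VER)
    return (float *)_aligned_malloc(bytes, align);
#else
    return (float *)aligned_alloc(align, bytes);
#endif
}

/* Memory from _aligned_malloc must be released with _aligned_free. */
void free_floats_aligned(float *p)
{
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}

/* Nonzero if p lies on an `align`-byte boundary. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```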