I am trying to evaluate the performance of IPP for decoding. I installed the full IPP package correctly and was able to run some encoding/decoding tests with different file formats.
Anyway, what I am interested in is how IPP performs when decoding a JP2 stream.
I have an ASUS G74S computer, which has 8 logical Intel cores (4 physical cores with Hyper-Threading): Intel(R) Core(TM) i7-2630QM CPU @ 2.00 GHz
The file under test is an encoded JP2 stream, and the file size is about 230 KB (for reference, it is a 1920x1080 image).
What is interesting is that I did a run with the advanced timing option, looping about 30 times (which essentially loops around the decoder function).
It is worth mentioning at this point that line 741 of "uic_transcoder_con.cpp" reports the time per loop (decTime = msec / cmdOptions.loops), but I wanted to see the total time, so I removed the denominator (decTime = msec).
Surprisingly, for the image size I am testing with, it takes about 18 seconds to decode my image 30 times!
By comparison, decoding the same image 30 times with J2K-Codec takes only about 4 seconds!
I don't think IPP should be this slow compared to J2K-Codec, but I am not sure what I am doing wrong.
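To make the total-versus-per-loop distinction above concrete, here is a minimal sketch of such a timing loop. It is not the code from uic_transcoder_con.cpp: `LoopTiming`, `timeLoops`, and the `work` callback are hypothetical names, `work` merely stands in for the decoder call, and the sketch assumes a C++11 compiler with `<chrono>` (which VC++ 2010 does not ship).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Result of timing a repeated operation.
struct LoopTiming {
    double totalMs;    // wall-clock time for all iterations together
    double perLoopMs;  // totalMs divided by the iteration count
};

// Runs `work` exactly `loops` times and measures the elapsed wall-clock time.
// `work` is a placeholder for the decoder call; it is not a UIC function.
inline LoopTiming timeLoops(const std::function<void()>& work, int loops) {
    assert(loops > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const double totalMs =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return LoopTiming{totalMs, totalMs / loops};
}
```

Reporting both values avoids having to edit the source to switch between them. With the figures above, about 18 seconds over 30 loops is roughly 600 ms per decode, against roughly 133 ms per decode for J2K-Codec.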
Just as a test, I opened my project in VC++ 2010 and, under Properties, under "Intel Performance Libraries", set "Use IPP" to "No".
I was expecting a much slower result from using pure CPU power. However, surprisingly, when I ran the exact same test, I got about the same number (18 sec)!
So I suspect this means it wasn't using IPP in the first place. Out of curiosity, when I set "Use IPP" to "Multi-Threaded Static Library", it failed to compile, which may make sense, since I may not have the proper libraries for multi-threading.
But when I set it back to "Single-threaded Static Library", it compiles and runs fine but, as I mentioned, it takes about 18 seconds, which is super slow!
Did I forget to set something properly? Is this using the hardware-accelerated primitives at all? If it does use hardware acceleration, is it supposed to be this slow? Do I need to do something to ensure that hardware acceleration is fully utilized?
Yesterday I was testing in the Debug configuration, so I switched to the Release configuration and gave it a try.
It is obviously faster than the Debug configuration.
After switching to Release, the exact same test that took about 18 to 20 sec in Debug now takes about 9 to 11 sec.
But this is what I get whether I am using IPP or not!
In the property page, in the Release configuration, whether I select Use IPP = No or Use IPP = Single-threaded Static Library, the performance doesn't change!
By the way, I tried both the /O2 and /Ox optimization settings (I don't see /O3 as an option).
I am still not sure whether IPP is truly being used (unless compiling with or without IPP in Release makes no difference, which wouldn't really make sense!).
Any idea why I don't see a performance difference with or without IPP?
Okay, I just added the /VERBOSE flag to the command line's Additional Options, under both the Debug and Release configurations, but I don't see anything happening when I run the test!
For example, under Debug, this is what my Additional Options look like now: "/machine:x64 /debug /VERBOSE".
But I don't see any messages or anything different while running the test!
No, when I build the code, I get nothing at all!
Here are some updates:
In my Linker properties:
under Linker --> Input, I can see that all the dependencies are there:
And, under Linker-->General, this is what is listed as Additional Library Directories:
C:/Program Files (x86)/Intel/Composer XE 2013/ipp/lib/intel64;C:/Program Files (x86)/Intel/Composer XE 2013/ipp/lib/intel64/$(Configuration);;C:/Program Files (x86)/Intel/Composer XE 2013/ipp/../compiler/lib/intel64;C:/Program Files (x86)/Intel/Composer XE 2013/ipp/../compiler/lib/intel64/$(Configuration);%(AdditionalLibraryDirectories)
I am not sure whether any library directory from ipp-samples.7.1.1.013 should be listed or not, but currently there is none!
By the way, just a note, that I didn't modify any of these.
Also, again under Linker-->General, I am not sure whether Link Library Dependencies is supposed to be turned on or not.
By default this was set to No, and when I changed it to Yes, I got some errors when building:
Error 4 error LNK1169: one or more multiply defined symbols found
Error 2 error LNK2005: ippSetNumThreads already defined in ippcore.lib(ippcore-7.1.dll)
Error 3 error LNK2005: ippStaticInit already defined in ippcore.lib(ippcore-7.1.dll)
So, I set it back to NO.
Also, one thing that doesn't make sense is that under Linker-->Advanced, Import Library was set to:
At first, I thought the path for uic_transcoder_con.lib was wrong!
Under "samples.7.1.1.013/__cmake/uic.intel64.vc2010.d.mt/application/uic_transcoder_con/debug" there is NO uic_transcoder_con.lib
In fact, I searched the entire "C:\w_ipp-samples_p_7.1.1.013\ipp-samples.7.1.1.013" tree for uic_transcoder_con.lib, thinking it might be somewhere else, but I wasn't able to find this particular file!
LINK.EXE will ignore /IMPLIB if no exports are declared. In the case of this project, none are declared (as I see in the project on my system).
For reference, here are my compile and link settings. Everything builds correctly on my system.
/GS /TP /W3 /Zc:wchar_t /I"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/application/uic_transcoder_con/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/bmp/common/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/bmp/dec/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/bmp/enc/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/png/dec/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/png/enc/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpeg/common/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpeg/dec/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpeg/enc/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpegxr/dec/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpegxr/enc/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpeg2000/dec/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpeg2000/enc/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/codec/image/jpeg2000/common/src/common" /I"C:/IPP_Samples_7.1.0.011/sources/uic/io/uic_io/include" /I"C:/IPP_Samples_7.1.0.011/sources/uic/core/uic/include" /Zi /Gm- /Od /Fd"C:/IPP_Samples_7.1.0.011/__cmake/uic.intel64.vc2012.d.st/__bin/debug/uic_transcoder_con.pdb" /fp:fast /D "WIN32" /D "_WINDOWS" /D "_DEBUG" /D "_SBCS" /D "_WIN32" /D "_WIN32_WINNT=0x501" /D "WIN64" /D "_WIN64" /D "CMAKE_INTDIR=\"debug\"" /errorReport:prompt /WX- /Zc:forScope /GR /Gd /MDd /Fa"debug" /EHsc /Fo"uic_transcoder_con.dir\debug\" /Fp"uic_transcoder_con.dir\debug\uic_transcoder_con.pch"
/OUT:"C:\IPP_Samples_7.1.0.011\__cmake\uic.intel64.vc2012.d.st\__bin\debug\uic_transcoder_con.exe" /MANIFEST /NXCOMPAT /PDB:"C:/IPP_Samples_7.1.0.011/__cmake/uic.intel64.vc2012.d.st/__bin/debug/uic_transcoder_con.pdb" /DYNAMICBASE "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "comdlg32.lib" "advapi32.lib" "..\..\__lib\debug\uic_io.lib" "..\..\__lib\debug\core_uic.lib" "..\..\__lib\debug\bmp_dec.lib" "..\..\__lib\debug\bmp_enc.lib" "..\..\__lib\debug\jpeg_common.lib" "..\..\__lib\debug\jpeg_dec.lib" "..\..\__lib\debug\jpeg_enc.lib" "..\..\__lib\debug\jpegxr_common.lib" "..\..\__lib\debug\jpegxr_dec.lib" "..\..\__lib\debug\jpegxr_enc.lib" "..\..\__lib\debug\jpeg2000_common.lib" "..\..\__lib\debug\jpeg2000_dec.lib" "..\..\__lib\debug\jpeg2000_enc.lib" "..\..\__lib\debug\png_common.lib" "..\..\__lib\debug\png_dec.lib" "..\..\__lib\debug\png_enc.lib" "..\..\__lib\debug\zlib.lib" "ippcore.lib" "ippi.lib" "ippj.lib" "ipps.lib" "ippcc.lib" "ippdc.lib" "ippch.lib" "svml_dispmt.lib" "libircmt.lib" "libiomp5md.lib" /STACK:"10000000" /IMPLIB:"C:/IPP_Samples_7.1.0.011/__cmake/uic.intel64.vc2012.d.st/application/uic_transcoder_con/debug/uic_transcoder_con.lib" /DEBUG /MACHINE:X64 /INCREMENTAL /PGD:"C:\IPP_Samples_7.1.0.011\__cmake\uic.intel64.vc2012.d.st\__bin\debug\uic_transcoder_con.pgd" /SUBSYSTEM:CONSOLE /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /ManifestFile:"uic_transcoder_con.dir\debug\uic_transcoder_con.exe.intermediate.manifest" /ERRORREPORT:PROMPT /NOLOGO /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/lib/intel64" /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/lib/intel64/debug" /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/../compiler/lib/intel64" /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/../compiler/lib/intel64/debug" /TLBID:1
I just compared your Link/Compiler options one by one with mine.
There were almost no differences. Given that /IMPLIB will be ignored, so there is no issue there (since, as I said, I don't have uic_transcoder_con.lib), I don't see any difference between your link/compile settings and mine (well, except for the order).
The only difference is that I have /errorReport:queue and you have /errorReport:prompt. That is it.
Here are mine, just in case:
My Compile Options:
/I"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/application/uic_transcoder_con/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/bmp/common/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/bmp/dec/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/bmp/enc/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/png/dec/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/png/enc/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpeg/common/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpeg/dec/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpeg/enc/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpegxr/dec/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpegxr/enc/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpeg2000/dec/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpeg2000/enc/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/codec/image/jpeg2000/common/src/common" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/io/uic_io/include" /I"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/sources/uic/core/uic/include" /Zi /nologo- /W3 /WX- /O2 /D "WIN32" /D "_WINDOWS" /D "_DEBUG" /D "INTEL64" /D "WINDOWS" /D "_SBCS" /D "_WIN32" /D "_WIN32_WINNT=0x501" /D "WIN64" /D "_WIN64" /D "CMAKE_INTDIR=\"debug\"" /Gm- /EHsc /MDd /GS /fp:fast /Zc:wchar_t /Zc:forScope /GR /Fp"uic_transcoder_con.dir\debug\uic_transcoder_con.pch" /Fa"debug" /Fo"uic_transcoder_con.dir\debug\" 
/Fd"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/__cmake/uic.intel64.vc2010.d.mt/__bin/debug/uic_transcoder_con.pdb" /Gd /TP /errorReport:queue
My Linker Options:
/OUT:"C:\w_ipp-samples_p_7.1.1.013\ipp-samples.7.1.1.013\__cmake\uic.intel64.vc2010.d.mt\__bin\debug\uic_transcoder_con.exe" /VERBOSE /INCREMENTAL /NOLOGO /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/lib/intel64" /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/lib/intel64/debug" /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/../compiler/lib/intel64" /LIBPATH:"C:/Program Files (x86)/Intel/Composer XE 2013/ipp/../compiler/lib/intel64/debug" "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "comdlg32.lib" "advapi32.lib" "..\..\__lib\debug\uic_io.lib" "..\..\__lib\debug\core_uic.lib" "..\..\__lib\debug\bmp_dec.lib" "..\..\__lib\debug\bmp_enc.lib" "..\..\__lib\debug\jpeg_common.lib" "..\..\__lib\debug\jpeg_dec.lib" "..\..\__lib\debug\jpeg_enc.lib" "..\..\__lib\debug\jpegxr_common.lib" "..\..\__lib\debug\jpegxr_dec.lib" "..\..\__lib\debug\jpegxr_enc.lib" "..\..\__lib\debug\jpeg2000_common.lib" "..\..\__lib\debug\jpeg2000_dec.lib" "..\..\__lib\debug\jpeg2000_enc.lib" "..\..\__lib\debug\png_common.lib" "..\..\__lib\debug\png_dec.lib" "..\..\__lib\debug\png_enc.lib" "..\..\__lib\debug\zlib.lib" "ippcore.lib" "ippi.lib" "ippj.lib" "ipps.lib" "ippcc.lib" "ippdc.lib" "ippch.lib" "svml_dispmt.lib" "libircmt.lib" "libiomp5md.lib" /MANIFEST /ManifestFile:"uic_transcoder_con.dir\debug\uic_transcoder_con.exe.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/__cmake/uic.intel64.vc2010.d.mt/__bin/debug/uic_transcoder_con.pdb" /SUBSYSTEM:CONSOLE /STACK:"10000000" /PGD:"C:\w_ipp-samples_p_7.1.1.013\ipp-samples.7.1.1.013\__cmake\uic.intel64.vc2010.d.mt\__bin\debug\uic_transcoder_con.pgd" /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:/w_ipp-samples_p_7.1.1.013/ipp-samples.7.1.1.013/__cmake/uic.intel64.vc2010.d.mt/application/uic_transcoder_con/debug/uic_transcoder_con.lib" /MACHINE:X64 
/ERRORREPORT:QUEUE
Since there is no difference, I am assuming I have everything working correctly. I still haven't figured out why nothing gets displayed with /VERBOSE. Maybe it is an issue with VC++; I am not sure. I even tried setting Show Progress to "Display all progress messages (/VERBOSE)" under "Linker-->General". Also, under Tools->Options->Projects and Solutions->Build and Run, I changed the last two output settings from 'Minimal' to 'Diagnostic' verbosity, and still nothing gets displayed.
Any idea, then, why I get the same performance with or without IPP? Something still tells me this is not correct, because my test shouldn't take this long if it is really using the hardware!
We would really like to replace our current decoder with the IPP decoder if it proves to be a performance improvement.
By the way, is your Link Library Dependencies set to "No" under the Linker General options?
And can you quickly test something by switching Use IPP between "Single-threaded Static Library" and "No"?
I don't know; maybe the fact that "Intel Performance Libraries" is visible under Configuration Properties means that, regardless of the Use IPP setting, it will always use IPP, and maybe that is why I don't see a difference when I compile with or without IPP. That is why I would be curious to see whether your run time actually varies when you turn that option on or off.
Yes, my Link Library Dependencies is set to "No". Both USE IPP options produce no errors when I build UMC. I think there is no difference between these options though because in the linker "Additional Dependencies" tab I see that the project is linking to ippcore.lib, ippi.lib, ippj.lib, ipps.lib, ippcc.lib, ippdc.lib and ippch.lib.
In fact, another observation that supports the speculation that the hardware is not truly being used in my case is that the timing varies a lot!
For example, when I run the same test, say, 10 times, I get timings that range from 6 seconds all the way to 14 seconds (I am in Release mode), with the average around 10 seconds.
Also, out of curiosity, I unplugged my computer from power, and that shifts the average run time from 10 seconds to somewhere between 12 and 13 seconds.
Does this mean the performance I am getting is the real hardware performance? But that doesn't really make sense if I am getting such a huge variance in the timing.
Shouldn't we expect something closer to real-time operation in hardware? If that were the case, we shouldn't see this much timing variance!
For example, our current decoder (J2K-Codec) is very consistent as far as timing goes. For the exact same test, it is always between 3.8 and 3.9 seconds (the variance is within a tenth of a second).
I would expect a very small variance when using IPP, unless I still have something not set up properly!
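To compare the run-to-run spread of the two decoders more rigorously than by eyeballing the range, something like the following sketch could summarize a set of measured run times. `RunStats` and `summarize` are hypothetical helper names, not part of IPP or the UIC samples, and the sketch assumes a C++11 compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Summary of repeated timing runs, all values in seconds.
struct RunStats {
    double minSec;
    double maxSec;
    double meanSec;
    double stddevSec;  // population standard deviation
};

// Computes min, max, mean, and standard deviation of the given run times.
inline RunStats summarize(const std::vector<double>& runsSec) {
    assert(!runsSec.empty());
    const auto mm = std::minmax_element(runsSec.begin(), runsSec.end());
    double sum = 0.0;
    for (double r : runsSec) sum += r;
    const double mean = sum / runsSec.size();
    double sq = 0.0;
    for (double r : runsSec) sq += (r - mean) * (r - mean);
    return RunStats{*mm.first, *mm.second, mean,
                    std::sqrt(sq / runsSec.size())};
}
```

Feeding in the 10 measured IPP times and the 10 measured J2K-Codec times would give two standard deviations to compare directly, and repeating the measurement on battery and on mains power would show how much of the spread comes from power management rather than from the decoder.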
I know some people have used the IPP JPEG 2000 decoder and were very happy with the performance.
So could it be that I am still doing something wrong? It seems that I have the configuration set up correctly. Could it be that they actually modified the decoder code, and that is why they were able to get a satisfactory result?