I compiled j2kit (both x86 and x64 versions) with Intel compiler 10.1.020, using the same compiler switches with LINKAGE=static and MULTITHREADING=omp, once with and once without TIMING defined.
Without TIMING, it takes ~1,500 ms to compress a 3328 x 4992 BMP into jp2 at the default compression ratio (command line: j2kit -i test.bmp -o test.jp2). I measured the elapsed time with a stopwatch, repeated the test three times, and picked the fastest run.
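The stopwatch procedure above could also be scripted. The following is only a rough sketch, assuming Python is available on the test machine; the helper name best_of is made up for illustration and is not part of the j2kit sample. It measures wall-clock time from outside the process, so it includes process startup and file I/O.

```python
import subprocess
import time


def best_of(cmd, runs=3):
    """Run cmd `runs` times and return the fastest wall-clock time in ms.

    best_of is a hypothetical helper, not part of the sample code.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails
        subprocess.run(cmd, check=True)
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings)


# Example call, using the command line from this post:
# print(best_of(["j2kit", "-i", "test.bmp", "-o", "test.jp2"]))
```

Taking the minimum rather than the mean matches the "pick the fastest run" approach and reduces the influence of disk caching and background activity.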
With TIMING=1, it takes ~4,100 ms for the same image. That is a roughly 2.7x slowdown! Note that I have checked LINKAGE (static vs. dynamic) and it doesn't matter; only TIMING affects the result.
What would be the reason for such a huge performance drop considering that the timing code is called only twice, once at the beginning of encode/decode and once at the end?
Should I perhaps cross-post this to the Intel Compiler forum (perhaps Timer.h interferes with code optimization)?
Or is it perhaps just ippGetCpuFreqMhz() wasting ~2,500 ms at startup to (incorrectly) determine the actual CPU frequency, with that time somehow (again incorrectly) being added to the running time of the timed application?
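One way this startup-cost hypothesis might be checked: if the extra time is a fixed cost, the difference between the TIMING and non-TIMING builds should stay roughly constant across image sizes instead of growing with the image. The sketch below only does that arithmetic; the function names and the 200 ms tolerance are assumptions chosen for illustration, and the only measured values are the two quoted in this post.

```python
def overhead_ms(plain_ms, timed_ms):
    """Extra time attributable to building with TIMING defined."""
    return timed_ms - plain_ms


def looks_like_fixed_cost(small, large, tolerance_ms=200.0):
    """small and large are (plain_ms, timed_ms) pairs for two image sizes.

    Returns True when the overhead is about the same for both images,
    which would point to a one-time cost and not a per-pixel slowdown.
    The tolerance is an arbitrary assumption.
    """
    return abs(overhead_ms(*small) - overhead_ms(*large)) <= tolerance_ms


# Overhead for the one image measured in this post: 4,100 ms vs 1,500 ms
print(overhead_ms(1500, 4100))  # 2600
```

A second measurement on a much smaller image would be needed before looks_like_fixed_cost can say anything; none has been taken here.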
On which system are you trying to measure JPEG decoding performance? Are you measuring its performance in a multithreaded environment? The JPEG2000 sample actually provides the timing feature, and here is an example illustrating how to measure JPEG2000 decoding/encoding performance on a multithreaded system:
step 1: >build32 icl10 dynamic omp timing (Intel Compiler with OpenMP enabled, for measuring timing)
step 2: set OMP_NUM_THREADS=1
step 3: JPEG2000 encoding: j2kit.exe -i ole0.bmp -o ole0.jp2 -linf-
repeat several times and average the time.
step 4: set OMP_NUM_THREADS=2
step 5: repeat step 3 several times
Get comparison data from 1 thread to 2 threads for J2K encoding.
step 6: set OMP_NUM_THREADS=1
step 7: JPEG2000 decoding performance: j2kit.exe -i ole0.jp2 -o tmp.bmp -linf-
step 8: set OMP_NUM_THREADS=2
step 9: repeat step 7 several times
Get comparison data from 1 thread to 2 threads for J2K decoding.
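The steps above can be approximated with a small script. This is a minimal sketch, not part of the sample: the helper names average_ms and compare_threads are hypothetical, and it times the whole process from outside instead of reading the figures printed by the timing build, so the absolute numbers will differ from the sample's own output.

```python
import os
import subprocess
import time


def average_ms(cmd, threads, runs=5):
    """Average wall-clock time in ms for cmd with OMP_NUM_THREADS=threads."""
    # Copy the current environment and override only the thread count
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, env=env)
        total += time.perf_counter() - start
    return total / runs * 1000.0


def compare_threads(cmd, thread_counts=(1, 2), runs=5):
    """Return {thread_count: average_ms} for each requested thread count."""
    return {n: average_ms(cmd, n, runs) for n in thread_counts}


# Encoding (step 3) and decoding (step 7), with the commands as given above:
# compare_threads(["j2kit.exe", "-i", "ole0.bmp", "-o", "ole0.jp2", "-linf-"])
# compare_threads(["j2kit.exe", "-i", "ole0.jp2", "-o", "tmp.bmp", "-linf-"])
```

Setting OMP_NUM_THREADS through the child environment has the same effect as the "set OMP_NUM_THREADS=..." lines, but it does not change the variable in the calling shell.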
If you still observe slow performance, please contact us by submitting an issue to Intel Premier Support, including the image data and a detailed system configuration.
I did not try to measure JPEG decoding performance, only JPEG2000 encoding.
As I wrote in my post above, I did test multi-threading performance. My full hardware configuration is listed here.
I am well aware that "JPEG2000 sample actually provides the timing feature"; for some reason, that timing feature is exactly the cause of the encoding slowdown I have reported.
Please read my first post more carefully so I don't have to repeat myself.