Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Question about unexpected performance when using icc for TSVC benchmark

susangao
Beginner
Hi all,
I downloaded the TSVC benchmark from http://polaris.cs.uiuc.edu/~maleki1/TSVC.tar.gz. It is used to evaluate how well compilers vectorize, and I am testing icc with it.
I changed the makefile to use icc and am looking at the timings.
The code is simple: there are 151 test functions, each containing a typical loop and its own initialization. Each one prints its timing and a correctness-check result, and main() calls all 151 of them.
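The structure described above (per-test initialization, one loop under test, a printed checksum, and a main() that calls every test) can be sketched roughly as follows. This is a hypothetical miniature, not TSVC's code: the test names, LEN, the struct test_entry table and the checksum-as-sum are all assumptions, and timing uses clock().

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define LEN 1000  /* hypothetical array length */

static double a[LEN], b[LEN];

/* Hypothetical test: initializes its own data, runs one loop, returns a checksum. */
static double test_copy(void) {
    for (int i = 0; i < LEN; i++) { a[i] = 0.0; b[i] = (double)i; }
    for (int i = 0; i < LEN; i++) a[i] = b[i];          /* the loop under test */
    double sum = 0.0;
    for (int i = 0; i < LEN; i++) sum += a[i];
    return sum;
}

/* A second hypothetical test with different data and a different loop. */
static double test_scale(void) {
    for (int i = 0; i < LEN; i++) a[i] = 1.0;
    for (int i = 0; i < LEN; i++) a[i] = 2.0 * a[i];    /* the loop under test */
    double sum = 0.0;
    for (int i = 0; i < LEN; i++) sum += a[i];
    return sum;
}

struct test_entry {
    const char *name;
    double (*fn)(void);
};

static const struct test_entry tests[] = {
    { "copy",  test_copy  },
    { "scale", test_scale },
};

/* Runs every test in order and prints its name, the CPU seconds spent in the
   whole test function (initialization included) and its checksum.
   Returns how many tests were run. */
static int run_all(void) {
    int n = (int)(sizeof tests / sizeof tests[0]);
    assert(n > 0);
    for (int t = 0; t < n; t++) {
        clock_t start = clock();
        double chk = tests[t].fn();
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("%-8s %10.6f %20.6f\n", tests[t].name, secs, chk);
    }
    return n;
}
```

Because every test and main() share one source file in a layout like this, the compiler is free to inline or specialize the tests for the arguments it can see.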
Something confusing happens with function s162(). If I comment out all the calls in main() that come after s162(), its time is about 2.x sec on an E5-2680; if I leave them in and use the original code, it takes 4 sec. This is reproducible, not random.
It doesn't matter whether I keep or comment out the calls before s162(). That is what confuses me: only the part after s162() matters.
I compared the generated .s for s162() in both cases and see no difference in the instructions. (The correctness-check results are also the same in both cases.)
The platform I am using is:
Chip: Xeon E5-2680
OS: GNU/Linux
ICC: icc (ICC) 12.1.3 20120212
This also happens when I run it on a Xeon 5660, where the timings are 3.x and 6.x sec. That platform is:
Chip: Xeon 5660
OS: GNU/Linux
ICC: icc (ICC) 12.1.2 20111128
Attached are the original package and the makefiles I used on the two platforms above.
I would appreciate your advice on what causes this. It may be a silly question, but I hope to learn from you and figure it out.
Thank you for reading.
Best Regards,
Susan
TimP
Honored Contributor III
This benchmark was not intended to be compiled with the driver and the test function in a single source file. Even 22 years ago (when the original was published), compilers could short-cut artificial benchmarks by inter-procedural optimization.

clock() is not a satisfactory timer for this benchmark. Whoever produced this modified version had to make it repeat far more than the original just to make it run long enough to be timed by clock(). Unfortunately, Standard C doesn't include a satisfactory timer function.
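Whether clock() is adequate depends on its effective granularity, which the C standard does not fix: CLOCKS_PER_SEC only sets the units (POSIX requires it to be 1000000), while the actual tick can be coarser. The probe below is a sketch rather than a rigorous measurement; the function name clock_step_seconds() is hypothetical. It spins until clock() changes and reports the step in seconds. Note that clock() measures processor time used by the program, not wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Busy-waits until clock() reports a new value and returns the observed step
   in seconds, i.e. an estimate of clock()'s effective granularity.
   Returns -1.0 if processor time is not available. */
static double clock_step_seconds(void) {
    clock_t t0 = clock();
    if (t0 == (clock_t)-1)
        return -1.0;
    clock_t t1;
    do {
        t1 = clock();          /* the spin itself burns CPU time, so clock() advances */
    } while (t1 == t0);
    assert(t1 > t0);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

If the step printed is in the millisecond range, a loop that runs for about 100 microseconds cannot be timed by a single pair of clock() calls.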

The point of s162() is to see whether the compiler recognizes the direction of data overlap, seeing that it will never be executed with a negative overlap. It's possible that when interprocedural optimization succeeds, the compiler sees that the overlap is a compile-time constant and can eliminate the conditional as well as propagate the constant overlap into the code. There are other tests in this suite intended to concentrate on that.
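To make the overlap point concrete, here is a rough sketch of the kind of loop s162 times: a[i] = a[i + k] + b[i] * c[i], guarded by k > 0. It is an approximation, not TSVC's source; LEN, the loop bound LEN - k (chosen so any 0 < k < LEN stays in bounds), and the names s162_like() and run_s162_like() are assumptions. The overlap is read through a volatile object, a swapped-in trick that is not part of TSVC, which keeps the compiler from treating k as a compile-time constant even when caller and kernel share a source file.

```c
#include <assert.h>

#define LEN 1000  /* hypothetical array length */

static double a[LEN], b[LEN], c[LEN];

/* Overlap amount. Reading it through a volatile object forces a real load,
   so the compiler cannot propagate its value into the kernel as a constant. */
static volatile int k_source = 1;

/* s162-style kernel. With k > 0, iteration i reads a[i + k], an element that
   has not been overwritten yet, so the dependence runs in a direction that
   allows the loop to be vectorized. */
static void s162_like(int k) {
    if (k > 0) {
        assert(k < LEN);
        for (int i = 0; i < LEN - k; i++)
            a[i] = a[i + k] + b[i] * c[i];
    }
}

/* Sets all arrays to 1.0, runs the kernel once with the volatile k, returns sum(a). */
static double run_s162_like(void) {
    for (int i = 0; i < LEN; i++) { a[i] = 1.0; b[i] = 1.0; c[i] = 1.0; }
    s162_like(k_source);
    double sum = 0.0;
    for (int i = 0; i < LEN; i++) sum += a[i];
    return sum;
}
```

With k = 1, every a[i] except the last becomes 2.0, so run_s162_like() returns 1999.0; with k = 0 the guard skips the loop and it returns 1000.0.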
susangao
Beginner
Got it. Thank you very much for your helpful and kind reply.
david_m_20
Beginner

The link to http://software.intel.com/en-us/system/files/TSVC-s162.tar.gz is broken.

Can you provide an updated link?

SergeyKostrov
Valued Contributor II
>> ...I found something confusing happened for function s162(). In main() function, if I comment out all function calls behind s162(), then it's timing is 2.x sec on e5-2680; but if I do not comment them and just use the original code, it timing is 4 sec...

It could be an alignment issue (this needs to be investigated). I've seen a similar issue in two of my performance evaluation tests for some SSE2 and AVX instructions. It is almost the same thing, but in the opposite direction: performance gets better if I comment out some pieces of code.

I really disagree with Tim's comment about the CRT function clock(), since it provides satisfactory accuracy, down to milliseconds, when a test runs longer than a couple of seconds. Of course, if somebody tries to use clock() to measure a time interval with microsecond or nanosecond accuracy, it won't provide reliable numbers.
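Whether alignment explains the s162 difference is not settled here. One quick check, sketched below under the assumption of a C11 compiler and a 32-byte (AVX-width) boundary, is to print each array's offset from that boundary in both builds and see whether it changes. The helpers misalignment() and alloc_aligned_doubles() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns how many bytes p lies past the previous multiple of `boundary`.
   0 means p is aligned; 32 bytes is the width of an AVX register. */
static size_t misalignment(const void *p, size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);  /* power of two */
    return (size_t)((uintptr_t)p & (boundary - 1));
}

/* Allocates n doubles on a 32-byte boundary with C11 aligned_alloc(). The byte
   count is rounded up to a multiple of the alignment, as C11 requires.
   Release the result with free(). */
static double *alloc_aligned_doubles(size_t n) {
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 31) & ~(size_t)31;
    return aligned_alloc(32, bytes);
}
```

Printing misalignment(a, 32) for each benchmark array in both the full and the trimmed build shows directly whether the data layout moved.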
TimP
Honored Contributor III

In the original version of this benchmark, http://www.netlib.org/benchmark/vectord, loops of length 1000 are timed without extra repetitions. The shorter loops are repeated so as to process as much data as the longer ones, but the repetitions run over the same cached data. Many of these tests run in around 100 microseconds on a 2.6 GHz Core i7-2, so a timer with microsecond resolution is needed.
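The repetition scheme described above reduces to a small calculation: a loop of length n runs about 1000 / n times, so it touches roughly as many elements as one pass of a length-1000 loop. Rounding up, and the names reps_for_length() and repeated_sum(), are assumptions for illustration; the original benchmark's exact rule may differ.

```c
#include <assert.h>

#define LONG_LEN 1000  /* length of the longest loops, per the description above */

/* Repetitions needed for a loop of length n to touch about as many elements
   as one pass of a LONG_LEN loop (rounded up). */
static int reps_for_length(int n) {
    assert(n > 0);
    return (LONG_LEN + n - 1) / n;
}

/* Hypothetical short kernel, repeated over the same (cache-resident) data. */
static double repeated_sum(const double *x, int n) {
    double total = 0.0;
    int reps = reps_for_length(n);
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < n; i++)
            total += x[i];
    return total;
}
```

For example, a length-100 loop is repeated 10 times and a length-300 loop 4 times.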

I ran some tests this week on Linux with various timers and got reports of microsecond resolution with Intel OpenMP omp_get_wtime(). I believe it's nearly that good on Windows. On Linux, gettimeofday() is expected to work as well.
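A minimal wall-clock timer along those lines, assuming a POSIX system that provides gettimeofday() in <sys/time.h>, might look like this; the loop being timed and the names wall_seconds() and time_loop() are hypothetical. POSIX now marks gettimeofday() obsolescent, so clock_gettime() with CLOCK_MONOTONIC is a reasonable alternative on newer systems, and omp_get_wtime() is another option when building with OpenMP.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define N 1000  /* hypothetical loop length */

/* External linkage, so the stores in time_loop() stay visible outside this file. */
double x[N], y[N];

/* Wall-clock time in seconds, with the microsecond resolution of gettimeofday(). */
static double wall_seconds(void) {
    struct timeval tv;
    int rc = gettimeofday(&tv, NULL);
    assert(rc == 0);
    (void)rc;  /* avoid an unused-variable warning when NDEBUG is defined */
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Times `reps` passes of a simple loop and returns the average seconds per pass. */
static double time_loop(int reps) {
    assert(reps > 0);
    double t0 = wall_seconds();
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < N; i++)
            y[i] = y[i] + 2.0 * x[i];
    double t1 = wall_seconds();
    return (t1 - t0) / reps;
}
```

Because this is wall-clock time, other load on the machine and system time adjustments can disturb a single reading, so repeating each measurement a few times is prudent.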

SergeyKostrov
Valued Contributor II
>> In the original version of this benchmark http://www.netlib.org/benchmark/vectord loops of length 1000 are timed without extra repetitions. The shorter loops are repeated so as to process as much data as the longer one, but repeating over the same cached data, Many of these tests run around 100 microseconds on 2.6Ghz coreI7-2, so a timer with microsecond resolution is needed.

Thanks, Tim, for these details.
Chari__Nihal
Beginner
Hi, can you please tell me how to interpret the following results? What information does the checksum provide?

Loop     Time (sec)   Checksum
S421     2.34         32010.620068485
S1421    6.54         17208.404325315
S422     12.02        3.7377231414078
S423     4.03         32000.736895702
S424     2.45         32822.36069424
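As a hedged illustration: in suites of this kind the checksum is typically a simple reduction over a loop's output arrays, printed so that results from different compilers or options can be compared. The sketch below assumes a plain sum, and the name checksum() is hypothetical; TSVC's actual check routine may combine arrays differently. Matching checksums suggest the loop computed the same values; the magnitude itself carries no meaning, and small differences in the last digits can come from floating-point additions being reordered, for example in vectorized reductions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checksum: the sum of every element of a result array. */
static double checksum(const double *arr, size_t n) {
    assert(arr != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += arr[i];
    return sum;
}
```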