VTune shows improvement, but actual timing says otherwise
I've been doing some optimization experiments on several functions in my app and monitoring the results with VTune v6.1. Recently, while trying to improve the performance of one of my functions, I unrolled it twice. According to VTune sampling, this cut the function's clockticks roughly in half. However, when I ran the app and timed it with a stopwatch, it was considerably slower! How can this happen? Only the unrolled function was changed, nothing else. Thanks.
I share your confusion. The usual things to check are that the system wasn't busy with other work during the unrolled test, that the VTune Analyzer sampling rate is the same for both runs (Configure->Modify->Events->SampleAfterForClockticks), that the system memory and configuration are the same, that the input data set is the same, and that the disk and memory caches are in the same state (has the program already been run, so that the caches are warmed up with its data?). If all of these are the same, execution time and clockticks should correlate much better than what you are seeing.
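One way to take the stopwatch and the cache state out of the picture is to time both builds with a small script instead of by hand. What follows is only a minimal sketch, not anything VTune provides: the helper names `time_command` and `summarize` are made up for illustration, and it assumes the app can be started from the command line and exits on its own.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5, warmup=1):
    """Run cmd repeatedly and return the wall-clock seconds of each timed run."""
    # Warm-up runs bring disk and memory caches into a comparable state;
    # their timings are deliberately discarded.
    for _ in range(warmup):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (minimum, median, spread), where spread is max minus min."""
    return min(timings), statistics.median(timings), max(timings) - min(timings)
```

For example, calling `summarize(time_command(["./myapp", "input.dat"]))` once for the original build and once for the unrolled build (the executable name and input file here are placeholders) gives two sets of numbers taken under the same conditions. If the spread within one build is as large as the difference between the builds, the slowdown is more likely noise from other activity on the system than an effect of the unrolling.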