With VTune, if you time by event-based sampling at a reasonable sampling rate, you are correct that it does not noticeably perturb the program and that the time reported for a region is only a statistical approximation. Once you get past a few hundred samples in the region of interest, though, the timing is about as accurate as any other method.
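To put a rough number on "a few hundred samples", here is a minimal sketch. It assumes the sample count in a region behaves like a Poisson count, so the relative standard error is about 1/sqrt(N). That model is my assumption for illustration, not something VTune reports, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate relative standard error of a sampled time estimate,
// ASSUMING the number of samples landing in the region is Poisson-distributed.
double relative_sampling_error(long samples_in_region) {
    assert(samples_in_region > 0);
    return 1.0 / std::sqrt(static_cast<double>(samples_in_region));
}

// Time attributed to a region: each sample stands for one sampling interval.
double estimated_seconds(long samples_in_region, double sampling_interval_seconds) {
    return static_cast<double>(samples_in_region) * sampling_interval_seconds;
}
```

Under that assumption, 400 samples in a region give a relative error of about 5%, and 10,000 samples give about 1%.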
Timers such as QueryPerformanceCounter or rdtsc may be the better choice when you want to time a designated section of code, at the expense of a hundred or so bus clocks of overhead. VTune, on the other hand, doesn't require you to know in advance where the hot spots are; it will help you find them, and it lets you select which events to sample so you can find out how the time is being spent.
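For the "designated section" case, a minimal sketch of manual timing might look like the following. It uses the portable `std::chrono::steady_clock` as a stand-in for QueryPerformanceCounter or rdtsc (my substitution, to keep the example platform-neutral). `time_section_ns` and `timer_overhead_ns` are hypothetical helper names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Time one call of `section` with a monotonic clock; returns nanoseconds.
template <typename F>
std::int64_t time_section_ns(F&& section) {
    const auto start = std::chrono::steady_clock::now();
    section();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Rough cost of the measurement itself: time an empty section.
std::int64_t timer_overhead_ns() {
    return time_section_ns([] {});
}
```

Call it as `time_section_ns([&] { work(); })`, where `work` is whatever you want to measure. If the section is short, compare the result with `timer_overhead_ns()`, since the timer calls themselves are part of what gets measured.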