Does it mean there is no support for these old VTune API functions here?
I don't know the sysadmin on this remote system, so I don't know exactly how things were installed there.
The task at hand is to determine whether OpenMP runtime functions are spending much time in a specific omp for loop inside a parallel region. The loop scales well with problem size, but runs slowly at a problem size which keeps only about 32 of the 64 threads busy. The original code timed the loop by restricting the timer to thread ID 0. I suspect that putting the timer under #pragma omp single (or maybe master) may give more meaningful timing; something seems strange about the timing when there isn't sufficient work to keep all threads active. Still, there seems to be too much time spent there, and -collect hpc-performance reports high serial time overall.
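To make the timing question concrete, here is a rough sketch of the kind of per-thread measurement I have in mind, as opposed to timing on thread 0 only. It is not the actual code: the function name timed_scale, the loop body, and the t_work array are made up for illustration, and the serial fallbacks are only there so the sketch also builds without -fopenmp.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: only used when built without OpenMP). */
#include <time.h>
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
static int omp_get_thread_num(void) { return 0; }
#endif

/*
 * Hypothetical stand-in for the loop of interest: a[i] = 2*a[i] + 1.
 * t_work[tid] receives the time thread tid spent in its own share of the
 * loop (nowait keeps the loop's barrier out of that figure).
 * cap is the number of slots available in t_work.
 * Returns the wall time of the whole parallel region, which does include
 * thread start-up and the barrier at the end of the region.
 */
double timed_scale(double *a, int n, double *t_work, int cap)
{
    assert(a != NULL && t_work != NULL && cap > 0);

    double t_begin = omp_get_wtime();

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        double t0 = omp_get_wtime();

        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; ++i)
            a[i] = 2.0 * a[i] + 1.0;

        double t1 = omp_get_wtime();
        if (tid < cap)
            t_work[tid] = t1 - t0;  /* each thread writes only its own slot */
    }

    return omp_get_wtime() - t_begin;
}
```

If the largest t_work entry is much smaller than the returned region time, the difference would be going to runtime overhead and waiting rather than to the loop body, which is the case where a thread-0-only timer can mislead.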
I could imagine that some more modern features of VTune might be more suitable, but I can't find enough detail in the documentation.