Good report, jceberus. Clear and fact-filled.
Of course, you may have found a bug, but I don't think so. More likely, the threads (processes) that are being created aren't statistically relevant, and therefore aren't showing up.
Check it out: if you just sample the system for, say, 20 seconds (-d 20) without launching an application, you will absolutely NOT get the same results as if you ran
# ps -ef
VTune samples; it doesn't count precisely. So, what can you try? A few things:
1) Increase the duration of the sampling activity by a lot, by ten or a hundred times. So, if you're sampling for 10 seconds, make it 10 minutes. Something like that.
2) Read about, and consider changing, your "sample after" value for the events that you're looking for. In the case of the default event, instructions retired, you might experiment with gradually smaller values and see which is optimal based on the results you see.
3) Instead of doing 2) above by hand and guessing, ask vtl to do it: turn calibration on. See if your results vary.
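To see why 1) and 2) help, here is a rough back-of-the-envelope sketch in Python. It is not VTune code, and every number in it is made up for illustration: the function name, the event rate, and the sample-after values are all assumptions. The idea is simply that a thread gets, on average, about one sample for every "sample after" events it generates, so a thread that does very little work can easily end up with less than one expected sample.

```python
def expected_samples(events_per_second: float, duration_s: float, sample_after: int) -> float:
    """Average number of samples attributed to a thread.

    Simplified model (an assumption, not VTune's actual algorithm):
    one sample is taken every `sample_after` events, so a thread that
    generates N events gets about N / sample_after samples on average.
    """
    total_events = events_per_second * duration_s
    return total_events / sample_after


# Hypothetical quiet thread: 50,000 instructions retired per second.
quiet_rate = 50_000

# Short run, large sample-after value: well under one expected sample.
print(expected_samples(quiet_rate, duration_s=10, sample_after=2_000_000))

# Option 1: sample 60 times longer (10 minutes instead of 10 seconds).
print(expected_samples(quiet_rate, duration_s=600, sample_after=2_000_000))

# Option 2: keep 10 seconds, but use a much smaller sample-after value.
print(expected_samples(quiet_rate, duration_s=10, sample_after=10_000))
```

With these made-up numbers the first case averages a quarter of a sample, which means the thread will usually not appear at all, while either change pushes it comfortably above one. Keep in mind that a smaller sample-after value also means more sampling overhead.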
Report back, let us know here what you see!