What is _vcomp_for_static_simple_init and why is it a hotspot?
Hi. I am profiling my code (64-bit) using Inspector XE. After making my reference-counted objects "thread-safe" with, for example:
#pragma omp atomic
refs++;
Inspector XE shows a new, serious bottleneck (hotspot) in _vcomp_for_static_simple_init and kmpc_atomic_fixed4_add (which calls _vcomp_for_static_simple_init).
I was under the impression that omp atomic pragmas generated efficient code for thread-safe operations. I was really shocked by these profile results: the code does many other operations for each reference increment or decrement, so I expected an omp atomic to have almost no effect on performance.
Hi,

The performance of an OpenMP atomic depends on how many threads are running. The more threads update the same variable, the more they contend for it, so each atomic operation generally gets slower as the thread count grows. That may be what you are seeing, or it may be another issue.

Are you using Windows? If so, I would suggest you try InterlockedIncrement() or InterlockedDecrement(), which map directly to the processor's atomic instructions instead of going through a runtime library at the application level. On Linux you might try one of the GCC intrinsics such as __sync_add_and_fetch().

Also, you mention that you are profiling with Inspector XE. I assume you mean VTune Amplifier XE, since Inspector XE checks for memory and threading errors rather than performance hotspots.

Thanks,
Shannon