I have looked through the forums and other TBB resources. Based on VTune, I can see my program is spending a lot of time spinning, but I have not yet found out where it is spinning.
I have Parallel Studio and would appreciate any advice on how to find out where the program is spinning so I can fix it. Overall, it seems my parallelization is not very well balanced, and I am trying to figure out where the problems are.
If this is caused by one thread doing a lot of work while the others run out of work, then once I find where that is happening, there are some tasks I can break up more finely at the cost of a small amount of memory.
I am using TBB + the TBB version of MKL for BLAS and LAPACK calls.
Thank you for the help
Could you please switch stack attribution from the default "User function +1" to "User/System" and provide the new picture?
It can be done with the "Call Stack Mode" knob on the filter bar.
Thank you, Regards, Dmitry