Question about performance

AGG1 · 03-11-2014

I'm hoping someone can help me understand a performance issue in our solver that came up recently while profiling with VTune Amplifier. Here is a description:

VTune Amplifier shows that the time spent in the function mucal goes up as the number of threads increases. With 8 threads, mucal is at the top of the function list.

mucal is a function that calculates viscosity. It is called like this:

do ijk=1,iend
   mu(ijk)=mucal(ijk,iopt)
end do

First CFD mesh cell index: 1

Last CFD mesh cell index: iend

The ijk index range is split among the OpenMP threads.
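For readers who want something concrete, the loop above can be sketched with an OpenMP worksharing directive as follows. This assumes a static schedule (the post does not say which schedule is used), and the body of mucal is a hypothetical stand-in with an invented formula; the names visc_loop_sketch and compute_mu are also hypothetical.

```fortran
module visc_loop_sketch
  implicit none
contains
  ! Hypothetical stand-in for the solver's mucal: an invented formula,
  ! NOT the real viscosity model.
  pure function mucal(ijk, iopt) result(mu)
    integer, intent(in) :: ijk, iopt
    real(8) :: mu
    mu = 1.0d-5 * real(iopt, 8) + 1.0d-8 * real(ijk, 8)
  end function mucal

  ! Fill mu(1:iend); OpenMP splits the ijk range among the threads.
  ! The loop index ijk is private automatically; mu is shared.
  subroutine compute_mu(mu, iend, iopt)
    integer, intent(in) :: iend, iopt
    real(8), intent(out) :: mu(iend)
    integer :: ijk
    !$omp parallel do schedule(static)
    do ijk = 1, iend
      mu(ijk) = mucal(ijk, iopt)
    end do
    !$omp end parallel do
  end subroutine compute_mu
end module visc_loop_sketch
```

When the code is compiled without OpenMP enabled (for example without gfortran's -fopenmp), the !$omp lines are treated as ordinary comments and the loop runs serially, which makes it easy to compare serial and threaded runs.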

Inside mucal we use 2 modules and include 6 common blocks.

The modules contain arrays dimensioned (1:iend), mostly 1D arrays that store velocity, pressure, etc. The common blocks contain mostly scalar variables, but a lot of them.
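As an illustration only, the data layout described above might look roughly like this. The module flow_fields, the module visc_model, the common block /visc_consts/, the scalars mu_ref and c_u, and the viscosity formula are all hypothetical; only the pattern (per-cell arrays taken from a module, scalars taken from a common block) follows the post.

```fortran
module flow_fields
  implicit none
  ! Hypothetical per-cell array dimensioned (1:iend), e.g. a velocity.
  real(8), allocatable :: u(:)
end module flow_fields

module visc_model
  use flow_fields, only: u
  implicit none
contains
  ! Hypothetical mucal: reads a per-cell value from a module array and
  ! scalar constants from a common block. The formula is invented.
  function mucal(ijk, iopt) result(mu)
    integer, intent(in) :: ijk, iopt
    real(8) :: mu
    real(8) :: mu_ref, c_u
    common /visc_consts/ mu_ref, c_u
    if (iopt == 1) then
      ! Invented option 1: constant viscosity.
      mu = mu_ref
    else
      ! Invented option 2: viscosity grows with the local speed.
      mu = mu_ref + c_u * abs(u(ijk))
    end if
  end function mucal
end module visc_model
```

In this sketch every thread only reads u, mu_ref and c_u. Module variables and common blocks are shared among OpenMP threads by default, so if a real mucal also assigned to any of them inside the parallel loop, all threads would be writing the same memory locations.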

To fix this, we tried the following:

1. Instead of accessing the module arrays inside mucal, pass the values at index ijk as arguments (e.g. mu(ijk)=mucal(ijk,iopt,u(ijk))). This did not help.
2. Instead of including the common blocks, pass those variables to mucal as arguments as well. This also did not help.
3. Calculate mucal(ijk) once, store it in a new array, and reuse that array, reducing the number of calls to mucal. This helped: with 8 threads, mucal was no longer at the top of the list.
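The third approach above (evaluating mucal once per cell and reusing the stored values) can be sketched like this. It assumes the cached values are read several times per update; here that reuse is modelled by a hypothetical face-viscosity average in which each cell value feeds two faces. The module visc_cache, the array mu_cell, the routines update_mu_cache and mu_face, and the mucal formula are all invented for illustration.

```fortran
module visc_cache
  implicit none
  ! Hypothetical cache: one viscosity value per cell, filled once per update.
  real(8), allocatable :: mu_cell(:)
contains
  ! Hypothetical stand-in for mucal; the formula is invented.
  pure function mucal(ijk, iopt) result(mu)
    integer, intent(in) :: ijk, iopt
    real(8) :: mu
    mu = 1.0d-5 * real(iopt, 8) + 1.0d-8 * real(ijk, 8)
  end function mucal

  ! Evaluate mucal exactly once per cell and store the results.
  subroutine update_mu_cache(iend, iopt)
    integer, intent(in) :: iend, iopt
    integer :: ijk
    ! Reallocate only if the mesh size changed (nested ifs because
    ! Fortran's .and. is not guaranteed to short-circuit).
    if (allocated(mu_cell)) then
      if (size(mu_cell) /= iend) deallocate(mu_cell)
    end if
    if (.not. allocated(mu_cell)) allocate(mu_cell(iend))
    !$omp parallel do schedule(static)
    do ijk = 1, iend
      mu_cell(ijk) = mucal(ijk, iopt)
    end do
    !$omp end parallel do
  end subroutine update_mu_cache

  ! Face viscosity between cells ijk and ijk+1 (valid for ijk < iend).
  ! Each cached cell value is read by two faces, so without the cache
  ! mucal would be evaluated twice per cell.
  pure function mu_face(ijk) result(muf)
    integer, intent(in) :: ijk
    real(8) :: muf
    muf = 0.5d0 * (mu_cell(ijk) + mu_cell(ijk + 1))
  end function mu_face
end module visc_cache
```

The design choice here is simply to trade one extra array of size iend for fewer function calls: update_mu_cache is called once per update, and everything downstream reads mu_cell.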

My question is: why does the time spent in mucal increase with the number of threads? Is it caused by the combination of common blocks and modules, or by something else? What is the best approach to prevent issues like this?

Thanks!