Intel® Fortran Compiler

Output is affected for some weird silly reasons

bhanu1
Beginner
1,703 Views

Hi everyone, I am new to this community. I am using the ifort compiler for my Abaqus user subroutine. It worked without trouble until about two weeks ago, and since then I have been struggling a lot! My output changes for seemingly silly reasons, as follows:

1. When I print any variable, the problem goes away.

2. When I use a variable that Abaqus passes into my subroutine to declare an array dimension, I get the correct result without printing anything, which is super silly. Here is an example: a simple model with 2 elements, for which I compute the stiffness matrix and hence the displacements. I will spare you the loading etc., but the point is that you can clearly see each element has 4 nodes.

bhanu1_1-1707424240253.png

So when I declare the b matrix as 4 by 8, Abaqus just gives an error. But when I dimension it with nnode, which Abaqus passes in, I get the correct result!! And nnode is exactly 4!

 

bhanu1_0-1707424161148.png

Then I use a do loop to form the ba matrix! I don't understand why it is wrong! I have spent 10 days trying to fix this small problem so that I can apply the code at a larger scale! I am using all the checks, as can be seen:

bhanu1_2-1707424475930.png

I understand the printing problem has been discussed before, and the recommendation was to initialize all variables and to check that arrays are not accessed beyond their bounds! In my case it looks like access to the b matrix is going wrong in some very mysterious way, even though logically the array is exactly the same! I am attaching the code for your kind perusal. I hope this discussion will help other people avoid losing their time. Please share any insights you have!
1 Solution: accepted answer by jimdempseyatthecove (it appears in full, in thread order, among the replies)
8 Replies
Ron_Green
Moderator
1,679 Views

I don't see where nnode is getting set.  Is that passed in from Abaqus?  This does seem like an issue you should send to Abaqus Support for help.

 

grep nnode *.for
     &               props, nprops, coords, mcrd, nnode,
     &    nsvars, nprops, mcrd, nnode, jtype, kstep, kinc, jelem,
     &    coords(mcrd,nnode), u(ndofel), du(mlvarx,*), v(ndofel),
     &    ddlmag(mdload,*), predef(2,npredf,nnode), dtime, period
       real(rkind) :: dNdx(ndim,4),b(4,2*nnode),ddsdde(4,4),xjac0inv(2,2)
! write(6,*)'nnode=',nnode    
       call kshapefcn(kintk,ninpt,nnode,ndim,dN,dNdz)      
       call kjacobian(jelem,ndim,nnode,coords,dNdz,djac,dNdx,mcrd)
       call kshapefcn(kintk,ninpt,nnode,ndim,dN,dNdz)      
       call kjacobian(jelem,ndim,nnode,coords,dNdz,djac,dNdx,mcrd)
        ! dstran=matmul(b,du(1:ndim*nnode,1))
subroutine kshapefcn(kintk,ninpt,nnode,ndim,dN,dNdz)
      integer :: kintk, ninpt, nnode, ndim
  ! PARAMETER (ndim=2 ,ninpt=4,nnode=4)  
      real(rkind) ::  dN(nnode,1),dNdz(ndim,4),coord24(2,4),g, h
      subroutine kjacobian(jelem,ndim,nnode,coords,dNdz,djac,dNdx,mcrd)
      integer :: ninpt, nnode, ndim, mcrd, inod, idim, jelem, jdim
  ! PARAMETER (ndim=2 ,ninpt=4,nnode=4)  
      real(rkind) ::  xjac(ndim,ndim),xjaci(ndim,ndim),coords(mcrd,nnode)
      real(rkind) :: dNdz(ndim,nnode),dNdx(ndim,nnode), djac
      do inod=1,nnode

andrew_4619
Honored Contributor III
1,646 Views

It rather sounds like you have a memory corruption problem. When you change anything, such as adding a print, the generated code changes, so the thing getting clobbered changes and the outcome is a bit random.

I would check very carefully the number, order, type and kind of everything being passed between subroutines, and set as many checks as you can in the compiler options.

I would also suggest making a dummy main program that sets some data and calls your UMAT so you can test/debug away from ABAQUS.
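As a rough illustration of the stand-alone test harness suggested above, the sketch below drives a stand-in element routine from a small main program. Everything in it is hypothetical: element_stub, its argument list, and the unit-square coordinates are placeholders, not the poster's routine and not the Abaqus interface. The stub only computes the element area so that the result can be checked by hand; a real harness would call the actual routine with its full argument list.

```fortran
! Free-form sketch of a stand-alone driver (all names are hypothetical).
module element_stub_mod
  implicit none
  integer, parameter :: rkind = kind(1.0d0)
contains
  ! Stand-in for the user routine: returns the polygon area of one element.
  subroutine element_stub(jelem, mcrd, nnode, coords, area)
    integer, intent(in) :: jelem, mcrd, nnode
    real(rkind), intent(in) :: coords(mcrd, nnode)
    real(rkind), intent(out) :: area
    integer :: inod, jnod

    if (jelem < 1) error stop 'bad element number'
    area = 0.0_rkind
    do inod = 1, nnode
       jnod = mod(inod, nnode) + 1        ! next node, wrapping around
       area = area + coords(1, inod)*coords(2, jnod) &
                   - coords(1, jnod)*coords(2, inod)
    end do
    area = 0.5_rkind*area                 ! shoelace formula
  end subroutine element_stub
end module element_stub_mod

program uel_driver_sketch
  use element_stub_mod
  implicit none
  integer, parameter :: mcrd = 2, nnode = 4, nelem = 2
  real(rkind) :: coords(mcrd, nnode), area
  integer :: jelem

  ! Unit square, nodes numbered counter-clockwise (made-up test data).
  coords = reshape([0.0_rkind, 0.0_rkind, 1.0_rkind, 0.0_rkind, &
                    1.0_rkind, 1.0_rkind, 0.0_rkind, 1.0_rkind], &
                   [mcrd, nnode])

  do jelem = 1, nelem
     call element_stub(jelem, mcrd, nnode, coords, area)
     if (abs(area - 1.0_rkind) > 1.0e-12_rkind) error stop 'unexpected area'
     print *, 'element', jelem, 'area =', area
  end do
end program uel_driver_sketch
```

Because the driver owns all the inputs, the same call can be repeated under a debugger or with every run-time check enabled, without an Abaqus job in the loop.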

 

 

 

bhanu1
Beginner
1,633 Views

Hello @andrew_4619 and @Ron_Green, thank you very, very much for such a quick reply! Yes, nnode is passed in from Abaqus. I actually found that the issue is due to parallel computing! It looks like when I use 2 cores it is completely fine, but using more than 2 corrupts my data, especially the b matrix, I think because the threads try to write simultaneously! To be honest, I could not understand more than that. Here is a discussion that I found very useful: 'https://community.intel.com/t5/Intel-Fortran-Compiler/Issue-with-multiple-cores/m-p/1016978#M107253'. But can you please elaborate, looking at my subroutine, on how I can lock my data? Sorry, I am not an expert on this. Please let me know.

Some more clarification: the UEL subroutine is passed these variables and we don't change them at all; they are super sacred.

 

bhanu1_1-1707484026092.png

From here on we define and change our own variables. Some of them are super sensitive, such as b: when it is dimensioned with nnode it gives correct results, but with a literal number it does not, as I was explaining.

 

Please let me know whether I explained it clearly! Kindly share your insights on it!

 

Regards

Bhanu 

 

jimdempseyatthecove
Honored Contributor III
1,622 Views

>>The issue actually I found due to parallel computing ! it looks like when I use 2 cores its completely fine but using more than 2 corrupts my data

 

Where is the threading performed (your code has no parallel regions)?

What version of Fortran are you using?

 

Jim Dempsey

bhanu1
Beginner
1,617 Views

Hi @jimdempseyatthecove, thank you very much for the reply! I am not sure I understand the first question properly, so I will describe the problem fully. Many things are very similar to the question that was asked before and that you answered, but I am sorry, my inexperience with these technicalities may keep me from grasping everything in that answer. So, to describe my issue fully: I have a user element subroutine (I attached the file before). The first part of that routine is the standard variable declarations that communicate directly with Abaqus, which we are not supposed to change:

bhanu1_0-1707489453459.png

After that we define our own variables, which are used to compute the stiffness matrix (amatrx among the Abaqus variables) that is passed back to Abaqus:

bhanu1_1-1707489562606.png

Then we have other subroutines called from UEL as well, such as the shape functions etc. The main goal is to compute the two variables amatrx and rhs. What surprised me is that when the b matrix is declared as b(4,8) instead of b(4,2*nnode), Abaqus gives wrong results!! nnode is a variable passed in from Abaqus, and it is exactly 4, so the b matrix is essentially THE SAME. There are two other ways to get correct results with b(4,8): simply printing anything, or running with only two processors! I have just two elements, and I was trying the code out on this small model before applying it to larger ones!

I am using parallel computing as shown below:

"abaqus job=2_elems_0.1_bbar user=stran_rkind_modernized_2_condensed double cpus=12 interactive scratch=E"

This is the command line, with 12 CPUs. Here are all the checks I am using as well!

bhanu1_2-1707489981643.png

 

To answer the second question: if I am not mistaken, I am using 'Intel(R) 64, Version 2021.6.0 Build 20220226_000000' as my Fortran compiler!

So I cannot understand why exactly it is happening! Why is the b matrix in particular affected and not the others? I recompute the b matrix at each integration point. Why does printing, or using fewer cores, make it correct?

 

Please feel free to ask if I have not explained something clearly! I would really appreciate it if you could help me solve this! I have been struggling for 2 weeks!

 

Many many thanks!

 

Regards

Bhanu

 

 

jimdempseyatthecove
Honored Contributor III
1,618 Views

The array UserVar in module kvisual is written to in subroutine UEL. Are the multiple threads writing to different sections of UserVar, .OR. expecting/requiring exclusive use of array UserVar?

If requiring exclusive use of UserVar, then compile with /Qopenmp .AND. in the module kvisual add

C$omp threadprivate(UserVar)
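A minimal sketch of what that directive does, under stated assumptions: the module name kvisual_sketch and the shape of UserVar are placeholders, since the poster's actual declaration is not shown in the thread. The sketch is written in free form, where the sentinel is `!$omp`; the `C$omp` spelling above is the fixed-form equivalent. Each thread fills its own copy of UserVar with its thread number and then checks that no other thread overwrote it.

```fortran
! Free-form sketch; build with OpenMP enabled (e.g. /Qopenmp).
! The shape of UserVar is a placeholder, not the poster's declaration.
module kvisual_sketch
  implicit none
  integer, parameter :: rkind = kind(1.0d0)
  real(rkind), save :: UserVar(4, 8)
  !$omp threadprivate(UserVar)
end module kvisual_sketch

program demo_threadprivate
  !$ use omp_lib, only: omp_get_thread_num
  use kvisual_sketch
  implicit none
  integer :: tid
  logical :: ok

  ok = .true.
  !$omp parallel private(tid) reduction(.and.:ok)
  tid = 0
  !$ tid = omp_get_thread_num()
  UserVar = real(tid, rkind)              ! each thread writes its own copy
  !$omp barrier
  ok = all(UserVar == real(tid, rkind))   ! nobody else overwrote it
  !$omp end parallel

  if (.not. ok) error stop 'UserVar was shared between threads'
  print *, 'each thread kept its own copy of UserVar'
end program demo_threadprivate
```

The lines starting with `!$ ` are OpenMP conditional compilation, so the sketch also builds and runs serially when OpenMP is switched off. Without the threadprivate directive, the same test would race on a single shared array.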

 

Note, inserting a PRINT/WRITE includes a critical section (internal to the statement). This will cause a slight delay for other threads than the one that currently holds the critical section. This delay for other threads may permit the holding thread upon exit of the critical section within the print/write to have sufficient time to execute the code following the print/write without adverse interference from the other threads. 

 

*** IF this be the case, then the fact that you observe expected results with two threads when testing does not mean that two threads are executing without the potential for conflict.

 

Additional note.

This is purely speculative.

While you are compiling with /recursive and /Qauto which should force local arrays on stack, try adding /Qopenmp.

If this alone corrects the problem (and provided that the local arrays are indeed on stack), then this may be indicative that the code may be performing memory allocations (e.g. for temporaries) and that /Qopenmp is linking with the thread-safe library as opposed to without.

If /Qopenmp does not fix the problem, then you can do a little detective work by leaving in the /Qopenmp and start with inserting C$omp critical and C$omp end critical around the complete body of the code, and run with a sufficient number of threads to have caused the problem.

If this runs ok, then it is likely the problem is within your code and then you can narrow and move the scope of the critical section to zero in on the affecting region of code.
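The detective work described above can be sketched as follows. This is only an illustration: element_body, the lock name element_body_lock, and the shared running sum are hypothetical stand-ins for the real routine and whatever shared data it touches. The critical section wraps the whole body, so only one thread at a time executes it; once that runs correctly, the pair of directives can be moved inward to bracket smaller regions.

```fortran
! Free-form sketch; build with OpenMP enabled (e.g. /Qopenmp).
program demo_critical
  implicit none
  integer, parameter :: rkind = kind(1.0d0)
  integer, parameter :: ncalls = 1000
  real(rkind) :: shared_sum
  integer :: i

  shared_sum = 0.0_rkind
  !$omp parallel do
  do i = 1, ncalls
     call element_body(shared_sum)       ! called concurrently by all threads
  end do
  !$omp end parallel do

  if (abs(shared_sum - real(ncalls, rkind)) > 0.5_rkind) &
       error stop 'updates were lost'
  print *, 'sum =', shared_sum
contains
  ! Hypothetical stand-in for the body of the user routine.
  subroutine element_body(total)
    real(rkind), intent(inout) :: total
    !$omp critical (element_body_lock)
    total = total + 1.0_rkind            ! update of shared data, now serialized
    !$omp end critical (element_body_lock)
  end subroutine element_body
end program demo_critical
```

Serializing the whole body removes the parallel speed-up, so this is a diagnostic step rather than a fix.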

 

Jim Dempsey

bhanu1
Beginner
1,569 Views

@jimdempseyatthecove thank you so much, I was struggling for more than 1 week. Just adding /Qopenmp solved it!! Can you please explain why this fixed my issue?

 

Another question I had was about UserVar, which you mentioned earlier! I think you are right, as I found my output is getting weird! We use UserVar to store the output and send it to the Abaqus UMAT for display. Can you please tell me specifically why you thought it could be an issue? There are many other variables, and it seems they were all fixed just by adding /Qopenmp, except, I think, UserVar.

 

I am sorry my understanding is very rudimentary, but I would love to enrich it by gaining perspective from experts like you.

 

Thank you very much really for the help.

jimdempseyatthecove
Honored Contributor III
1,492 Views

I thought I did "speculatively" explain it:

>>While you are compiling with /recursive and /Qauto which should force local arrays on stack, try adding /Qopenmp. If this alone corrects the problem (and provided that the local arrays are indeed on stack), then this may be indicative that the code may be performing memory allocations (e.g. for temporaries) and that /Qopenmp is linking with the thread-safe library as opposed to without.

So either something was not placed on stack with /recursive and/or /Qauto, but was with /Qopenmp.

Or there is some other compiler code generation difference between building with and without /Qopenmp.
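To make the first possibility concrete, here is a small sketch of the two storage classes involved. It is an assumption-laden illustration, not a diagnosis of the poster's build: the explicit SAVE below forces static storage so that the effect is deterministic, whereas whether the compiler actually placed b(4,8) in static storage in the failing build is exactly what remains unknown. The function names are hypothetical. A static local exists once per process, so every caller (and every thread) sees the same memory; an automatic array such as b(4,2*nnode) is created afresh on each call.

```fortran
! Free-form sketch contrasting a static local array with an automatic one.
module storage_sketch
  implicit none
  integer, parameter :: rkind = kind(1.0d0)
contains
  ! Static local: one copy for the whole process, kept between calls.
  function calls_seen_static() result(n)
    integer :: n
    real(rkind), save :: b(4, 8) = 0.0_rkind
    b(1, 1) = b(1, 1) + 1.0_rkind        ! state left behind by earlier calls
    n = nint(b(1, 1))
  end function calls_seen_static

  ! Automatic array: sized from an argument, created on every call.
  function shape_auto(nnode) result(s)
    integer, intent(in) :: nnode
    integer :: s(2)
    real(rkind) :: b(4, 2*nnode)
    b = 0.0_rkind
    s = shape(b)
  end function shape_auto
end module storage_sketch

program demo_storage
  use storage_sketch
  implicit none
  integer :: n1, n2

  n1 = calls_seen_static()
  n2 = calls_seen_static()
  if (n1 /= 1 .or. n2 /= 2) error stop 'static array did not persist'
  if (any(shape_auto(4) /= [4, 8])) error stop 'unexpected shape'
  print *, 'static b is shared across calls; automatic b has shape', shape_auto(4)
end program demo_storage
```

With nnode equal to 4 the two arrays have the same shape, which is why they look identical in the source even though they may live in different kinds of memory.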

If you are inclined to really find out what the problem is then for each configuration produce an assembly output:

jimdempseyatthecove_0-1707663912169.png

(rename each) and then look for differences.

 

From your description, parallelization is performed within Abaqus, which then calls your Fortran procedure in parallel. Therefore, all (your code) variable data is required to be private to the calling thread, IOW on the thread's local stack, locally allocated, or threadprivate. Therefore UserVar must be threadprivate .AND. /Qopenmp must be enabled for !$omp ... to take effect.

 

Jim Dempsey
