Hello, I would like to run an asynchronous calculation, but I am having a hard time understanding what the Intel User and Reference Guide says about this. My code looks like the following.
    signal_value = 1
    !dir$ offload target(mic:0) signal(signal_value),&
    !dir$ in(position: alloc_if(.false.),free_if(.false.)),&
    !dir$ inout(ff: alloc_if(.false.), free_if(.false.)),&
    !dir$ nocopy(nlist: alloc_if(.false.) free_if(.false.)),&
    !dir$ nocopy(numneigh: alloc_if(.false.) free_if(.false.)),&
    !dir$ in(q: alloc_if(.false.) free_if(.false.)),&
    !dir$ nocopy(lj1: alloc_if(.false.) free_if(.false.)),&
    !dir$ nocopy(lj2: alloc_if(.false.) free_if(.false.)),&
    !dir$ nocopy(lj3: alloc_if(.false.) free_if(.false.)),&
    !dir$ nocopy(lj4: alloc_if(.false.) free_if(.false.))
    !---asynchronous computation subroutine on MIC
    call lj_cut_coul_dsf_nonewton(step)
    !---asynchronous computation on CPU host (next 4 subroutine calls)
    call bond_harmonic(step)
    call angle_harmonic(step)
    call dihedral_opls(step)
    call improper_harmonic(step)
    !dir$ offload_wait target(mic:0) WAIT (signal_value)
In this code, lj_cut_coul_dsf_nonewton consists of a block of code inside OpenMP directives that I would like to run asynchronously on the Xeon Phi coprocessor. All of the code in this subroutine runs on the coprocessor, and all the offload clauses for the necessary arrays appear in the offload directive above.
    !dir$ attributes offload:mic :: lj_cut_coul_dsf_nonewton
    subroutine lj_cut_coul_dsf_nonewton(step)
    !$omp parallel do default(firstprivate),&
    !$omp& shared(position,ff,nlist,numneigh,q,lj1,lj2,lj3,lj4)
    ! ... calculate non-bonded forces for molecular dynamics on MIC ...
    !$omp end parallel do
    end subroutine
As shown in the comment, I want bond_harmonic, angle_harmonic, dihedral_opls, and improper_harmonic all to run on the host CPU asynchronously. However, when I compile the code, I get errors saying that global variables inside bond_harmonic, angle_harmonic, dihedral_opls, and improper_harmonic need to be declared with an offload target attribute. This makes me think that I do not properly understand what the code is doing. I should not have to declare, and more importantly allocate memory for, these arrays and variables, since they will never be on the coprocessor and are only supposed to be used asynchronously on the CPU. Could someone tell me whether my understanding is correct, or where I am going wrong, before I go about changing my code?
Also, I have been staring at this for a while and apologize if this is something dumb, but I am getting the errors
"a global variable within a procedure with the offload:target attribute must have the offload:target attribute [type]"
"a global variable within a procedure with the offload:target attribute must have the offload:target attribute [size]"
Below is the code for lj_cut_coul_dsf_nonewton, which is the only procedure with an offload attribute; it has no variables called type or size. What could be generating these errors?
    !dir$ attributes offload:mic :: lj_cut_coul_dsf_nonewton
    subroutine lj_cut_coul_dsf_nonewton(step)
      implicit none
      real*4 :: force,forcelj,forcecoul
      real*4 :: x1,y1,z1,x2,y2,z2
      real*4 :: dx,dy,dz,dr,dr2,dr2i,dr6i,dr12i,dri
      double precision :: ffx,ffy,ffz
      real*4 :: qtmp,r,prefactor,erfcc,erfcd,t
      real*4 :: boxdx,boxdy,boxdz
      integer :: i,j,l,step
      integer :: itype,jtype,neigh
      integer :: tid,num
      integer :: offset,ioffset,neigh_off
      integer :: T1,T2,clock_rate,clock_max

      !$omp parallel do schedule(dynamic) reduction(+:potential,e_coul,ffx,ffy,ffz) default(firstprivate),&
      !$omp& shared(position,ff,nlist,numneigh,q,lj1,lj2,lj3,lj4)
      do i = 1 ,np
        x1 = position(i)%x; y1 = position(i)%y; z1 = position(i)%z; itype = position(i)%type
        qtmp = q(i)
        ioffset = (itype-1)*numAtomType
        neigh_off = neigh_alloc*(i-1)
        num = numneigh(i)
        ffx = 0.0d0; ffy = 0.0d0; ffz = 0.0d0
        !dir$ vector aligned
        !dir$ simd reduction(+:potential,e_coul,ffx,ffy,ffz)
        do j= 1,num
          neigh = nlist(neigh_off+j)
          dx = x1-position(neigh)%x
          dy = y1-position(neigh)%y
          dz = z1-position(neigh)%z
          jtype = position(neigh)%type
          boxdx = dx*ibox; boxdy = dy*ibox; boxdz = dz*ibox
          boxdx = (boxdx+sign(1/(epsilon(boxdx)),boxdx)) -sign(1/epsilon(boxdx),dx)
          boxdy = (boxdy+sign(1/(epsilon(boxdy)),boxdy)) -sign(1/epsilon(boxdy),dy)
          boxdz = (boxdz+sign(1/(epsilon(boxdz)),boxdz)) -sign(1/epsilon(boxdz),dz)
          dx = dx-box*boxdx; dy = dy-box*boxdy; dz = dz-box*boxdz
          dr2 = dx*dx + dy*dy + dz*dz
          !---lennard jones interactions
          dr2i = 1.0d0/dr2
          dr6i = dr2i*dr2i*dr2i
          if(dr2.gt.rcut2)dr6i=0.0d0
          offset = ioffset + jtype
          forcelj = dr6i*(lj1(offset)*dr6i-lj2(offset))
          potential = potential + dr6i*(dr6i*lj3(offset)-lj4(offset))
          !---electrostatic calculations
          r = sqrt(dr2)
          dri = 1.0d0/r
          prefactor = qtmp*q(neigh)*dri
          if(dr2.gt.cut_coulsq)prefactor =0.0d0
          erfcd = exp(-alpha*alpha*r*r)
          t = 1.0 / (1.0 + EWALD_P*alpha*r)
          erfcc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * erfcd
          forcecoul = prefactor * (erfcc*dri + 2.0*alpha*MY_PIS_INV*erfcd +&
                      r*f_shift) * r
          e_coul = e_coul + prefactor*(erfcc-r*e_shift-dr2*f_shift)
          force = (forcecoul+forcelj)*dr2i
          ffx = ffx + dx*force
          ffy = ffy + dy*force
          ffz = ffz + dz*force
        enddo
        ff(i)%x = ffx; ff(i)%y = ffy; ff(i)%z = ffz
      enddo
      !$omp end parallel do
    end subroutine lj_cut_coul_dsf_nonewton
Conner,
The signal argument is not a value. It is an arbitrary variable whose address is used to disambiguate sessions. If, for example, the signal variable is a stack-local variable in a subroutine that is called from multiple threads within a parallel region, then each thread, using the same named variable (possibly with the same value), has a different address for the variable. However, if you mistakenly place the signal variable in a module or give it the save attribute, then you will have issues if concurrent asynchronous offloads attempt to use the same variable (address). Signal variables can be placed in a module, as illustrated in the user guide, but each must then be used exclusively (non-concurrently) for a single purpose (per MIC).
Now for your issue at hand.
The subroutine you listed above contains array references (at least to position(:) and nlist(:)) that are neither declared nor brought in through a use statement. Yet you have implicit none and no error report.
If this is a copy-and-paste issue in the posting above, and these arrays exist in a module, then you first have to determine whether the arrays are to be accessible by both the Host and the MIC, or used only within the MIC. When they are used by both, you will need an offload transfer to synchronize the data.
The variables and arrays declared in a module can be attributed to indicate that they reside on the MIC as well as on the Host. While the names of these variables/arrays can exist in both places (Host/MIC), the physical storage locations differ. You are required to transfer data between the two areas when applicable. Additionally, allocatable arrays that are attributed in both places must be allocated in both places. Please see some of the example programs.
Jim Dempsey
Hi, I would like to know about LMASK, to suggest which method will be better for vectorization.
Thanks!
Edit: Sorry for the irrelevance to the topic. I was on another page of the Zone.
Sorry!
Thanks for your help, Jim! However, I don't think I quite followed what you were saying about the signal tag. I remember that when I was trying to do asynchronous offloads, I had to initialize the value of the signal tag. I asked this question in the following thread: https://software.intel.com/en-us/comment/1795195#comment-1795195
If you scroll to the bottom, Kevin mentions "The signal tag must be initialized to a non-zero unique (from other signal variable's value where more than one signal is used) value. Add a non-zero initialization of signal1 before (line 51) the use in line 55." Is this not the case for the asynchronous calculation? Would the following, where the signal variable (signal_value) is neither declared nor initialized, be correct?
    subroutine force_wrapper(step,neighbor_flag)
      implicit none
      integer :: step
      integer :: neighbor_flag

      !dir$ offload target(mic:0) signal(signal_value),&
      !dir$ in(position: alloc_if(.false.),free_if(.false.)),&
      !dir$ inout(ff: alloc_if(.false.), free_if(.false.)),&
      !dir$ nocopy(nlist: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(numneigh: alloc_if(.false.) free_if(.false.)),&
      !dir$ in(q: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj1: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj2: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj3: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj4: alloc_if(.false.) free_if(.false.))
      call lj_cut_coul_dsf_nonewton(step)
      !---asynchronous computation
      call bond_harmonic(step)
      call angle_harmonic(step)
      call dihedral_opls(step)
      call improper_harmonic(step)
      !dir$ offload_wait target(mic:0) WAIT (signal_value)
Also, regarding the position and nlist error, I apologize. I should have been more descriptive. Those variables are indeed declared and allocated as global arrays in separate modules. In fact, I have previously been running lj_cut_coul_dsf_nonewton successfully as follows, where the subroutine itself initiates the offload:
    subroutine lj_cut_coul_dsf_nonewton(step)
      implicit none
      real*4 :: force,forcelj,forcecoul
      real*4 :: x1,y1,z1,x2,y2,z2
      real*4 :: dx,dy,dz,dr,dr2,dr2i,dr6i,dr12i,dri
      double precision :: ffx,ffy,ffz
      real*4 :: qtmp,r,prefactor,erfcc,erfcd,t
      real*4 :: boxdx,boxdy,boxdz
      integer :: i,j,l,step
      integer :: itype,jtype,neigh
      integer :: tid,num
      integer :: offset,ioffset,neigh_off
      integer :: T1,T2,clock_rate,clock_max

      call system_clock(T1,clock_rate,clock_max)
      !dir$ offload begin target(mic:0) in(position: alloc_if(.false.),free_if(.false.)),&
      !dir$ inout(ff: alloc_if(.false.), free_if(.false.)),&
      !dir$ nocopy(nlist: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(numneigh: alloc_if(.false.) free_if(.false.)),&
      !dir$ in(q: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj1: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj2: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj3: alloc_if(.false.) free_if(.false.)),&
      !dir$ nocopy(lj4: alloc_if(.false.) free_if(.false.))
      !$omp parallel do schedule(dynamic) reduction(+:potential,e_coul,ffx,ffy,ffz) default(firstprivate),&
      !$omp& shared(position,ff,nlist,numneigh,q,lj1,lj2,lj3,lj4)
      do i = 1 ,np
        x1 = position(i)%x; y1 = position(i)%y; z1 = position(i)%z; itype = position(i)%type
        qtmp = q(i)
        ioffset = (itype-1)*numAtomType
        neigh_off = neigh_alloc*(i-1)
        num = numneigh(i)
        ffx = 0.0d0; ffy = 0.0d0; ffz = 0.0d0
        !dir$ vector aligned
        !dir$ simd reduction(+:potential,e_coul,ffx,ffy,ffz)
        do j= 1,num
          neigh = nlist(neigh_off+j)
          dx = x1-position(neigh)%x
          dy = y1-position(neigh)%y
          dz = z1-position(neigh)%z
          jtype = position(neigh)%type
          boxdx = dx*ibox; boxdy = dy*ibox; boxdz = dz*ibox
          boxdx = (boxdx+sign(1/(epsilon(boxdx)),boxdx)) -sign(1/epsilon(boxdx),dx)
          boxdy = (boxdy+sign(1/(epsilon(boxdy)),boxdy)) -sign(1/epsilon(boxdy),dy)
          boxdz = (boxdz+sign(1/(epsilon(boxdz)),boxdz)) -sign(1/epsilon(boxdz),dz)
          dx = dx-box*boxdx; dy = dy-box*boxdy; dz = dz-box*boxdz
          dr2 = dx*dx + dy*dy + dz*dz
          !---lennard jones interactions
          dr2i = 1.0d0/dr2
          dr6i = dr2i*dr2i*dr2i
          if(dr2.gt.rcut2)dr6i=0.0d0
          offset = ioffset + jtype
          forcelj = dr6i*(lj1(offset)*dr6i-lj2(offset))
          potential = potential + dr6i*(dr6i*lj3(offset)-lj4(offset))
          !---electrostatic calculations
          r = sqrt(dr2)
          dri = 1.0d0/r
          prefactor = qtmp*q(neigh)*dri
          if(dr2.gt.cut_coulsq)prefactor =0.0d0
          erfcd = exp(-alpha*alpha*r*r)
          t = 1.0 / (1.0 + EWALD_P*alpha*r)
          erfcc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * erfcd
          forcecoul = prefactor * (erfcc*dri + 2.0*alpha*MY_PIS_INV*erfcd +&
                      r*f_shift) * r
          e_coul = e_coul + prefactor*(erfcc-r*e_shift-dr2*f_shift)
          force = (forcecoul+forcelj)*dr2i
          ffx = ffx + dx*force
          ffy = ffy + dy*force
          ffz = ffz + dz*force
        enddo
        ff(i)%x = ffx; ff(i)%y = ffy; ff(i)%z = ffz
      enddo
      !$omp end parallel do
      !dir$ end offload
      call system_clock(T2,clock_rate,clock_max)
      ! print*,'elapsed time in force',real(T2-T1)/real(clock_rate)
      time_nonbond = time_nonbond + real(T2-T1)/real(clock_rate)
      potential = 0.50d0*potential
      e_coul = 0.50d0*e_coul
    end subroutine lj_cut_coul_dsf_nonewton
As you can see, the only change I made in the code was to declare lj_cut_coul_dsf_nonewton as
    !dir$ attributes offload:mic :: lj_cut_coul_dsf_nonewton
    subroutine lj_cut_coul_dsf_nonewton(step)
I then removed the offload directive from this subroutine, and I now just call it during the asynchronous computation. However, I am now getting these errors. Could this possibly be a compiler-specific error? I am currently using intel/13.1.1.163.
The confusion with the signal variable is a Fortran versus C/C++ issue. In both cases the value passed to signal( ) must be unique. I added a more complete explanation to that forum issue you cited but the short answer is that, for Fortran, signal( ) expects to be passed an integer value. What I recommend is that the value you use be LOC(array_name), where array_name is the name of some array being used in the offload directive. This means you don't need to remember that for some particular offload, you use a signal variable set to 6 or 10 or whatever; you only need to remember that you were moving an array named array_name.
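The LOC-based tag can be sketched as follows. This is only an illustration under stated assumptions: it assumes an Intel compiler with offload support and the LOC extension, and the names a, b, tag, and host_sum are hypothetical, not taken from the original code. A compiler that does not recognise the !dir$ lines treats them as comments, so the block then simply runs synchronously on the host.

```fortran
program signal_loc_sketch
  use iso_c_binding, only: c_intptr_t
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), host_sum
  integer(c_intptr_t) :: tag      ! hypothetical tag variable

  a = 1.0
  b = 3.0

  ! Derive the tag from the address of the array being offloaded.
  ! LOC is a vendor extension, not standard Fortran.
  tag = loc(a)

  ! Start the offload; with signal(tag) the host does not wait for it.
  !dir$ offload begin target(mic:0) signal(tag) inout(a)
  a = 2.0*a
  !dir$ end offload

  ! Host-only work that does not touch a can overlap with the offload.
  host_sum = sum(b)

  ! Block until the offload identified by tag has finished.
  !dir$ offload_wait target(mic:0) wait(tag)

  if (any(a /= 2.0)) error stop "offloaded block did not update a"
  print *, 'host_sum =', host_sum
end program signal_loc_sketch
```

The point of the design is that the tag is tied to the data being moved, so the matching offload_wait only has to name the same array.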
Now, as to the message about type and size needing to have the offload attribute: I don't see where size is, but I did find type, and I think it comes down to the variable 'position' not being declared. You have said that position is declared inside a module, but I don't see the 'use' statement anywhere. If the compiler does not see any declaration for the variable named position, it will do the best it can, trying to figure out what you mean. In this case, the best it can do is interpret position(i) as a function call (of unknown type, but a function call nonetheless). So then, in position(i)%type, what the heck is type? Well, that must be a variable name. In your example showing an earlier version of lj_cut_coul_dsf_nonewton, you gave the compiler a clue that 'position' is a variable name because it is used as such in the offload statement. I'm kind of surprised that the code ran correctly, but not that it compiled. If you add the use statement for your module to your current lj_cut_coul_dsf_nonewton, then the error message about type should disappear.
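A rough sketch of that fix, with hypothetical names throughout (atoms_mod, forces_mod, atom, and count_type are invented for illustration, and the real module layout is not shown in this thread). It assumes the module array carries the offload attribute and that the offloaded routine imports it with a use statement; the placement of the attributes directive before the subroutine mirrors the code posted above. Without offload support the directives are comments and everything runs on the host.

```fortran
module atoms_mod
  implicit none
  type :: atom
     real :: x, y, z
     integer :: type
  end type atom
  integer, parameter :: np = 4
  type(atom) :: position(np)
  ! The global array must exist on the coprocessor as well.
  !dir$ attributes offload:mic :: position
end module atoms_mod

module forces_mod
  implicit none
contains
  !dir$ attributes offload:mic :: count_type
  subroutine count_type(wanted, n)
    ! The use statement is what tells the compiler that position is an
    ! array of derived type, so position(i)%type is a component reference.
    use atoms_mod, only: position, np
    integer, intent(in)  :: wanted
    integer, intent(out) :: n
    integer :: i
    n = 0
    do i = 1, np
       if (position(i)%type == wanted) n = n + 1
    end do
  end subroutine count_type
end module forces_mod

program use_statement_sketch
  use atoms_mod
  use forces_mod
  implicit none
  integer :: n

  position(:)%x = 0.0
  position(:)%y = 0.0
  position(:)%z = 0.0
  position(:)%type = [1, 2, 1, 1]

  ! Copy the module array to the coprocessor and run the routine there.
  !dir$ offload target(mic:0) in(position)
  call count_type(1, n)

  if (n /= 3) error stop "unexpected count"
  print *, 'atoms of type 1:', n
end program use_statement_sketch
```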
As to the global variables inside bond_harmonic, angle_harmonic, dihedral_opls, and improper_harmonic - I think Jim is right on target about this. Are those variables part of a module? Does that module have the offload attribute? It's an all or nothing thing when it comes to the module. Either the whole thing has the offload attribute or it doesn't.
As to the offload statement in force_wrapper: you are declaring all the variables as alloc_if(.false.). Did you allocate the space on the coprocessor somewhere before this? If not, you could be in trouble.
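One common pattern for this, sketched here with a hypothetical array q and loop counter step, is to allocate the coprocessor copy once with alloc_if(.true.) free_if(.false.), reuse it in later offloads with alloc_if(.false.) free_if(.false.), and release it at the end. This assumes Intel offload directives; on a compiler without offload support the directives are comments and the program runs entirely on the host.

```fortran
program persistent_alloc_sketch
  implicit none
  integer, parameter :: n = 16
  real, allocatable :: q(:)
  integer :: step

  allocate(q(n))
  q = 1.0

  ! Once, up front: allocate q on the coprocessor, copy it in, keep it.
  !dir$ offload_transfer target(mic:0) in(q: alloc_if(.true.) free_if(.false.))

  do step = 1, 3
     ! Reuse the existing coprocessor allocation; only the data moves.
     !dir$ offload begin target(mic:0) inout(q: alloc_if(.false.) free_if(.false.))
     q = q + 1.0
     !dir$ end offload
  end do

  ! At the end: release the coprocessor copy without transferring data.
  !dir$ offload_transfer target(mic:0) nocopy(q: alloc_if(.false.) free_if(.true.))

  if (any(q /= 4.0)) error stop "unexpected result in q"
  print *, 'q(1) =', q(1)
end program persistent_alloc_sketch
```

If the first transfer is missing, the later alloc_if(.false.) offloads refer to coprocessor memory that was never allocated, which is the trouble described above.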
Frances,
>> It's an all or nothing thing when it comes to the module. Either the whole thing has the offload attribute or it doesn't.
All the examples I've seen illustrate attributing individual elements within a module (e.g. an array or a contained routine). Are you indicating that one can also attribute the "module foo" itself?
Jim Dempsey
You know, Jim, I can sometimes say the silliest things in a very public way. Yes, you are right, it is the elements inside the module and not the module itself that have the offload attribute.
So, Conor, have you managed to figure out why you are getting errors saying that global variables inside bond_harmonic, angle_harmonic, dihedral_opls, and improper_harmonic need to be declared with an offload target attribute or are you still seeing that?
