- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
so I currently have a code in fortran whose premise is shown below. Recently I have tried to export my code to a MIC, and obtained the segfault warning. Upon some research, I learned this was most likely due to my vectors not being aligned properly. My goal here is to always have the cl, loc_r, and cv variables aligned on 64 byte boundaries for use on a xeon phi coprocessor. However, I don't know how to get the allocate command to do this. Assuming all the vectors are aligned, where is the appropriate place in dosomecalc() to place the !dir$ vector alligned statement? I have read both the fortran array data intel document, and the data alignment to assist vectorization document. However neither of them appear to answer the scenario I am in. I am a chemical engineer and not a computer scientist, and its quite possible that I just was not able to understand the documents. Any help or advice would greatly be appreciated.
module mymod
implicit none
type cell_data
double precision, allocatable :: loc_r(:,:)
end type cell_data
type vlist
integer :: cnum
double precision :: minx,miny,minz
end type vlist
type list
type(cell_data), allocatable :: cl(:)
type(vlist), allocatable :: cv(:,:)
end type list
type(list) :: listvar
contains
subroutine alloc()
if(conditon1.eq.true)then
deallocate(listvar%cl)
allocate(listvar%cl(0:new_size)deallocate(listvar%cv)
allocate(listvar%cv(0:new_size)
do i = 1,new_size
allocate(listvar%cl(i)%loc_r(3,someothersize)
enddo
elseif(condition2.eq.true)then
do i = 1,cur_size
deallocate(listvar%cl(i)%loc_r)
allocate(listvar%cl(i)%loc_r(3,somenewsize)
enddo
end if
end subroutine alloc
end module mymod
!--- this is another module
subroutine dosomecalc()
use mymod
implicit none
!--- loop over all cells
do i = 0,listvar%ncellT-1
do k=1,26
!---determine this cell
cell_neigh = listvar%cv(k,i)%cnum
if(cell_neigh.gt.i)then
!---minimum image
minx =listvar%cv(k,i)%min_x
miny =listvar%cv(k,i)%min_y
minz =listvar%cv(k,i)%min_z
!--- number of particles in cell i
cell1_num = listvar%cl(i)%num
!--- loop over particles in each cell
do j = 1, cell1_num
x1 = listvar%cl(i)%loc_r(1,j)
y1 = listvar%cl(i)%loc_r(2,j)
z1 = listvar%cl(i)%loc_r(3,j)
!--- number of particles in cell_neigh
cell2_num = listvar%cl(cell_neigh)%num
do l= 1, cell2_num
x2 = listvar%cl(cell_neigh)%loc_r(1,l)
y2 = listvar%cl(cell_neigh)%loc_r(2,l)
z2 = listvar%cl(cell_neigh)%loc_r(3,l)
!--- obtain displacements
!--- apply minimum image here
dx = x2-x1-minx
dy = y2-y1-miny
dz = z2-z1-minz
dr2 = dx*dx+dy*dy+dz*dz
if(dr2.lt.param%rcut2)then
dr2i = 1.0d0/dr2
dr6i = dr2i*dr2i*dr2i
dr12i = dr6i*dr6i
energy = energy + dr12i - dr6i
endif
enddo
enddo
endif
enddo
enddo
end subroutine dosomecalc
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
These alignment issues aren't standard Fortran, so the Intel compiler user guide (in documents directory of compiler installation) is the reference you need.
!dir$ vector aligned
(as a prelude to vectorizable data assignment) doesn't bring about alignment; on MIC it would likely ensure segfault when encountering data which are misaligned. Without alignment assertions, the compiler must build in alignment adjustment, so there seems room to doubt your suspicion that alignment is your problem (although it may be important for performance).
The feature in your quoted code which looks problematical is your declaration of type vlist which requires padding after the integer element to allow the following double precision to be aligned, even to an 8 byte boundary. As the docs tell you, an attribute align directive inside the type declaration block would be the only way to correct this, if in fact this is your problem. If the compiler didn't build in padding to an 8-byte boundary the way you have shown it, it may be a bug, but you have contributed to make that happen.
Read about !dir$ attributes align ..... in the compiler user guide . The attributes directive is placed in the procedure which declares the array.
There is also the compile line option -align array64byte which would relieve you of the need to declare alignment of individual data objects and make it easy to optimize for other targets, e.g. array32byte for corei7.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, thanks for that advice! Unfortunately, it still did not resolve my issue. I am still receiving the following error:
offload error: process on the device was terminated by signal 11
The loop in subroutine dosomecalc() is the one I am attempting to export using
!$dir$ offload begin target(mic:0)
!omp parallel do schedule(dynamic) reduction(+:energy) default(private) shared(param,listvar)
Also, when I tried !$dir$ offload begin target(mic:0) IN(param,listvar) I got an error saying variables with intent in cannot have allocatable arrays. However when I got rid of the IN statement, I no longer received that error. Sorry if I am now shifting from the topic that is the title of this post, let me know if I should just post a new question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, I should also note that I am using -align -align records -align array64byte -align rec32byte
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm afraid I'm with Tim. I do not believe unaligned data is what caused you to segfault.
I saw your post on StackOverflow yesterday and wasn't quite sure how to respond. I notice that you also posted your problem in another forum issue here on the Intel site (https://software.intel.com/en-us/forums/topic/516307). If it is ok with you we can refer that issue to this one and just deal with the issue here.
Your Fortran code has a very C++ feel to it. (Of course, that might be because I am an old Fortran programmer from the time when Fortran was simpler.) Is C++ your preferred language? That may be part of the problem you are having. In the line:
allocate(listvar%cl(0:new_size)
you are allocating new_size+1 elements of unknown size. (Notice I said new_size+1 elements - in Fortran, array syntax is (first_element:last_element) rather than (first_element:count).) I believe you are thinking of cl as an array of pointers to objects of cell_type. But actually, it is an array of objects of cell_type which have unknown size. Perhaps this is valid Fortran - as I say, I am an old fossil when it comes to this new fangled Fortran - but I don't believe the compiler is going to do what you want it to here. Perhaps if you actually made cl an array of pointers to cell_data?
If you do decide to try making cl an array of pointers, you will want to be careful about your deallocations - before you deallocate the cl array, you will want to deallocate the cell_data being pointed to.
Of course, being the old fossil that I am, I would probably get rid of these structures and just use simple arrays. So much easier for the compiler to vectorize (and for us old fossils to read.)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Its interesting you think C++ is my default. I barely have any experience coding in C++. The code runs fine in serial or openmp across cores. However when I switch to the MIC, that is when I segfault. I don't know if that helps you determine the error at all. I have never debugged on a MIC before, so I am at a loss.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is it possible that listvar is not being copied to the MIC? I still don't understand why it would complain that I am using a data type which contain allocatable arrays when I use IN(listvar), but does not complain when I do not use this statement. Perhaps this is indicitave of something?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Where exactly did you place the offload/omp directives?
We do not support offloading of a derived-type with an allocatable component in the current release so that’s probably related to the seg-fault. Currently the derived-type must be bitwise-copyable without any allocatable component, so all fields must be numeric. We are adding support for this in our next release (currently in Beta) due out later this year; however, the support is only for a numeric allocatable component meaning it will not support nested allocations so what you have is not supported, but I don’t know if you cannot use offload for your case at all. I haven’t followed where you placed the offload/omp directives (on the inner-most loop?). Can you clarify that and try posting the actual code showing that using the “{code}” insert tool to help with readability.
Also, the data exclusive to the coprocessor is not limited in terms of its complexity. If you are able to recreate the data corresponding to your structures on the coprocessor then you could use your structures on the coprocessor exclusively.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Oh man, that is potentially game ending news for me. Monte Carlo is such a pain to parallelize. Here is the code reposted!
!$dir$ offload begin target(mic:0)
!omp parallel do schedule(dynamic) reduction(+:energy) default(private) shared(param,listvar)
!--- loop over all cells
do i = 0,listvar%ncellT-1
do k=1,26
!---determine this cell
cell_neigh = listvar%cv(k,i)%cnum
if(cell_neigh.gt.i)then
!---minimum image
minx =listvar%cv(k,i)%min_x
miny =listvar%cv(k,i)%min_y
minz =listvar%cv(k,i)%min_z
!--- number of particles in cell i
cell1_num = listvar%cl(i)%num
!--- loop over particles in each cell
do j = 1, cell1_num
x1 = listvar%cl(i)%loc_r(1,j)
y1 = listvar%cl(i)%loc_r(2,j)
z1 = listvar%cl(i)%loc_r(3,j)
!--- number of particles in cell_neigh
cell2_num = listvar%cl(cell_neigh)%num
do l= 1, cell2_num
x2 = listvar%cl(cell_neigh)%loc_r(1,l)
y2 = listvar%cl(cell_neigh)%loc_r(2,l)
z2 = listvar%cl(cell_neigh)%loc_r(3,l)
!--- obtain displacements
!--- apply minimum image here
dx = x2-x1-minx
dy = y2-y1-miny
dz = z2-z1-minz
dr2 = dx*dx+dy*dy+dz*dz
if(dr2.lt.param%rcut2)then
dr2i = 1.0d0/dr2
dr6i = dr2i*dr2i*dr2i
dr12i = dr6i*dr6i
energy = energy + dr12i - dr6i
endif
enddo
enddo
endif
enddo
enddo!$omp end parallel do
!dir$ end offload
end subroutine dosomecalc
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You might find better vectorization opportunities by changing from one 3D XYZ storage format to three 1D formats.
Logically this:
!--- loop over all cells do i = 0,listvar%ncellT-1 do k=1,26 !---determine this cell cell_neigh = listvar%cv(k,i)%cnum if(cell_neigh.gt.i)then !---minimum image minx =listvar%cv(k,i)%min_x miny =listvar%cv(k,i)%min_y minz =listvar%cv(k,i)%min_z !--- number of particles in cell i cell1_num = listvar%cl(i)%num !--- loop over particles in each cell do j = 1, cell1_num x1 = listvar%cl(i)%loc_r_X(j) ! independent 1D array of Xs y1 = listvar%cl(i)%loc_r_Y(j) ! independent 1D array of Ys z1 = listvar%cl(i)%loc_r_Z(j) ! independent 1D array of Zs !--- number of particles in cell_neigh cell2_num = listvar%cl(cell_neigh)%num do l= 1, cell2_num x2 = listvar%cl(cell_neigh)%loc_r_X(l) y2 = listvar%cl(cell_neigh)%loc_r_Y(l) z2 = listvar%cl(cell_neigh)%loc_r_Z(l) !--- obtain displacements !--- apply minimum image here dx = x2-x1-minx dy = y2-y1-miny dz = z2-z1-minz dr2 = dx*dx+dy*dy+dz*dz if(dr2.lt.param%rcut2)then dr2i = 1.0d0/dr2 dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + dr12i - dr6i endif enddo enddo endif enddo enddo
However, the conditional if test may interfere with vectorization. Consider changing
if(dr2.lt.param%rcut2)then dr2i = 1.0d0/dr2 dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + dr12i - dr6i endif
to
dr2i = 1.0d0/dr2 dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i - dr6i if(dr2.lt.param%rcut2) energy = energy + dr12i
The optimization could then perform a masked move. Although you may be "needlessly" executing 6 operations, this can be done as vector, which may reduce your loop trip count by a factor of 8 or 16.
Before you make the edit, verify that the compiler wasn't smart enough to figure this out for you.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If I took the allocatable arrays out of the data type and just have them declared as global allocatable array, can I pass them to the MIC then? Does the MIC only only static arrays to be passed to it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You can use global allocatable arrays. Arrays need not be only static. If you have not already, have a peek at the product samples under: C:\Program Files (x86)\Intel\Composer XE\Samples\en_US\Fortran\mic_samples.zip
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As Kevin stated earlier derived types (user defined types) with allocated components (IOW not POD/Plane Old Data) cannot be passed with an offload.
If these data objects live preponderantly inside the MIC, then perform the allocations inside the MIC. You would then need to construct a means to offload out these compound objects. This typically involves packing the contents into a contiguous blob, then transferring that. Not unlike what you would do to checkpoint the data to a file. In this case the "file" is a temporary array used to import/export the user defined type. It should be relatively easy to write the member functions to do this (at least easy after you have done it once).
Jim Dempsey

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page