- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear all:
In my simulation, i have multiple simulation boxes. Molecules can randomly interchange between boxes based on potential energy calculation, so i need to append new molecule and delete moved molecule from/to data structure. During potential energy calculation, a particular molecules can only interact with others molecules of same box not molecules of other boxes. Natural choice can be linked list of such kind of cases. But as far i know, linked list are not allocated side by side in memory that might worsen performance in xeon phi and no 64 byte alignment( please correct me, if i am wrong). I have another plan, like, i will keep all co-ordinates of molecules of all boxes in a single array, and at the same time, i will have integer array that is same size of molecules number which will keep track box identity of molecules and it will be updated if molecule moves, so in that case i do not need to append and delete from data structure. But i am worried for this idea, as molecules of same box are not together rather they are scattered at different location of array of coordinates, i need to search them and calculate potential energy. Can any body give me ideas how should i approach for such kind of problem. I am using fortran.
Regards
Masrul
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Masrul,
It looks like you have a question which is not specific to Intel xeon phi coprocessors. We will be able to better assist you if you have a version of code which is already working but not performing well on coprocessors. As far as which data structure and algorithm to use is concerned, this forum might not be the best place to direct your question.
However, If I am not wrong, the problem you are trying to solve shares similarities with the stencil algorithms which are already implemented for Intel xeon phi coprocessors. You might want to refer to the book: High Performance Parallelism Pearls by James Reinders and Jim Jeffers for details on how stencils problems can be targeted for Intel Xeon Phi Coprocessors. You can also refer to the following article for applying optimization for such stencil problems.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sunny,
I am sorry to say, but Masrul's question is specific to Xeon Phi. In particular he has identified the a linked list structure is not viable and he is searching for a better solution.
The Xeon Phi major capability is with its wide small vector.
Masrul,
One thing you might be able to do (dependent on what has not been said) is as follows:
If the differential of number of molecules in adjacent boxes is relatively small (compared to the box). IOW, in a gas, when the boxes are sufficiently small such that the pressure is approximately equal in adjacent boxes (but may vary with gravity as you traverse up or down a column of boxes), then what you might consider doing is making each box a container that can hold a worst case number of molecules. The box essentially is an array allocated to the potential max, but is only partially filled.
When a particle migrates into a box, it is appended into an existing (but available) cell, and the count of molecules in the box is increased.
Migration out of the box is different. When the molecule is not the last (highest indexed) one in the box, the molecule data is exported -> appended as stated for incoming molecules, however, the current last molecule of the box with the exiting molecule, is "moved" into the position where the outgoing molecule was located. And the count of molecules in the box is decremented.
The above technique assures the molecule data within a box always remains cache line aligned, and has no gaps. All molecules within a box can be operated on with vector instructions (last one possibly masked).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
Thanks for your reply. I will go ahead and update my comments.
-Sunny
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Jim. You correctly got what i tried to say. Your solution is really very good. Sorry i did not mention clearly, but yet some complexities remain, like if system contains different types of molecule(i.e 'A' molecule type contains 5 atoms and 'B' type molecule contains 6 atoms), if 'A' type leaves a box and if the last molecule of that box is B type, then i can not shift last one to that place.
-Masrul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Masrul,
What this sounds like is you are using C++ OOP style to conceptualize an abstract molecule of type A or B or ..., each different "size" that reside in a bag (object) and that you want to move the bag from cell to cell.
For efficient computation using vectors, consider dividing what you conceive as your abstract molecule bag into two parts. One part is the molecule to molecule interaction part, which should be the same for each molecule, and the second part is the variable part between type A, B, .... The fixed part has a reference/pointer to the variable part. In this manner you can move just the fixed part of the particles as described in #3.
Depending on your computations, it might be computationally beneficial to reorganize the fixed part from AOS (array of structures) to SOA (structure of arrays). At first thought, you may say that this complicates particle movement (it does). However, there is a trade-off between the cost of the move verses the cost of the interaction computation. If the number of molecules moving from cell to cell is relatively small compared to the number of molecules not moving, then the higher (computational) cost of movement is paid back by lower (computational) cost of computing the interactions. You may have to experiment to find the better solution. Please note that today we have 512-bit small vectors (and cache lines) for Xeon Phi... and double this is on the roadmap. Any changes you make that favor vectorization will improve over time.
Jim Dempsey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page