Software Archive
Read-only legacy content
17060 Discussions

White paper: N-body simulation on Intel Xeon Phi Coprocessors

Andrey_Vladimirov
New Contributor III
653 Views

Hi All,

I am a rookie in the Intel Developer Zone, but I was fortunate to have early access to Intel Xeon Phi coprocessors for some time. Now that Intel has lifted the Xeon Phi coprocessor's veil, I am happy to share my user experience and demonstrate some examples.

I have written a white paper investigating a toy model, the N-body simulation, on an Intel Xeon Phi coprocessor. The PDF file can be downloaded here: http://research.colfaxinternational.com/file.axd?file=2013%2f1%2fColfax_Nbody_Xeon_Phi.pdf . The paper contains code samples, assembly listings and performance results, and discusses some programming practices for the efficient utilization of the coprocessor. Most importantly, it demonstrates that a single C/C++ code can efficiently run on Xeon processors and Xeon Phi coprocessors. Mind that the performance results are obtained on a pre-production sample, and the specifications and performance of the final product may be different.

If this paper has some utility to fellow developers and scientists, I will be thrilled. I am also looking forward to discussions on this forum and learning about the experience of other people with Xeon Phi programming.

Cheers,

Andrey.

0 Kudos
11 Replies
TaylorIoTKidd
New Contributor I
653 Views

Andrey,

Thank you for contributing to the community.

Regards
--
Taylor

0 Kudos
zhu_j_
Beginner
653 Views

Hi Andrey

May I get your source code run on Xeon processors and Xeon Phi coprocessors.

Regards,

zhu.j

0 Kudos
Bernard
Valued Contributor I
653 Views

@zhu.j

The part of the source code is in N-body simulation white paper.

@Andrey Vladimirov

Thank you for publishing your white paper about the N-body simulation.It helped me a lot to understand how to implement N-body simulation.

0 Kudos
zhu_j_
Beginner
653 Views

@iliyapolak 

I hope to have the whole code and try to run it on my own platform。

thanks

0 Kudos
Bernard
Valued Contributor I
653 Views

The part of the code which is missing are calls to particles initializers.

0 Kudos
zhu_j_
Beginner
653 Views

thanks

0 Kudos
zhu_j_
Beginner
653 Views

iliyapolak wrote:

The part of the code which is missing are calls to particles initializers.

 

thanks

0 Kudos
zhu_j_
Beginner
653 Views

iliyapolak wrote:

The part of the code which is missing are calls to particles initializers.

 

Thank you!

0 Kudos
Bernard
Valued Contributor I
653 Views

I managed to add Windows multithreading now I am trying to properly synchronize the threads.I run it on Haswell CPU.

0 Kudos
Andrey_Vladimirov
New Contributor III
653 Views

Hmm, I thought that we included a complete code listing in the paper. The particle initialization step uses a random number generator for MKL. It is not a very physically meaningful setup, just some numbers to crunch.

In any case, here is a code that can be run (sorry if spacing is off):

[cpp]

#include <math.h>
#include <mkl_vsl.h>
#include <omp.h>
#include <stdio.h>

struct ParticleSystemType {

  float *x, *y, *z, *vx, *vy, *vz;

};

int main(const int argc, const char** argv) {
  const int nParticles = 30000, nSteps = 10; // Simulation parameters
  const float dt = 0.01f; // Particle propagation time step
  ParticleSystemType p; // Particle system stored as a structure of arrays
  float *buf = (float*) malloc(6*nParticles*sizeof(float)); // Malloc all data
  p.x  = buf+0*nParticles; p.y  = buf+1*nParticles; p.z  = buf+2*nParticles;
  p.vx = buf+3*nParticles; p.vy = buf+4*nParticles; p.vz = buf+5*nParticles;

 

  // Initialize particles
  VSLStreamStatePtr rnStream;  vslNewStream( &rnStream, VSL_BRNG_MT19937, 1 );
  vsRngUniform(VSL_RNG_METHOD_UNIFORM_STD, rnStream, 6*nParticles, buf, -1.0f, 1.0f);


 
  // Propagate particles
  printf("Propagating particles using %d threads...\n", omp_get_max_threads());
  double rate = 0.0, dRate = 0.0; // Benchmarking data
  const int skipSteps = 1; // Skip first iteration is warm-up on Xeon Phi coprocessor
  for (int step = 1; step <= nSteps; step++) {
    const double tStart = omp_get_wtime(); // Start timing

 

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < nParticles; i++) { // Parallel loop over particles that experience force
      float Fx = 0.0f; float Fy = 0.0f; float Fz = 0.0f; // Components of force on particle i


      
      for (int j = 0; j < nParticles; j++) { // Vectorized loop over particles that exert force
    if (j != i) {
      // Newton's law of universal gravity calculation.
      const float dx = p.x - p.x;
      const float dy = p.y - p.y;
      const float dz = p.z - p.z;
      const float drSquared    = dx*dx + dy*dy + dz*dz;
      const float drPowerN32   = 1.0f/(drSquared*sqrtf(drSquared));
      
      // Reduction to calculate the net force
      Fx += dx * drPowerN32;  Fy += dy * drPowerN32;  Fz += dz * drPowerN32;
    }
      }


      // Move particles in response to the gravitational force
      p.vx += dt*Fx;        p.vy += dt*Fy;        p.vz += dt*Fz;
    }


    for (int i = 0; i < nParticles; i++) { // Not much work, serial loop
      p.x += p.vx*dt; p.y += p.vy*dt; p.z += p.vz*dt;
    }


    
    const double tEnd = omp_get_wtime(); // End timing
    if (step > skipSteps) // Collect statistics
      { rate  += 1.0/(tEnd - tStart); dRate += 1.0/((tEnd - tStart)*(tEnd-tStart)); }


    printf("Step %d: %.3f seconds\n", step, (tEnd-tStart));
  }


  rate/=(double)(nSteps-skipSteps); dRate=sqrt(dRate/(double)(nSteps-skipSteps)-rate*rate);
  printf("Average rate for iterations %d through %d: %.3f +- %.3f steps per second.\n",
     skipSteps+1, nSteps, rate, dRate);
  free(buf);
}

 

[/cpp]

 

The compilation commands on Linux are:

[bash]

icpc -xhost -O3 -openmp -mkl -fimf-domain-exclusion=8 -o nbody nbody.c

icpc -O3 -openmp -mkl -fimf-domain-exclusion=8 -o nbody-mic nbody.c -mmic

[/bash]

The flag "-fimf-domain-exclusion=8" disables denormal number handling in math functions.

0 Kudos
danielpf
Beginner
653 Views

Hi!

For information with version 14 of Intel compiler the -fimf-domain-exclusion=8 option is no longer improving the code speed. 

Thanks for the interesting comments on the code optimization. 

 

 

 

 

 

 

 

 

0 Kudos
Reply