Hello,
I am trying to run some benchmarks on my server (Xeon Phi) using the latest "Intel Parallel Studio XE Cluster Edition 2017 Update 1", and I found some strange behavior with the standard "mpicxx" command for compiling MPI programs in C++.
I wrote a simple code, not even parallel (MPI.cpp, see below), and I compile it with:
mpicxx -O3 MPI.cpp
I ran the compiled binary on 1 process just to test the single-core version ("time ./a.out"). Built this way, the program takes forever to finish.
But when I use this command (the same build, but through icpc):
icpc -I/opt/intel/impi/2017.1.132/include64/ -L/opt/intel/impi/2017.1.132/lib64/ -O3 MPI.cpp -lmpi -lmpicxx
the code is faster than with mpicxx. Where is the issue?
I'm working on Red Hat with Intel PSXE CE 2017 u1 (2x Intel E5-2667, 128 GB RAM, 8x Xeon Phi 31S1P).
Thank you.
MPI.cpp:

```cpp
#include <iostream>
#include "mpi.h"
#include <cmath>
using namespace std;

int main()
{
    MPI::Init();
    int rank = MPI::COMM_WORLD.Get_rank();
    int size = MPI::COMM_WORLD.Get_size();
    if (rank == 0) cout << size << endl;

    long n = 100000000000/size;
    double sum = 1.0;
    for (long i = 1; i < n; ++i)
        sum *= pow(2.0*(double)(i+rank*n), 2) / (pow(2.0*(double)(i+rank*n), 2) - 1.0);

    double sumT = 1.0;
    MPI::COMM_WORLD.Allreduce(&sum, &sumT, 1, MPI::DOUBLE, MPI::PROD);
    if (rank == 0) cout << sumT << endl;
    MPI::Finalize();
}
```
If you are comparing the performance of icpc against g++ or clang++: the latter don't support auto-vectorization of math functions. Besides, they would require specific settings to enable a SIMD sum reduction.
Note that Google's spell corrector doesn't accept "vectorization" either.
I guess you ran it on your host CPU, since SSE2 code won't run on KNC and would be slow on KNL.
OK, I was confused by the name: I need to use mpiicpc for compiling, not mpicxx. I didn't know that Intel MPI's mpicxx wrapper calls the GNU compiler (g++)! Thank you.
I've done a set of tests with a modified version of the code and didn't have any problems.
```cpp
////////////////////////////////////////////////////////////////////////////////
// test12.c - MPI test for Intel Xeon Phi Processor x200.
// Notes:
// - https://software.intel.com/en-us/forums/intel-c-compiler/topic/710253
// Cmdlines:
// mpiicpc -O3 -xMIC-AVX512 -qopt-report=1 test12.c -o test12.out
// mpiicpc -O3 -xMIC-AVX512 -qopt-report=1 -I/opt/intel/impi/5.1.3.210/include64/ -L/opt/intel/impi/5.1.3.210/lib64/ test12.c -lmpi -lmpicxx -o test12.out
////////////////////////////////////////////////////////////////////////////////

#include <cstdio>   // printf
#include <iostream>
#include <cmath>
#include "mpi.h"
using namespace std;

int main( void )
{
    MPI::Init();
    int rank = MPI::COMM_WORLD.Get_rank();
    int size = MPI::COMM_WORLD.Get_size();
    printf( "Rank: %d\n", rank );
    printf( "Size: %d\n", size );

    size_t n = 100000000000 / size;    // Test 5 - OK ( it takes some time to complete processing )
//  size_t n = 10000000000 / size;     // Test 4 - OK
//  size_t n = 1000000000 / size;      // Test 3 - OK
//  size_t n = 100000000 / size;       // Test 2 - OK
//  size_t n = 10000000 / size;        // Test 1 - OK
    printf( "Number of Iterations: %zu\n", n );   // %zu is the size_t format

    double sum = 1.0;
    for( size_t i = 1; i < n; i += 1 )
    {
        sum *= pow( 2.0 * ( double )( i+rank*n ), 2 ) / ( pow( 2.0 * ( double )( i+rank*n ), 2 ) - 1.0 );
    }

    double sumT = 1.0;
    MPI::COMM_WORLD.Allreduce( &sum, &sumT, 1, MPI::DOUBLE, MPI::PROD );
    if( rank == 0 )
        cout << sumT << endl;

    MPI::Finalize();
    return 0;   // 0 signals success to the shell and to mpirun
}
```