Feature Request: collector should recognize Cray MPI with ALPS_APP_PE env var

Ronald_G_2 · 09-29-2016

Kevin,

Cray MPI on their XC systems sets the env var ALPS_APP_PE to the rank, which is unique for each rank and runs from 0 to N-1 for N ranks. Cray does not use the same env vars as MPICH, Intel MPI or Open MPI to pass rank information down to applications.

advixe-cl run under MPI needs to open a results dir for each rank. I believe it looks at the MPICH, Intel MPI and Open MPI env vars to find the rank number to use for the results directories. I am pretty sure it is not looking for Cray's env var ALPS_APP_PE. Here is what I'm seeing when I launch a Cray MPI job like this:

aprun -n 1 advixe-cl --results-dir=/foodir ./a.out

This works. When I run more than 1 rank, I get a file open error on the results dir.

As background, VTune used to have this problem too. They modified their collector to look for an MPI job's rank via the env vars of MPICH, Open MPI and Cray's ALPS_APP_PE. I think advixe-cl needs a similar mod: look for the env var ALPS_APP_PE to flag an MPI job and to fetch the rank to use in the results dir name.
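The lookup being requested could look roughly like the following bash sketch. It is purely illustrative and is not how advixe-cl or VTune is actually implemented; the function name detect_mpi_rank is hypothetical. It checks PMI_RANK (set by MPICH and Intel MPI launchers), OMPI_COMM_WORLD_RANK (set by Open MPI) and ALPS_APP_PE (set by Cray ALPS), in that order, and fails if none is set.

```bash
#!/bin/bash
# Hypothetical sketch, not Advisor's implementation: print this process's
# MPI rank by checking the rank variables set by common launchers.
detect_mpi_rank() {
    local var
    # PMI_RANK: MPICH / Intel MPI; OMPI_COMM_WORLD_RANK: Open MPI;
    # ALPS_APP_PE: Cray ALPS (aprun).
    for var in PMI_RANK OMPI_COMM_WORLD_RANK ALPS_APP_PE; do
        # ${!var} is bash indirect expansion: the value of the variable named $var.
        if [ -n "${!var:-}" ]; then
            printf '%s\n' "${!var}"
            return 0
        fi
    done
    return 1    # no known rank variable: probably not an MPI launch
}
```

For example, `rank=$(detect_mpi_rank) && echo "results dir: /foodir.$rank"` would give each rank its own directory name.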

Could you confirm that the collector is not looking for ALPS_APP_PE to detect an MPI job and fetch the rank? If it isn't, consider this a feature request to get advixe-cl working under Cray's ALPS MPI environment.

As a workaround, I aprun a wrapper script that launches the collector on each rank with the results dir set to <results dir>.$ALPS_APP_PE. This works around the issue.
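The per-rank workaround can be sketched roughly as follows. This is a hypothetical wrapper (the name advisor-per-rank.sh and its argument order are made up for illustration), and it uses the --collect survey and --project-dir options as they appear in the later advixe-cl commands in this thread rather than the --results-dir spelling above; adjust to whatever your advixe-cl version accepts.

```bash
#!/bin/bash
# Hypothetical per-rank wrapper (name assumed: advisor-per-rank.sh).
# Launch it with aprun in place of the application, e.g.
#   aprun -n 4 ./advisor-per-rank.sh /foodir ./a.out
# so that each rank writes to its own directory: /foodir.0, /foodir.1, ...

# Print the per-rank directory name for base directory $1.
per_rank_dir() {
    # ALPS sets ALPS_APP_PE to the rank (0..N-1); abort if it is missing.
    : "${ALPS_APP_PE:?ALPS_APP_PE is not set; run this under aprun}"
    printf '%s.%s\n' "$1" "$ALPS_APP_PE"
}

main() {
    local base=$1 dir
    shift    # the remaining arguments are the application and its arguments
    dir=$(per_rank_dir "$base") || exit 1
    # Option names follow the advixe-cl commands used later in this thread.
    exec advixe-cl --collect survey --project-dir "$dir" -- "$@"
}

# Run only when given a base directory and an application.
if [ "$#" -ge 2 ]; then
    main "$@"
else
    echo "usage: $0 <base-dir> <application> [args...]" >&2
fi
```

Because each rank derives its directory from ALPS_APP_PE, no two collectors try to open the same results dir, which is the file-open collision described above.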

thanks

Ron

Kevin_O_Intel1 · 09-29-2016

Thanks Ron!

I'll file a feature request!

Regards,

Kevin

Kevin_O_Intel1 · 09-29-2016

Hi Ron,

I can confirm the behavior you mention. I have filed a feature request.

The development team mentioned that the Advisor -trace-mpi command-line option is a workaround.

Can you try this?

Thanks!

Kevin

Kevin_O_Intel1 · 09-30-2016

Hi Ron,

We wanted to verify that -trace-mpi was sufficient for your requirements.

Kevin

Ronald_G_2 · 09-30-2016

I'll go try it ... give me an hour.

Ronald_G_2 · 09-30-2016

Well, mixed results. The -trace-mpi option does indeed help: we now get results dirs and files with <node>.<rank> names. However, the results files are empty; the .aux files are only about 46 bytes. We run it like this:

aprun -n 60 -N 30 -S 15 -j 1 advixe-cl -trace-mpi -collect survey -search-dir=all:r=$PWD -project-dir $SCRATCH/cice/advisor/haswell/intel-p4-60pe -- ./cice

This runs a 60-rank MPI job on 2 Haswell nodes: 30 ranks per node, 15 ranks per socket, on the first hardware thread of each core only (not using hyperthreads).

The attached file is the STDOUT from this run. Note all the "collection stopped" messages. The job actually ran for a minute or 2 with correct program output and results.

I also re-ran this problem on 1 node with 20 ranks, 10 per socket. Same result: the .aux files are just 46 bytes, and the GUI shows no user code and a runtime of 0.02s.

The code is compiled with Intel Fortran 16.0.3 AND 17.0.0 with -g -qopt-report=5.

Any idea why "collection stopped" happens? The Cray compute nodes run a streamlined SLES OS, not quite a microkernel, but if the collectors are trying to coordinate via memory-mapped files or RPC, then we may have an issue.

Ron

Kevin_O_Intel1 · 09-30-2016

Let me discuss with our development team.

Can you share the project directory (including the result dirs.)?

Kevin

Kevin_O_Intel1 · 10-03-2016

Hi Ron,

VTune and Advisor should work on Cray compute nodes.

Can you confirm this is all being run on the Lustre file system?

Thanks!

Kevin

Ronald_G_2 · 10-04-2016

Yes, we are running the binary and storing results on Lustre.

I'm working on getting you the project dir and results dir.

Ron

Ronald_G_2 · 10-13-2016

We've been able to run an MPI analysis on a Cray XC environment. I'll capture the details in a KB article on IDZ later.

BUT -trace-mpi is NOT SUFFICIENT to collect data on a Cray system. As I said, Cray's MPI sets the env var ALPS_APP_PE to each rank, numbered 0 to N-1. It does NOT set PMI_RANK. Unless a wrapper script explicitly sets PMI_RANK to ALPS_APP_PE, the collection does not set up the correct directories: specifically, the rank.<n>/rank.<n>.advixeexp files are NOT created if PMI_RANK is not set. This proves that -trace-mpi alone is NOT sufficient.

Observe: in the first experiment I run a simple MPI pi.c with 4 ranks, and I do NOT set PMI_RANK in the wrapper. Here are the wrapper and the run command; the resulting directories are listed in the attached file 'wrapper-no-pmi-rank-set.txt'. Note that NO rank.<n>/rank.<n>.advixeexp files are created.

green/collectorbug> more runit.sh

# --- script needed to launch collector on Cray XC cluster ---

# --- we do NOT set PMI_RANK or PMI_PE, only ALPS_APP_PE is set to rank --

# export PMI_RANK=${ALPS_APP_PE}

# export PMI_PE=${ALPS_APP_PE}

export PMI_NO_FORK=1

advixe-cl --collect survey -trace-mpi --project-dir ./adviproj --search-dir all:r=/lustre/ttscratch1/green/collectorbug -- ./cpi

Here is the job launch:

aprun -n 4 -N 4 -j1 -d1 -cc depth advixe-cl --collect survey -trace-mpi --project-dir /lustre/ttscratch1/green/collectorbug/adviproj --search-dir all:r=/lustre/ttscratch1/green/collectorbug -- bash runit.sh

The directories created are missing the "rank" designation, and Advisor can't open this collection:

find adviproj -type f -exec ls -l {} \; >& wrapper-no-pmi-rank-set.txt

Now, rerun the same job, but uncomment the PMI_RANK line so that PMI_RANK is set:

green/collectorbug> more runit.sh

export PMI_RANK=${ALPS_APP_PE}

# export PMI_PE=${ALPS_APP_PE}

export PMI_NO_FORK=1

advixe-cl --collect survey -trace-mpi --project-dir ./adviproj --search-dir all:r=/lustre/ttscratch1/green/collectorbug -- ./cpi

Then launch the job again and look at the files created, via find:

aprun -n 4 -N 4 -j1 -d1 -cc depth advixe-cl --collect survey -trace-mpi --project-dir /lustre/ttscratch1/green/collectorbug/adviproj --search-dir all:r=/lustre/ttscratch1/green/collectorbug -- bash runit.sh

find adviproj -type f -exec ls -l {} \; >& wrapper-with-pmi-rank-set.txt

Observe that the correct "rank" files are created in this case and Advisor GUI can view the collected data.
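To tell the two cases apart without opening the GUI, a small check like the one below can look for a rank.<n>.advixeexp entry for every rank. This is a hypothetical helper (check_rank_results is not an Advisor tool); it assumes only the rank.<n>.advixeexp naming described in this post, not any particular directory depth, and it relies on GNU find's -quit.

```bash
#!/bin/bash
# Hypothetical helper: check_rank_results DIR N
# Succeeds only if DIR contains a rank.<n>.advixeexp entry for every
# n in 0..N-1 (the naming described above); the depth is not assumed.
check_rank_results() {
    local dir=$1 nranks=$2 n missing=0
    for (( n = 0; n < nranks; n++ )); do
        # -print -quit (GNU find) stops at the first match for this rank.
        if [ -z "$(find "$dir" -name "rank.$n.advixeexp" -print -quit)" ]; then
            echo "missing rank.$n.advixeexp" >&2
            missing=$(( missing + 1 ))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For the 4-rank runs above, `check_rank_results ./adviproj 4` should fail for the run without PMI_RANK and succeed for the run with it.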

I would consider this a bug: the collector should be able to get the rank in a Cray MPI environment if it merely looks for ALPS_APP_PE instead of PMI_RANK. That seems simple enough, especially since I already clued it in by using -trace-mpi. This would help a lot of us trying to use Advisor XE on Cray XC systems.

Ron

hlrs · 11-08-2016

We also have problems with Intel Advisor on our Cray XC40.

We now use the following wrapper script to perform the survey collection, launched through:

aprun_opt="-n $n -N $N -j2 -d $(( T * 2 )) -cc numa_node"
aprun $aprun_opt ./advixe-cl_survey.sh ${workdir} ${BIN}

#!/bin/bash
#####
#advixe-cl_survey.sh $1 source path $2 Binary
#####
export PMI_RANK=${ALPS_APP_PE}
export PMI_NO_FORK=1
#export PMI_NO_PREINITIALIZE=1 # not required for survey

advixe-cl -collect survey -trace-mpi --no-auto-finalize -flops-and-masks -project-dir ./advisor -search-dir src:r="$1" -- "$2" ./input.par > output.log.survey

This works for MPI and OpenMP applications for us. The only issue is that we are not able to get any FLOPS and bandwidth reports on the Haswell CPUs. If I remember correctly, this worked on another cluster with Haswell CPUs, but I have to check this again.

But when we want to collect the trip count data, we get the following error message:

Mon Nov 7 14:58:01 2016: [PE_0]:_pmi_alps_sync:alps response not OKAY
Mon Nov 7 14:58:01 2016: [PE_0]:_pmi_init:_pmi_alps_sync failed -1

advixe: Warning: The application returned a non-zero exit value.

We were able to fix this problem with export PMI_NO_PREINITIALIZE=1, so our advixe-cl_tripcounts.sh wrapper is now:

#!/bin/bash
######
#advixe-cl_tripcounts.sh $1 source path $2 Binary
######

export PMI_RANK=${ALPS_APP_PE}
export PMI_NO_FORK=1
export PMI_NO_PREINITIALIZE=1

export PMI_MMAP_SYNC_WAIT_TIME=300 #We have to check if this is really required

advixe-cl -collect tripcounts -trace-mpi --no-auto-finalize -project-dir ./advisor -search-dir src:r="$1" -- "$2" ./input.par > output.log.tripcounts
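The two wrappers above differ only in the collection type and the extra PMI settings, so they could be folded into one script. A minimal sketch, assuming a hypothetical name advixe-cl_collect.sh and the same ./advisor project dir and ./input.par argument as above; the PMI variables are exactly the ones reported in this post, not a documented recommendation, and the -- separator before the binary matches the other advixe-cl commands in this thread.

```bash
#!/bin/bash
# Hypothetical combined wrapper (name assumed: advixe-cl_collect.sh):
#   advixe-cl_collect.sh <survey|tripcounts> <source path> <binary>
# Launched per rank with aprun, like the two wrappers above.

# Export the PMI settings reported above for each collection type.
setup_pmi_env() {
    if [ -z "${ALPS_APP_PE:-}" ]; then
        echo "ALPS_APP_PE is not set; run this under aprun" >&2
        return 1
    fi
    export PMI_RANK=${ALPS_APP_PE}   # collector reads PMI_RANK, not ALPS_APP_PE
    export PMI_NO_FORK=1
    case $1 in
        survey) ;;                   # PMI_NO_PREINITIALIZE not needed here
        tripcounts)
            export PMI_NO_PREINITIALIZE=1
            export PMI_MMAP_SYNC_WAIT_TIME=300   # possibly not required
            ;;
        *)
            echo "unknown collection type: $1" >&2
            return 1
            ;;
    esac
}

main() {
    local collect=$1 src=$2 bin=$3
    local extra=()
    setup_pmi_env "$collect" || exit 1
    # The original survey wrapper also passed -flops-and-masks.
    if [ "$collect" = survey ]; then
        extra+=(-flops-and-masks)
    fi
    advixe-cl -collect "$collect" -trace-mpi --no-auto-finalize "${extra[@]}" \
        -project-dir ./advisor -search-dir "src:r=$src" -- "$bin" ./input.par \
        > "output.log.$collect"
}

# Run only with exactly three arguments.
if [ "$#" -eq 3 ]; then
    main "$@"
else
    echo "usage: $0 <survey|tripcounts> <source path> <binary>" >&2
fi
```

Keeping the per-collection differences in one case statement makes it easier to see which PMI setting fixed which collection when testing further.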

The problem is that we still get the following warning/error:

advixe: Warning: The application returned a non-zero exit value.

I have also checked with the following OpenMP hello world program whether the problem lies in the application or in advixe-cl.

PROGRAM HELLO

  INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM

!C Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(NTHREADS, TID)

!C Obtain thread number
  TID = OMP_GET_THREAD_NUM()
  PRINT *, 'Hello World from thread = ', TID

!C Only master thread does this
  IF (TID .EQ. 0) THEN
    NTHREADS = OMP_GET_NUM_THREADS()
    PRINT *, 'Number of threads = ', NTHREADS
  END IF

!C All threads join master thread and disband
!$OMP END PARALLEL

END

Again, the survey works fine, but the tripcounts collection does not. In interactive mode we get the following extended output:

aprun -n 1 advixe-cl --collect tripcounts --no-auto-finalize -project-dir ./advisor -search-dir all:r=./ -- ./hw_advisor_test
Intel(R) Advisor Command Line Tool
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.
advixe: Error: Internal error. Please contact Intel customer support team.
advixe: Error: Analysis terminated abnormally.
advixe: Error: An internal error has occurred. Our apologies for this inconvenience. Please gather a description of the steps leading up to the problem and contact the Intel customer support team.
advixe: Warning: The application returned a non-zero exit value.
Application 5827006 resources: utime ~0s, stime ~1s, Rss ~52160, inblocks ~11650, outblocks ~3178

I used the following environment:

PrgEnv-intel/5.2.82 (ifort version 16.0.3)

Intel Advisor 17.1 (build 477503)

Does anybody have an idea?

hlrs · 11-10-2016

We also tested the following version without success:

Update 1 (build 486553)