Intel Trace Collector and SLURM

Kevin_McGrattan · 02-16-2021

What is the best way to do this

$ mpirun -trace -n 4 ./myApp

with SLURM srun?

PrasanthD_intel · 02-17-2021

Hi Kevin,

To use ITAC with srun:

i) Source the environment scripts for all required components (ITAC, MPI, compilers).

ii) Set LD_PRELOAD to the path of libVT.so:

export LD_PRELOAD=/opt/intel/oneAPI/2021.1.2/itac/2021.1.1/slib/libVT.so

export VT_LOGFILE_FORMAT=stfsingle

export VT_PCTRACE=5

iii) Run srun without the -trace flag:

srun -n 2 ./a.out
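
For reference, steps i) to iii) could be combined in a batch script along these lines. This is only a sketch under some assumptions: the install prefix /opt/intel/oneapi and the variable names ITAC_PREFIX and VT_LIB are placeholders to adjust for your site, setvars.sh is sourced only if it is present, and LD_PRELOAD is set for the srun command alone so that other commands in the script do not load libVT.so.

```shell
#!/bin/bash
#SBATCH --ntasks=2

# Assumption: oneAPI install prefix; ITAC_PREFIX is a placeholder name, not an Intel variable.
ITAC_PREFIX="${ITAC_PREFIX:-/opt/intel/oneapi}"

# Step i: load the compiler, MPI and ITAC environments if the setup script exists.
if [ -f "$ITAC_PREFIX/setvars.sh" ]; then
    . "$ITAC_PREFIX/setvars.sh" > /dev/null
fi

# Step ii: locate libVT.so and set the Trace Collector options from the post above.
VT_LIB="$ITAC_PREFIX/itac/latest/slib/libVT.so"
export VT_LOGFILE_FORMAT=stfsingle
export VT_PCTRACE=5

# Step iii: launch with srun, without -trace, preloading libVT.so for this command only.
if command -v srun > /dev/null 2>&1 && [ -f "$VT_LIB" ]; then
    LD_PRELOAD="$VT_LIB" srun -n 2 ./a.out
else
    echo "srun or $VT_LIB not found; nothing was run" >&2
fi
```

Exporting LD_PRELOAD for the whole script, as in the lines above, is also an option; scoping it to the srun line is just a design choice.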

Let us know if you get any errors.

Regards

Prasanth

Kevin_McGrattan · 02-17-2021

These lines seem to have done the trick. Thanks.

export LD_PRELOAD=/opt/intel/oneapi/itac/latest/slib/libVT.so
export VT_LOGFILE_FORMAT=stfsingle
export VT_PCTRACE=5

Kevin_McGrattan · 02-17-2021

One follow-up question: when I export the configuration file at run time, as shown in the previous post, Collector produces the .stf file and Analyzer reads it. However, I cannot get a breakdown of the usage of my subroutines as listed in the .conf file. When I compile with -tcollect, I do see the list of subroutines. So my question is: do I need to recompile the code with -tcollect in order to have my subroutine filters applied? Or can everything be done at run time?

PrasanthD_intel · 02-22-2021

Hi Kevin,

Sorry for the delay in response.

Since you mentioned that you cannot see all the MPI calls in the GUI, have you tried ungrouping them? To do so, right-click on Group MPI and select Ungroup MPI.

As for automatic function instrumentation with -tcollect, I will ask the team whether it can be enabled at runtime.

That said, you should see all MPI calls whether or not the code was compiled with -tcollect.

Regards

Prasanth

Kevin_McGrattan · 02-22-2021

I should have been clearer. Without any special tracing or collecting options during compilation, I can use my configuration (.conf) file to trace some or all MPI calls. Somehow, the executable has been instrumented automatically to allow me to break down the MPI calls into groups. However, I want to do this with my own subroutines. That is, I want to trace certain subroutines and functions of the "Application" that I have listed in the .conf file. If I recompile with the -tcollect option, this works: I can trace individual subroutines. However, if I do not recompile with -tcollect, I cannot. I can always get a breakdown of the MPI calls, but to get a breakdown of my own subroutine calls, I need to recompile.

My question: is there an option to use at runtime that allows me to trace the routines listed in the .conf file, without having to recompile? It is very convenient for me to just add an option at runtime when I find that there is something not quite right in a given computation.

PrasanthD_intel · 02-24-2021

Hi Kevin,

I believe you have used ITAC APIs in your code for profiling your subroutines and specific parts of the code. Currently, I am not sure whether profiling can be done directly at runtime. So I am forwarding this to the internal team for better support.

Regards

Prasanth

Heinrich_B_Intel · 02-25-2021

Hi Kevin,

I'm sorry, but there is no way to instrument user functions at runtime. Runtime instrumentation was available many years ago, but it introduced too much overhead. The technology (Pin) is available, but there has not been much development on ITAC in the last 10 years.

However, there are probably some other tricks that can help you:

You may use the ITAC API calls to instrument regions of interest. If you already have your own timing system with calls like begin_timing and end_timing, you can put the API calls inside those routines to trace these regions.
If you are using C++, you can write a timing class with the API calls in the constructor and destructor. Instantiating this class in a scope will time that scope.
You may re-link an instrumented executable with libVTnull.a or libVTnull.so. These libraries contain empty stubs for the API calls used by the instrumentation, which helps if you do not want to recompile when not using ITAC.
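
As a rough illustration of the re-linking idea, a small shell helper could pick the library at link time. Treat this as a sketch, not a verified recipe: vt_link_flags is a hypothetical helper name, myApp.o is a placeholder object file, and it assumes that VT_LIB_DIR and VT_ADD_LIBS have been set by sourcing the ITAC environment script. Please check the exact link line against the ITAC documentation for your version.

```shell
# Hypothetical helper: print the linker flags for real tracing or for the empty stubs.
# Assumes VT_LIB_DIR (and, for tracing, VT_ADD_LIBS) come from the ITAC environment script.
vt_link_flags() {
    case "$1" in
        trace) echo "-L${VT_LIB_DIR} -lVT ${VT_ADD_LIBS}" ;;  # link the real Trace Collector
        null)  echo "-L${VT_LIB_DIR} -lVTnull" ;;             # link the empty API stubs
        *)     echo "usage: vt_link_flags trace|null" >&2
               return 1 ;;
    esac
}

# Example use (not executed here; myApp.o is a placeholder built with instrumentation):
#   mpiifort -o myApp_notrace myApp.o $(vt_link_flags null)
```

The object files are compiled once, and only the final link step changes between the traced and untraced executables.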

If this sounds interesting, I can provide more information.

Best regards,

Heinrich

Kevin_McGrattan · 02-25-2021

OK, thanks for the info. Like I said above, doing everything at runtime without having to recompile is merely a convenience. When I recompile with -tcollect, things all work fine.