Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

VT_LOGFILE_PREFIX not working with Intel trace collector

Pramod_K_
Beginner
1,319 Views

I am using the Intel Trace Collector. It works fine for small simulations, but for large runs I get the following error:

[20] Intel(R) Trace Collector ERROR: Failed writing buffer to flush file "/tmp/VT-flush-blueDetector_scalasca_itac_x86.rts_0020-008123.dat": No space left on device

From the Intel documentation I see that traces are written to /tmp by default and that we are supposed to set VT_LOGFILE_PREFIX. But even if I set this environment variable to a directory on the Lustre file system and pass the -x option to mpiexec, I still get the same error.

$ export VT_LOGFILE_PREFIX=/lustre/jhome7/jicg41/jicg4110/some_dir_path
$ LD_PRELOAD=/usr/local/intel/itac/8.1.2.033/itac/slib_impi4/libVT.so mpiexec -x -trace -np 48  ./app_exe

Note:

  • with the above settings, only the first file, i.e. app__itac_x86.rts.prot, is written to the VT_LOGFILE_PREFIX directory
  • I am sure that the -x option exports all environment variables to all MPI processes; I have tested this


Am I missing something?

5 Replies
Gergana_S_Intel
Employee

Hey Pramod,

Thanks for posting.  There are actually 2 sets of files being written to different locations when the Intel® Trace Collector is running.

There are the trace files, which contain the actual trace information you will later read in the Intel® Trace Analyzer GUI.  Those files are controlled via the VT_LOGFILE_PREFIX env variable, and their default location is the directory from which you started the job.  They are generally written after your application's MPI_Finalize() call.

We also have temporary flush files.  Those are files written during execution of the application by the trace collector and are used to store temporary information before the actual trace files are created.  The flush files are controlled by the VT_FLUSH_PREFIX env variable.  In your case, you need to use this variable (and not VT_LOGFILE_PREFIX) to change their default location (/tmp).

So your script will look like this:

$ export VT_FLUSH_PREFIX=/lustre/jhome7/jicg41/jicg4110/some_dir_path
$ LD_PRELOAD=/usr/local/intel/itac/8.1.2.033/itac/slib_impi4/libVT.so mpiexec -x -trace -np 48  ./app_exe
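
Before a long run, it may be worth checking that the flush directory exists, is writable, and has enough free space, since the original failure was "No space left on device". The following is only a minimal sketch, assuming a POSIX shell with `df -Pk` and `awk` available. The helper name `check_flush_dir` is hypothetical and not part of the Intel® Trace Collector.

```shell
#!/bin/sh
# check_flush_dir DIR
# Hypothetical helper (not part of Intel Trace Collector).
# Creates DIR if needed, confirms it is writable, and prints the
# free space of its file system in 1 KB blocks.
check_flush_dir() {
    dir=$1
    mkdir -p "$dir" || return 1

    # Create and remove a probe file to confirm write access.
    probe="$dir/.vt_flush_probe.$$"
    if ! : > "$probe" 2>/dev/null; then
        echo "cannot write to $dir" >&2
        return 1
    fi
    rm -f "$probe"

    # df -Pk prints one header line and one data line;
    # column 4 of the data line is the available space in 1 KB blocks.
    df -Pk "$dir" | awk 'NR == 2 { print $4 }'
}
```

For instance, `check_flush_dir "$VT_FLUSH_PREFIX" || exit 1` near the top of a job script would stop before mpiexec starts if the directory is unusable, and the printed number can be compared with the expected trace volume.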

I hope this helps.  Let me know how it goes.

Regards,
~Gergana

Pramod_K_
Beginner

Perfect! Working fine now!

Thanks Gergana! 

Gergana_S_Intel
Employee

Glad to hear it :)  Let me know how you like using the tool.

Regards,
~Gergana

Pramod_K_
Beginner

Looking at small trace files (hundreds of MBs) works fine.

For traces of up to a few gigabytes, charts->event timeline took 10-20 minutes (those options were just disabled, and there is NO indication whether the tool is preparing charts etc. It would be nice to have some indication!)

My actual simulation generates ~150 GB of traces, and it looks like the trace analyzer takes a very long time to prepare the timeline (timelines are disabled, again with no indication!)

I know these are very large traces and I am already working on reducing trace sizes from my simulation.

-Pramod

Gergana_S_Intel
Employee

Hey Pramod,

Thanks for the feedback!  We have actually added a progress bar in the latest Intel® Trace Analyzer and Collector 8.1 Update 3 release.  I believe you have Update 2.  Just look at the attached image and note the oval highlight in purple.

Since you have a valid license, I'd urge you to upgrade.

Also, if you need any advice on applying filters to reduce the trace file size, you can take a look at the following Intel® Trace Collector filtering article.

Finally, to reduce some of your startup time, you can separately pre-create the cache file for your application's trace file.  That's what the trace analyzer uses in subsequent runs to reduce the startup cost.  Here's a quick example:

traceanalyzer --cli trace.stf -c0 -w

Once complete, you'll see a trace.stf.cache file created alongside your original trace.stf.  Then open up the Trace Analyzer GUI as you normally would.  The GUI will pick up the cache file automatically.  More info is available in the CLI section of the Intel® Trace Analyzer Reference Manual.
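
If there are several trace files, the same command could be run in a loop, for example from a batch job, so the caches are ready before anyone opens the GUI. This is only a rough sketch, assuming a POSIX shell and that each cache file is named after its trace with a `.cache` suffix, as described above. The helper name `precache_traces` and the `TRACEANALYZER` override variable are hypothetical.

```shell
#!/bin/sh
# precache_traces [DIR]
# Hypothetical helper: pre-creates cache files for every *.stf trace in DIR
# (default: current directory) using the traceanalyzer command shown above.
# Set TRACEANALYZER to the full path if the binary is not on PATH.
precache_traces() {
    dir=${1:-.}
    for stf in "$dir"/*.stf; do
        # Skip the unexpanded pattern when no trace files match.
        [ -e "$stf" ] || continue

        # Skip traces that already have a cache file next to them.
        if [ -e "$stf.cache" ]; then
            echo "skip: $stf (cache exists)"
            continue
        fi

        echo "caching: $stf"
        "${TRACEANALYZER:-traceanalyzer}" --cli "$stf" -c0 -w || return 1
    done
}
```

Calling `precache_traces /path/to/traces` stops at the first trace that fails, so a broken or truncated trace is noticed rather than silently skipped.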

Hope this helps.

Regards,
~Gergana
