Compiling with prof-gen=src option. Corrupted .dyn files
Compiling an application (library) with the prof-gen=src option makes it generate *.dyn files while running tests that use functions from this application (library). Sometimes it generates corrupted *.dyn files, and sometimes normal ones.
I took the word 'corrupted' from the 'profmerge' output:
warning #30019: File './500f0b52_03081.dyn' is corrupt. Unrecognized segment id, 0xa102202100000001.
./500f0b52_03081.dyn: Invalid argument
FATAL ERROR: fseek on file ./500f0b52_03081.dyn failed
Without something to reproduce it, it's hard to tell.
My first guess would be: did the application you profiled not terminate correctly? Reason: PGO needs to close the created *.dyn files at application exit. It does so by using the "atexit" hook. If your application is terminated ungracefully (e.g. by SIGKILL), or by an exception that ends it the same way, the *.dyn files won't be written correctly to disk.
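To illustrate the point about "atexit", here is a minimal sketch in Python rather than in the instrumented application itself. It is only an analogy: the child script, the file name profile.out and the helper run_child are made up for this example and have nothing to do with the PGO runtime. It shows that an exit handler writes its file on a normal exit but never runs when the process is killed.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: registers an exit handler that writes a file,
# then either exits normally or sleeps so the parent can kill it.
CHILD = r"""
import atexit, sys, time

def flush_profile(path=sys.argv[1]):
    with open(path, "w") as fh:
        fh.write("flushed")

atexit.register(flush_profile)
print("ready", flush=True)   # tell the parent the handler is registered
if sys.argv[2] == "wait":
    time.sleep(60)
"""

def run_child(kill: bool) -> bool:
    """Run the child; return True if its exit handler wrote the file."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "profile.out")
        mode = "wait" if kill else "exit"
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, out, mode],
            stdout=subprocess.PIPE,
        )
        proc.stdout.readline()   # block until the child printed "ready"
        if kill:
            proc.kill()          # SIGKILL on POSIX, no exit handlers run
        proc.wait()
        proc.stdout.close()
        return os.path.exists(out)

if __name__ == "__main__":
    print("normal exit wrote file:", run_child(kill=False))
    print("killed child wrote file:", run_child(kill=True))
```

If a batch system or MPI launcher kills the tasks at the end of a job, the same thing could happen to the *.dyn output.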
I am sorry for the brief description, but reproducing this problem is quite difficult: the application runs on many computers (nodes) with many (8) tasks per node. Interestingly, if I run the application on 3 nodes the report is generated fine, but with 8 nodes it fails with the issue described.
Sometimes, when I remove the corrupted .dyn files, profmerge generates the .dpi file successfully and the report is generated successfully too. So, is there any way to make profmerge ignore corrupted .dyn files?
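The manual removal described above could be scripted along these lines. This is only a sketch under assumptions: it relies on the warning text having exactly the form shown in the profmerge output quoted earlier, and the function names and the quarantine directory are invented for this example. It does not use any profmerge option, it only parses saved output and moves the named files aside.

```python
import os
import re
import shutil

# Matches the warning quoted in this thread, e.g.
# warning #30019: File './500f0b52_03081.dyn' is corrupt. ...
CORRUPT_RE = re.compile(r"File '([^']+\.dyn)' is corrupt")

def corrupt_dyn_files(profmerge_output):
    """Return the .dyn paths reported as corrupt, without duplicates."""
    seen = []
    for path in CORRUPT_RE.findall(profmerge_output):
        if path not in seen:
            seen.append(path)
    return seen

def quarantine(paths, quarantine_dir):
    """Move the given files into quarantine_dir (hypothetical location)."""
    os.makedirs(quarantine_dir, exist_ok=True)
    for path in paths:
        if os.path.exists(path):
            shutil.move(path, os.path.join(quarantine_dir, os.path.basename(path)))
```

After moving the reported files away, profmerge can be run again on the remaining ones. Note that the resulting profile then lacks the data from the quarantined runs.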
Would it be OK for you to share some of the corrupted *.dyn files? If so, please use a private reply, because the data in them might be sensitive for you.
Other reasons we're currently speculating about, and you might check too, could be:
Out of disk space.
File system shared among nodes; some processes might create the same *.dyn file name and overwrite each other's files. Solution: use $PROF_DIR and set it to a different path for each node. That's just my guess... engineering does not think that's the case.
Mixture of different OS images/configurations?
Maybe corruption always occurs on the same node(s)? It would be worth looking at that system (or those systems) then.
How long is your application running? Do the systems you use have ECC RAM to exclude memory corruption?
Engineering is quite sure that multiple processes have written to the same file, i.e. two or more nodes created the same file name for the *.dyn file. Can you control the environment variables for each node and set $PROF_DIR individually, e.g.:
node1: /shared/node1
node2: /shared/node2
...
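If the environment cannot easily be set per node by the job launcher, a small wrapper started on every node could do it. The following is a sketch only: the helper names are made up, and it assumes the host name is unique per node and that the shared base directory (e.g. /shared) is writable from all nodes. The only thing taken from this thread is the PROF_DIR variable itself.

```python
import os
import socket
import subprocess

def per_node_env(shared_base, hostname=None):
    """Return a copy of the environment with PROF_DIR set per node.

    hostname defaults to this machine's host name (assumed unique per node).
    """
    node = hostname or socket.gethostname()
    prof_dir = os.path.join(shared_base, node)   # e.g. /shared/node1
    os.makedirs(prof_dir, exist_ok=True)
    env = dict(os.environ)
    env["PROF_DIR"] = prof_dir
    return env

def run_instrumented(cmd, shared_base):
    """Run the instrumented application with the per-node PROF_DIR."""
    return subprocess.call(cmd, env=per_node_env(shared_base))
```

Each node then writes its *.dyn files into its own subdirectory, so files from different nodes cannot collide by name. Whether tasks on the same node can still collide is not covered by this sketch.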
Please let us know if this helped.
We're currently discussing a better solution so this won't happen for future versions.
Unfortunately, setting $PROF_DIR is the only workaround. Engineering is already working on a sound solution. However, it will arrive at the earliest with one of the first updates of 13.0. Because of design considerations, compilers of version 12.1 can no longer be addressed. If you like, I can keep you updated on the progress for 13.0.