I have oneAPI 2021.2.0 installed on a small Linux cluster. We recently rebooted the cluster after a power outage, and some connectivity issues linger. Jobs still run and everything appears to be OK, but I cannot run the Intel Cluster Checker to see whether problems remain. When I invoke it, it just hangs:
$ clck -f nodefile
Intel(R) Cluster Checker 2021 Update 2 (build 20210301)
Running Collect
Nothing happens beyond this. It is quite possible that there are network issues on our cluster. Is there a verbose or debug option that can tell me what is causing clck to hang?
Hi,
Thanks for reaching out to us.
>>Is there a verbose or debug option that can alert me to what is causing clck to hang?
-l / --log-level: Specifies the output level. Recognized values, in increasing order of verbosity, are: alert, critical, error, warning, notice, info, and debug. The default log level is error.
For more details, you can refer to the link below.
If your issue persists, please send us the error log so that we can investigate further.
Could you also provide your system environment details (OS version)?
Thanks & Regards
Shivani
CentOS Linux release 7.8.2003 (Core)
I ran clck with -l debug. It hung as before, so I pressed Ctrl-C to abort, and got messages like this:
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.2.0/libexec/intel64/pdsh -b -K -w burn035 'kill -SIGINT -- -$(head -n 1 /home4/mcgratta/.clck/clck-collect-tempS1Fykb/pidfile_burn035);echo $?'
burn035: 1
burn035: head: cannot open ‘/home4/mcgratta/.clck/clck-collect-tempS1Fykb/pidfile_burn035’ for reading: No such file or directory
burn035: bash: line 0: kill: -: arguments must be process or job IDs
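The last error is most likely a consequence of the missing pidfile rather than a separate fault: when `head` cannot open the file, `$(head -n 1 ...)` expands to nothing and the cleanup command becomes `kill -SIGINT -- -`. As a sketch, a guarded version of that cleanup step could look like the following. `signal_pgid_from_pidfile` is a hypothetical helper, not part of clck, and it assumes (from the `kill -- -PID` form in the log) that the first line of each pidfile is a process-group ID.

```bash
# Hypothetical helper (not part of clck): send SIGINT to the process group
# named in a clck pidfile, but only if the file exists and holds a usable ID.
# Assumption: line 1 of the pidfile is a process-group ID, as implied by the
# "kill -SIGINT -- -$(head -n 1 pidfile)" command in the debug log.
signal_pgid_from_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "missing pidfile: $pidfile" >&2
        return 1
    fi
    pgid=$(head -n 1 "$pidfile")
    case $pgid in
        ''|0*|1|*[!0-9]*)
            # Empty or non-numeric input is what produces
            # "kill: -: arguments must be process or job IDs".
            # 0 and 1 are rejected too: "kill -- -0" targets the caller's own
            # process group and "kill -- -1" targets every process it may signal.
            echo "no valid process-group ID in $pidfile" >&2
            return 2
            ;;
    esac
    # The leading "-" makes kill signal the whole process group.
    kill -INT -- "-$pgid"
}
```

Like clck's own cleanup, this would have to run on the node that owns the process group (clck does that through pdsh). Called on `~/.clck/clck-collect-tempS1Fykb/pidfile_burn035`, it would report the missing file instead of invoking `kill` with an empty argument.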
I checked the .clck directory, and then the temp directory corresponding to my session. Some of the node pidfiles were not there. There are 36 nodes, and 4 had no pidfile (like burn035 shown above). So it appears that the pidfiles are not all being created. This may well be a problem with our cluster, but I cannot see what is unusual about the nodes that are not assigned pidfiles, other than that they are at the end of the sequence. In other words, burn001 through burn032 have pidfiles, and burn033 through burn036 do not.
Is there something I can check on the bad nodes to see why they are not writing pidfiles?
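To list exactly which nodes lack a pidfile, the nodefile can be compared against the clck-collect temp directory. This is a sketch under assumptions: one node name per line in the nodefile (extra fields and `#` comments ignored), and pidfiles named `pidfile_<node>` as in the debug output. `missing_pidfiles` is a hypothetical name.

```bash
# Hypothetical helper: print every node listed in a clck nodefile that has
# no pidfile_<node> in the given clck-collect temp directory.
# Assumptions: the node name is the first field on each line, '#' starts a
# comment, and pidfiles follow the pidfile_<node> naming seen in the log.
missing_pidfiles() {
    nodefile=$1
    tempdir=$2
    # Drop comments, skip blank lines, keep the first field of each line.
    sed 's/#.*//' "$nodefile" | awk 'NF { print $1 }' |
    while read -r node; do
        [ -e "$tempdir/pidfile_$node" ] || echo "$node"
    done
}
```

For example, `missing_pidfiles nodefile ~/.clck/clck-collect-tempS1Fykb` run from a second terminal while clck is still hung; the temp directory may be removed during clck's cleanup, so running it afterwards may show nothing.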
Hi,
Could you please run the commands below and let us know whether you face any issues?
mpirun -n 36 -ppn 1 -f hostfile hostname
mpirun -bootstrap pdsh -n 36 -ppn 1 -f hostfile hostname
These commands simply check whether you can access all of the nodes.
Could you also run the following command and send us the output?
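Because clck launches its collectors through its bundled pdsh over ssh (see the `PDSH_SSH_ARGS` options in the debug output), a non-interactive ssh login to each node is a rough stand-in for what it needs. A sketch, assuming the same one-node-per-line nodefile format; `check_ssh_nodes` is a hypothetical helper, and passing it does not prove that clck's own pdsh invocation will succeed.

```bash
# Hypothetical helper: try a non-interactive ssh login to every node in a
# nodefile and report which ones fail.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout bounds how long an unreachable node can stall the loop.
check_ssh_nodes() {
    nodefile=$1
    status=0
    for node in $(sed 's/#.*//' "$nodefile" | awk 'NF { print $1 }'); do
        # -n: don't read stdin; "true" is a trivial remote command.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            status=1
        fi
    done
    return $status
}
```

`ConnectTimeout` only limits connection setup. A node that accepts the connection but then stalls (for example on an unresponsive filesystem mount) would still block the loop; wrapping the `ssh` call in coreutils `timeout` is one way to handle that.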
cat hostfile
Thanks & Regards
Shivani
Our cluster uses the PSM libfabric that we build ourselves. Invoking mpirun at the command line does not work, but we can run Slurm job scripts. All of our nodes work, and we can run jobs on all of them. Our only problem is that clck does not create pidfiles for 4 of the 36 nodes (the last 4). This seems to cause the hang, but I cannot figure out what is different about these 4 nodes.
I tried running the Cluster Checker with only two nodes. Both nodes produce a pidfile, so I don't think that is the problem. Here is the session:
[mcgratta@burn ~]$ clck -f nodefile2 -l debug
Intel(R) Cluster Checker 2021 Update 2 (build 20210301)
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/health_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/health_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/cpu_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/cpu_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/cpu_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/cpu_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): cpuid, cpuinfo, cpupower, hwloc_dump_hwdata, kernel_tools, lscpu,
numactl, uname
analyzer extension: cpu
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.2.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/environment_variables_uniformity.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/environment_variables_uniformity.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): printenv, uname
analyzer extension: environment
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.2.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/ethernet.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/ethernet.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): ethtool, ethtool_show_coalesce, ipaddr, uname
analyzer extension: ethernet
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.2.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/infiniband_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/infiniband_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/infiniband_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/infiniband_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): datconf, ibstat, ibv_devinfo, lspci, ofedinfo, ulimit, uname
analyzer extension: infiniband
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/dapl_fabric_providers_present.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/dapl_fabric_providers_present.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): datconf, ibstat, ipaddr, uname
analyzer extension: datconf
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.2.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/network_time_uniformity.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/network_time_uniformity.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): chronyc, ntpq, uname
analyzer extension: ntp
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.2.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/node_process_status.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/node_process_status.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): ps, uname
analyzer extension: process
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.2.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/opa_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/opa_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/opa_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.2.0/etc/fwd/opa_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.2.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.2.0/etc/providers
provider(s): fw_ver, lspci, opahfirev, opatools, ulimit, uname
analyzer extension: opa
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.2.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.2.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.2.0/kb/data/msg_schema.xml
Postprocessor config file: /opt/intel/oneapi/clck/2021.2.0/etc/postprocessor/table.xml
Database: clck_default, at: "$HOME/.clck/2021.2.1/clck.db".
Running Collect
provider(s): getent, ip, uname
about to copy to shared location
clck-collect temp-shared location created
about to copy env to shared location
clck-collect copied env variables to temp-shared location
accumulate endpoint = tcp://129.6.159.109:49153
Accumulate server started
Starting pre-check............
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.2.0/libexec/intel64/pdsh -b -K -w burn001,burn002 'if [[ ! -d /home4/mcgratta/.clck ]]; then echo CLCK_PRECHECK_ND;elif [[ ! -w /home4/mcgratta/.clck ]] || [[ ! -x /home4/mcgratta/.clck ]] || [[ ! -r /home4/mcgratta/.clck ]]; then echo CLCK_PRECHECK_NON_RW;else echo CLCK_PRECHECK_OK;fi;stat -c "#####SHAREDDIR_INODE %i#####" /home4/mcgratta/.clck;'
Pre-check completed successfully
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.2.0/libexec/intel64/pdsh -b -K -w burn001,burn002 ' . /home4/mcgratta/.clck/env_prop-tempG0UTv5/env_file1qwMgU; /opt/intel/oneapi/clck/2021.2.0/libexec/intel64/clck_run_provider -c /home4/mcgratta/.clck/clck-collect-tempnOKnfD/temp-configjtO1Kg -e tcp://129.6.159.109:49153 -l debug -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ipaddr.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ethtool.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ibstat.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ibv_devinfo.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ps.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ofedinfo.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ulimit.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/opatools.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/getent.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/cpuinfo.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/uname.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/cpuid.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/fw_ver.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/lspci.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/hwloc_dump_hwdata.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/cpupower.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/printenv.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/opahfirev.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/kernel_tools.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/datconf.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ntpq.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/chronyc.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/lscpu.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/numactl.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ethtool_show_coalesce.xml -f /opt/intel/oneapi/clck/2021.2.0/etc/providers/ip.xml'
^C
Caught Ctrl-C. Cleaning up.
DO NOT HIT CTRL-C AGAIN. Sending Ctrl-C multiple times halts the cleaning process which can leave processes running on the nodes.
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.2.0/libexec/intel64/pdsh -b -K -w burn001 'kill -SIGINT -- -$(head -n 1 /home4/mcgratta/.clck/clck-collect-tempnOKnfD/pidfile_burn001);echo $?'
Draining queue of accumulated data providers
Received SIGINT / SIGTERM. Cleaning up and stopping...
burn001: 0
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.2.0/libexec/intel64/pdsh -b -K -w burn002 'kill -SIGINT -- -$(head -n 1 /home4/mcgratta/.clck/clck-collect-tempnOKnfD/pidfile_burn002);echo $?'
burn002: 0
[mcgratta@burn ~]$
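Since the stall comes right after the provider launch and the pidfiles live in the shared `/home4/mcgratta/.clck` directory, it may be worth running clck's pre-check by hand on burn033 through burn036. The function below is a readable restatement of the pre-check command in the log above (`clck_precheck` is a name chosen here, not a clck command). The `SHAREDDIR_INODE` line presumably lets clck confirm that every node sees the same shared directory, so a different inode on one node would point at a mount problem; that is an inference, not documented behavior.

```bash
# Readable restatement of the pdsh pre-check seen in the debug log:
# verify that the shared .clck directory exists and is readable, writable
# and searchable, then print its inode number in clck's marker format.
clck_precheck() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo CLCK_PRECHECK_ND
    elif [ ! -w "$dir" ] || [ ! -x "$dir" ] || [ ! -r "$dir" ]; then
        echo CLCK_PRECHECK_NON_RW
    else
        echo CLCK_PRECHECK_OK
    fi
    # GNU stat; fails (and prints to stderr) if the directory is missing.
    stat -c "#####SHAREDDIR_INODE %i#####" "$dir"
}
```

A quick cross-node comparison of just the inode could be `for n in burn001 burn033 burn034 burn035 burn036; do ssh "$n" stat -c %i /home4/mcgratta/.clck; done`; every node should print the same number if they all mount the same filesystem.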
Hi,
Could you please set I_MPI_HYDRA_BOOTSTRAP=SLURM and then run the clck command?
Next, set I_MPI_HYDRA_BOOTSTRAP=SSH, run the Cluster Checker command again, and send us the complete logs from both runs.
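One detail when following this: if the login shell is bash (the `$` prompt in the transcripts suggests so, but that is an assumption), `set I_MPI_HYDRA_BOOTSTRAP=SLURM` does not set that variable at all; it replaces the positional parameters. `export` is what puts the variable into the environment of child processes such as clck (in csh/tcsh the equivalent is `setenv I_MPI_HYDRA_BOOTSTRAP SLURM`). The sketch below only demonstrates the shell behavior; whether clck itself consults `I_MPI_HYDRA_BOOTSTRAP` is not established here. `demo_set_vs_export` is a hypothetical name.

```bash
# Demonstrates why "set NAME=value" does not reach child processes in
# bash/sh, while "export NAME=value" does. Each case runs in a subshell so
# the caller's environment is left untouched.
demo_set_vs_export() {
    (
        unset I_MPI_HYDRA_BOOTSTRAP
        # "set word" assigns "word" to $1; no variable named
        # I_MPI_HYDRA_BOOTSTRAP is created.
        set I_MPI_HYDRA_BOOTSTRAP=SLURM
        printf 'after set:    [%s] $1=[%s]\n' "$(printenv I_MPI_HYDRA_BOOTSTRAP)" "$1"
    )
    (
        unset I_MPI_HYDRA_BOOTSTRAP
        # export sets the variable and passes it to child processes,
        # here printenv (and, in real use, clck).
        export I_MPI_HYDRA_BOOTSTRAP=SLURM
        printf 'after export: [%s]\n' "$(printenv I_MPI_HYDRA_BOOTSTRAP)"
    )
}
```

In practice that means `export I_MPI_HYDRA_BOOTSTRAP=SLURM` before `clck -f nodefile2 -l debug`, or the one-shot form `I_MPI_HYDRA_BOOTSTRAP=SSH clck -f nodefile2 -l debug`, which sets the variable for that single command only.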
Thanks & Regards
Shivani
[mcgratta@burn ~]$ set I_MPI_HYDRA_BOOTSTRAP=SLURM
[mcgratta@burn ~]$ clck -f nodefile2 -l debug
Intel(R) Cluster Checker 2021 Update 3 (build 20210615)
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/health_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/health_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): cpuid, cpuinfo, cpupower, hwloc_dump_hwdata, kernel_tools, lscpu,
numactl, uname
analyzer extension: cpu
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/environment_variables_uniformity.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/environment_variables_uniformity.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): printenv, uname
analyzer extension: environment
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/ethernet.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/ethernet.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): ethtool, ethtool_show_coalesce, ipaddr, uname
analyzer extension: ethernet
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): datconf, ibstat, ibv_devinfo, lspci, ofedinfo, ulimit, uname
analyzer extension: infiniband
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/dapl_fabric_providers_present.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/dapl_fabric_providers_present.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): datconf, ibstat, ipaddr, uname
analyzer extension: datconf
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/network_time_uniformity.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/network_time_uniformity.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): chronyc, ntpq, uname
analyzer extension: ntp
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/node_process_status.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/node_process_status.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): ps, uname
analyzer extension: process
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): fw_ver, lspci, opahfirev, opatools, ulimit, uname
analyzer extension: opa
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Postprocessor config file: /opt/intel/oneapi/clck/2021.3.0/etc/postprocessor/table.xml
Database: clck_default, at: "$HOME/.clck/2021.3.1/clck.db".
Running Collect
provider(s): getent, ip, uname
about to copy to shared location
clck-collect temp-shared location created
about to copy env to shared location
clck-collect copied env variables to temp-shared location
accumulate endpoint = tcp://129.6.159.109:49152
Accumulate server started
Starting pre-check............
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn001,burn002 'if [[ ! -d /home4/mcgratta/.clck ]]; then echo CLCK_PRECHECK_ND;elif [[ ! -w /home4/mcgratta/.clck ]] || [[ ! -x /home4/mcgratta/.clck ]] || [[ ! -r /home4/mcgratta/.clck ]]; then echo CLCK_PRECHECK_NON_RW;else echo CLCK_PRECHECK_OK;fi;stat -c "#####SHAREDDIR_INODE %i#####" /home4/mcgratta/.clck;'
Pre-check completed successfully
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn001,burn002 ' . /home4/mcgratta/.clck/env_prop-tempXTCxtx/env_fileYDuWK7; /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/clck_run_provider -c /home4/mcgratta/.clck/clck-collect-tempF9zzDM/temp-configgKcacX -E /home4/mcgratta/.clck/env_prop-tempXTCxtx/env_fileYDuWK7 -e tcp://129.6.159.109:49152 -l debug -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/numactl.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/uname.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ulimit.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/hwloc_dump_hwdata.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/cpuid.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/opatools.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/cpupower.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/datconf.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ethtool.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ethtool_show_coalesce.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ntpq.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ibv_devinfo.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/chronyc.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/kernel_tools.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/lscpu.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/printenv.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/cpuinfo.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ps.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ibstat.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ipaddr.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ofedinfo.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/fw_ver.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/lspci.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/opahfirev.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/getent.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ip.xml'
^C
Caught Ctrl-C. Cleaning up.
DO NOT HIT CTRL-C AGAIN. Sending Ctrl-C multiple times halts the cleaning process which can leave processes running on the nodes.
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn001 'kill -SIGINT -- -$(head -n 1 /home4/mcgratta/.clck/clck-collect-tempF9zzDM/pidfile_burn001);echo $?'
Draining queue of accumulated data providers
Received SIGINT / SIGTERM. Cleaning up and stopping...
burn001: 0
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn002 'kill -SIGINT -- -$(head -n 1 /home4/mcgratta/.clck/clck-collect-tempF9zzDM/pidfile_burn002);echo $?'
burn002: 0
[mcgratta@burn ~]$ set I_MPI_HYDRA_BOOTSTRAP=SSH
[mcgratta@burn ~]$ clck -f nodefile2 -l debug
Intel(R) Cluster Checker 2021 Update 3 (build 20210615)
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/health_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/health_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/cpu_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): cpuid, cpuinfo, cpupower, hwloc_dump_hwdata, kernel_tools, lscpu,
numactl, uname
analyzer extension: cpu
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/environment_variables_uniformity.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/environment_variables_uniformity.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): printenv, uname
analyzer extension: environment
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/ethernet.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/ethernet.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): ethtool, ethtool_show_coalesce, ipaddr, uname
analyzer extension: ethernet
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/infiniband_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): datconf, ibstat, ibv_devinfo, lspci, ofedinfo, ulimit, uname
analyzer extension: infiniband
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/dapl_fabric_providers_present.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/dapl_fabric_providers_present.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): datconf, ibstat, ipaddr, uname
analyzer extension: datconf
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/network_time_uniformity.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/network_time_uniformity.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): chronyc, ntpq, uname
analyzer extension: ntp
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/node_process_status.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/node_process_status.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): ps, uname
analyzer extension: process
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_user.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_user.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
Include: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_base.xml
Opening Fwd: /opt/intel/oneapi/clck/2021.3.0/etc/fwd/opa_base.xml
provider aux path: /opt/intel/oneapi/clck/2021.3.0/provider/share
provider path: /opt/intel/oneapi/clck/2021.3.0/etc/providers
provider(s): fw_ver, lspci, opahfirev, opatools, ulimit, uname
analyzer extension: opa
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
analyzer extension: ulimit
analyzer extension path: /opt/intel/oneapi/clck/2021.3.0/analyzer/intel64/cpp
message catalog path: /opt/intel/oneapi/clck/2021.3.0/kb/data/
message catalog: msg_en.xmc
message schema: /opt/intel/oneapi/clck/2021.3.0/kb/data/msg_schema.xml
Postprocessor config file: /opt/intel/oneapi/clck/2021.3.0/etc/postprocessor/table.xml
Database: clck_default, at: "$HOME/.clck/2021.3.1/clck.db".
Running Collect
provider(s): getent, ip, uname
about to copy to shared location
clck-collect temp-shared location created
about to copy env to shared location
clck-collect copied env variables to temp-shared location
accumulate endpoint = tcp://129.6.159.109:49152
Accumulate server started
Starting pre-check............
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn001,burn002 'if [[ ! -d /home4/mcgratta/.clck ]]; then echo CLCK_PRECHECK_ND;elif [[ ! -w /home4/mcgratta/.clck ]] || [[ ! -x /home4/mcgratta/.clck ]] || [[ ! -r /home4/mcgratta/.clck ]]; then echo CLCK_PRECHECK_NON_RW;else echo CLCK_PRECHECK_OK;fi;stat -c "#####SHAREDDIR_INODE %i#####" /home4/mcgratta/.clck;'
Pre-check completed successfully
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn001,burn002 ' . /home4/mcgratta/.clck/env_prop-temppRilnd/env_filemtwajy; /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/clck_run_provider -c /home4/mcgratta/.clck/clck-collect-tempet0Zzc/temp-configAxKwrS -E /home4/mcgratta/.clck/env_prop-temppRilnd/env_filemtwajy -e tcp://129.6.159.109:49152 -l debug -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/numactl.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/uname.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/chronyc.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ethtool_show_coalesce.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ulimit.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ofedinfo.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ibv_devinfo.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/datconf.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ibstat.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/cpupower.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/lscpu.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/kernel_tools.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/cpuid.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ntpq.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/printenv.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/cpuinfo.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ps.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ethtool.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/hwloc_dump_hwdata.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ipaddr.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/opahfirev.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/lspci.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/fw_ver.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/getent.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/opatools.xml -f /opt/intel/oneapi/clck/2021.3.0/etc/providers/ip.xml'
^C
Caught Ctrl-C. Cleaning up.
DO NOT HIT CTRL-C AGAIN. Sending Ctrl-C multiple times halts the cleaning process which can leave processes running on the nodes.
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn001 'kill -SIGINT -- -$(head -n 1 /home4/mcgratta/.clck/clck-collect-tempet0Zzc/pidfile_burn001);echo $?'
Draining queue of accumulated data providers
Received SIGINT / SIGTERM. Cleaning up and stopping...
burn001: 0
data collection command:
PDSH_SSH_ARGS_APPEND="$PDSH_SSH_ARGS_APPEND $PDSH_SSH_ARGS -oStrictHostKeyChecking=no -oLogLevel=FATAL" PDSH_SSH_ARGS="" /opt/intel/oneapi/clck/2021.3.0/libexec/intel64/pdsh -b -K -w burn002 'kill -SIGINT -- -$(head -n 1 /home4/mcgratta/.clck/clck-collect-tempet0Zzc/pidfile_burn002);echo $?'
burn002: 0
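For readability, here is the pre-check one-liner from the log above (the command pdsh ran on burn001,burn002 just before "Pre-check completed successfully") unpacked into a bash function. It is a sketch that mirrors the logic visible in the log, not Intel's own source: the function name `precheck` and its directory argument are hypothetical stand-ins for the hard-coded /home4/mcgratta/.clck path. It shows that the pre-check only verifies that the shared .clck directory exists and is readable, writable and searchable on each node, then prints its inode number, so a successful pre-check does not by itself show that the later data collection step can complete.

```shell
#!/usr/bin/env bash
# Hypothetical readable version of the pre-check command that clck sent to
# each node via pdsh. The argument replaces the hard-coded shared directory.
precheck() {
  local shared_dir="$1"
  if [[ ! -d "$shared_dir" ]]; then
    # Shared directory does not exist on this node.
    echo CLCK_PRECHECK_ND
  elif [[ ! -w "$shared_dir" ]] || [[ ! -x "$shared_dir" ]] || [[ ! -r "$shared_dir" ]]; then
    # Directory exists but is not writable, searchable, and readable.
    echo CLCK_PRECHECK_NON_RW
  else
    echo CLCK_PRECHECK_OK
  fi
  # As in the original, stat runs unconditionally (the commands are joined
  # with ';'). It prints the directory's inode number in a marker string;
  # presumably clck compares these across nodes to confirm that every node
  # sees the same shared directory.
  stat -c "#####SHAREDDIR_INODE %i#####" "$shared_dir"
}
```

On a node, `precheck /home4/mcgratta/.clck` should print `CLCK_PRECHECK_OK` followed by the inode line. Running it on every node and comparing the inode numbers is one way to confirm by hand that they all see the same shared directory.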
Hi,
We are working on it and will get back to you soon.
Thanks & Regards
Shivani
I am escalating this issue.
Regards
"Thanks for accepting our solution. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel."