VTune hangs without output (only on compute nodes)

parco
Beginner
3,376 Views

Hey,

vtune just hangs without any output. It does not matter what I want to test.

for example:

vtune -collect hotspots -- hostname

and even

vtune-self-checker.sh

does not produce any output whatsoever. We run a cluster with one login (head) node and 36 compute nodes. Everything works fine on the head node; nothing works on the compute nodes. The complete oneAPI HPC and Base toolkits are installed on the head node, on a filesystem that is shared with the compute nodes.

 

Any ideas what to do or how to get some output?

 

vtune Version: Intel(R) VTune(TM) Profiler 2023.0.0 (build 624757) Command Line Tool

14 Replies
parco
Beginner
3,375 Views

The only way to quit this vtune process is with Ctrl+C. But even then, some processes still seem to be running in the background:

 

$ ps aux | grep /opt/intel

xy    609973 99.2  0.4 677552 396120 pts/0   Sl   10:50   1:52 /opt/intel/oneapi/vtune/2023.0.0/bin64/amplxe-runss --context-value-list
xy    609986 99.9  0.3 612024 324664 pts/0   Sl   10:50   1:29 /opt/intel/oneapi/vtune/2023.0.0/bin64/amplxe-runss --ui-output-format xml --ui-output-fd 5 --context-value-list
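
The leftover collector processes shown above can be listed, and if necessary terminated, by matching on the binary name. This is a minimal sketch, assuming `pgrep` and `pkill` from procps are installed on the node; the function names are hypothetical helpers and not part of VTune.

```shell
#!/bin/sh
# List "PID command line" for processes of the current user whose command
# line matches the pattern (default: amplxe-runss, as seen in the ps output).
list_leftovers() {
    pgrep -u "$(id -u)" -a -f "${1:-amplxe-runss}"
}

# Send SIGTERM to the matches, wait a grace period (default 5 s),
# then send SIGKILL to anything that is still alive.
kill_leftovers() {
    pattern=${1:-amplxe-runss}
    pkill -TERM -u "$(id -u)" -f "$pattern"
    sleep "${2:-5}"
    pkill -KILL -u "$(id -u)" -f "$pattern"
    return 0
}
```

Calling `list_leftovers` first and checking the result before `kill_leftovers` avoids terminating an unrelated process that happens to match the pattern.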

 

 

AthiraM_Intel
Moderator
3,344 Views

Hi,


Thank you for posting in Intel Communities.

 

Could you please share the following details:

  1. OS and hardware details
  2. If your operating system is Linux, the kernel details, as reported by:

      uname -a

  3. The exact steps you followed and a sample reproducer, so that we can reproduce the issue on our end



Thanks




parco
Beginner
3,339 Views

Hello,

 

Our Head Node:

Dell PowerEdge R740
Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
96 GB DDR4
Linux hydra-head 5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux

 

And our compute nodes:

36x Dell PowerEdge C6420
2x Intel(R) Xeon(R) Gold 6130F CPU @ 2.10GHz (Hyperthreading disabled)
96 GB DDR4
Linux hydra01 5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux

Some more information:

We installed Intel oneAPI on our head node under /opt/intel/oneapi. This filesystem is shared with our compute nodes, which only have read-only access. I tried changing that to read-write, but it did not change anything. Do I perhaps have to run a local installation on the compute nodes?

 

There are not many steps to follow:

1. Install Intel VTune with oneAPI (we have everything available installed: hpc_toolkit and base_toolkit)
2. Execute any vtune test on the head node (including vtune-self-checker.sh): works (for example, vtune -collect hotspots -- hostname)
3. Execute any vtune test on a compute node: does not work

parco
Beginner
3,329 Views

I found some extra information in the kernel logs:

vtune[78877]: segfault at 3 ip 0000000000000003 sp 00007efdace3e1d8 error 14 in vtune[55d496e00000+50000]

Code: Unable to access opcode bytes at RIP 0xffffffffffffffd9.
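
The `error` field in such a kernel message is the x86 page-fault error code, printed in hexadecimal. As a rough aid to reading it, the sketch below decodes the commonly used bits; the function name is a hypothetical helper, and the decoding only describes the faulting access, not its cause.

```shell
#!/bin/sh
# Decode the "error" field of an x86 "segfault at ..." kernel log line.
# The kernel prints the value in hex, so "error 14" is 0x14.
decode_segv_error() {
    code=$((0x$1))
    # bit 0: 1 = protection violation, 0 = page not present
    if [ $((code & 1)) -ne 0 ]; then echo "protection violation"; else echo "page not present"; fi
    # bit 1: 1 = write access, 0 = read access
    if [ $((code & 2)) -ne 0 ]; then echo "write access"; else echo "read access"; fi
    # bit 2: 1 = fault raised in user mode, 0 = kernel mode
    if [ $((code & 4)) -ne 0 ]; then echo "user mode"; else echo "kernel mode"; fi
    # bit 4: 1 = fault was caused by an instruction fetch
    if [ $((code & 16)) -ne 0 ]; then echo "instruction fetch"; fi
}

decode_segv_error 14
```

For `error 14` this reports a user-mode instruction fetch from a page that is not present, which is consistent with the instruction pointer of 0x3 in the log line: the process tried to execute code at an unmapped, near-null address.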

AthiraM_Intel
Moderator
3,255 Views

Hi,

 

We are checking on this internally. We will get back to you with an update.

 

 

Thanks

 

AthiraM_Intel
Moderator
3,075 Views

Hi,


We are sorry for the delay. Could you please let us know whether the drivers are installed on all the compute nodes?


If not, please install and check whether the issue still persists.



Thanks




parco
Beginner
3,067 Views

The drivers seem to be loaded:

root@hydra01:/opt/intel/oneapi/vtune/latest/sepdk/src# ./insmod-sep -q
pax driver is loaded and owned by group "vtune" with file permissions "660".
socperf3 driver is loaded and owned by group "vtune" with file permissions "660".
sep5 driver is loaded and owned by group "vtune" with file permissions "660".
socwatch driver is loaded and owned by group "vtune" with file permissions "660".
vtsspp driver is loaded and owned by group "vtune" with file permissions "660".
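
Since the driver devices are owned by group "vtune" with permissions "660", a non-root account can only open them if it belongs to that group. The output above was produced as root and does not show this, so a quick membership check for the profiling user may be worthwhile. A minimal sketch (the function name is a hypothetical helper):

```shell
#!/bin/sh
# Succeed if the current user is a member of the named group.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group vtune; then
    echo "user is in group vtune"
else
    echo "user is NOT in group vtune"
fi
```

This only checks group membership of the account that runs it; it says nothing about whether the drivers themselves work.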

 

AthiraM_Intel
Moderator
3,027 Views

Hi,

 

Please do the following on the failing node:

 

rm -rf /tmp/amplxe*

 

<run the failing scenario>

 

tar zcvf logs.tgz /tmp/amplxe-log-*

 

Please share the logs.tgz with us.
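
Because the failing scenario hangs, the three steps above can be wrapped with a time limit so that the archive is still produced. This is a sketch under assumptions: it relies on `timeout` from GNU coreutils, the function name and the 10-second kill delay are arbitrary choices, and the paths are the ones given in the steps above.

```shell
#!/bin/sh
# Usage: collect_logs SECONDS ARCHIVE COMMAND [ARGS...]
collect_logs() {
    limit=$1
    archive=$2
    shift 2

    rm -rf /tmp/amplxe*                 # step 1: clear old logs

    # step 2: run the failing scenario; send SIGINT after $limit seconds
    # and SIGKILL 10 seconds later if the command is still running.
    timeout -s INT -k 10 "$limit" "$@"
    status=$?                           # 124 = time limit hit (137 if SIGKILL was needed)

    # step 3: archive the log directories, if any were written
    set -- /tmp/amplxe-log-*
    if [ -e "$1" ]; then
        tar zcf "$archive" "$@"
    else
        echo "no logs matching /tmp/amplxe-log-* were written" >&2
    fi
    return "$status"
}
```

For example, `collect_logs 120 logs.tgz vtune-self-checker.sh` would run the self-checker for at most two minutes and then pack whatever logs exist.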

 

If you face any issue, please let us know.

 

Thanks

 

parco
Beginner
2,949 Views

Hello,

I have attached the requested logs.

AthiraM_Intel
Moderator
2,970 Views

Hi,


We have not heard back from you. Could you please share the logs.tgz which we mentioned in the last response.



Thanks


AthiraM_Intel
Moderator
2,908 Views

Hi,

 

Thank you for sharing the log file. Unfortunately, there is not much in the log files. It looks like amplxe-runss is having serious trouble on this system, but we will need extended logging and/or a crash dump to figure out where it is stuck. We can start by setting AMPLXE_LOG_LEVEL=trace in the environment and launching

 

amplxe-runss -help

 

on the node as a very basic smoke test. If this does not hang, please proceed with

 

amplxe-runss -context-value-list

 

and share the logs again.
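
The environment variable can be set for a single command only, which keeps trace logging from leaking into other runs. A minimal sketch, assuming `timeout` from GNU coreutils is available; the function name and the time limits are arbitrary choices:

```shell
#!/bin/sh
# Usage: run_traced SECONDS COMMAND [ARGS...]
# Runs COMMAND with AMPLXE_LOG_LEVEL=trace set only for that command and
# stops it after the time limit (SIGINT first, SIGKILL 10 s later).
run_traced() {
    limit=$1
    shift
    AMPLXE_LOG_LEVEL=trace timeout -s INT -k 10 "$limit" "$@"
}
```

With this, the two smoke tests would be `run_traced 60 amplxe-runss -help` followed by `run_traced 60 amplxe-runss -context-value-list`; an exit status of 124 or 137 indicates that the command had to be stopped.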

 

 

Thanks.

 

AthiraM_Intel
Moderator
2,851 Views

Hi,


We have not heard back from you. Could you please share the log file which we mentioned in the last response.



Thanks


parco
Beginner
2,629 Views

Here are the new logfiles:

 

 

BTW: "amplxe-runss -context-value-list" hangs and cannot be cancelled with Ctrl+C
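
When a process ignores Ctrl+C, its scheduler state can hint at where it is stuck. The sketch below reads it from `/proc` on Linux; the function name is a hypothetical helper, and the wait channel may be empty or unreadable depending on kernel settings and permissions.

```shell
#!/bin/sh
# Usage: proc_state PID
# Prints the "State:" line of the process and, where readable, the kernel
# function it is currently waiting in (wait channel).
proc_state() {
    pid=$1
    grep '^State:' "/proc/$pid/status" || return 1
    if [ -r "/proc/$pid/wchan" ]; then
        printf 'wchan:\t%s\n' "$(cat "/proc/$pid/wchan")"
    fi
}
```

A state of `R` together with high CPU usage points to a busy loop, while `D` (uninterruptible sleep) means the process is blocked inside the kernel, where signals such as SIGINT are not delivered until the wait ends.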

AthiraM_Intel
Moderator
2,733 Views

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.



Thanks


