
Fail to run Intel Advisor Roofline Analysis on DevCloud

jasonlin316
Beginner
2,757 Views

I am trying to run roofline analysis using Intel Advisor on DevCloud.

 

Background:

I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.

I sourced the setup script /opt/intel/inteloneapi/setvars.sh.

Commands I used:

advixe-cl -collect survey -project-dir MyResults -- python sage.py (this seems fine)

advixe-cl -collect tripcounts -flop -project-dir MyResults -- python sage.py

advixe-cl -collect roofline -project-dir MyResults -- python sage.py
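
For reference, the first two collections can be scripted so that both use the same project directory and target. This is only a sketch: `advisor_commands` is a hypothetical helper name, and it uses nothing beyond the `advixe-cl` options quoted above. It builds the command lines and prints them without running anything.

```python
import shlex


def advisor_commands(project_dir, target):
    """Return the survey and tripcounts/FLOP command lines as argv lists.

    Hypothetical helper: only the options quoted in this post are used.
    """
    common = ["-project-dir", project_dir, "--"] + list(target)
    return [
        ["advixe-cl", "-collect", "survey"] + common,
        ["advixe-cl", "-collect", "tripcounts", "-flop"] + common,
    ]


# Print the commands in the order they should be run.
for argv in advisor_commands("MyResults", ["python", "sage.py"]):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` so that the second collection only starts if the first one exits cleanly.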

 

The last two commands report the following errors:

(tripcount collect error)

advisor: Opening result 75 % Loading 'tripcounts_2265789_0.tcs' file
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs001/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).

(roofline collect error)

advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/advixe-runtrc.txt' ().
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).
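
A quick sanity check on the files named in these errors is to list every `.tcs` file under the project directory together with its size. This is a sketch, not an Advisor feature: the helper names are made up for illustration, the directory layout is taken only from the error messages above, and a zero-byte file is just one way a data file can be unusable, so a non-empty file is not proof that it is valid.

```python
from pathlib import Path


def list_tripcount_files(project_dir):
    """Return sorted (path, size_in_bytes) pairs for every .tcs file found."""
    root = Path(project_dir)
    return sorted((p, p.stat().st_size) for p in root.rglob("*.tcs") if p.is_file())


def empty_tripcount_files(project_dir):
    """Return the .tcs files that are zero bytes long."""
    return [p for p, size in list_tripcount_files(project_dir) if size == 0]


if __name__ == "__main__":
    # "MyResults" is the project directory used in the commands above.
    for path, size in list_tripcount_files("MyResults"):
        print(f"{size:>12} bytes  {path}")
```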

 

I tried to generate the report anyway with the command:

advixe-cl --report=roofline --project-dir=./MyResults --report-output=./roofline.html

but it reports that it cannot find the tripcounts -flop data.

 

I have attached the tripcounts folder with some log and trc files. I hope these help. Thanks.

 

 

 

 

DiyaN_Intel
Moderator
2,731 Views

Hi,

 

Thank you for posting in Intel Communities.

 

Please run the Python code you shared without Intel Advisor and check whether it works, since the error reports that the data file is corrupted.

 

Please share the sample reproducer code so that we can further debug your issue.

 

We ran a sample Python program on nodes s001-n007 and s001-n088 (Intel(R) Xeon(R) Gold 6128 CPU) in DevCloud with the same command you used, and it ran successfully.

Screenshots are attached:

advi.png

advi1.png

Regards,

Diya 

 

jasonlin316
Beginner
2,608 Views

Thanks! I have run the program without Advisor and it was completed successfully.

I have attached a zip archive containing the following:

  1. reproduce.txt: how I set up my conda environment
  2. sage.py: the program I ran
  3. output.txt: what shows on my terminal after I ran the program

Hope these help, thank you.

 

Also, is there a page showing which resources (CPU, GPU, etc.) each node (e.g., s001-n007) has? The DevCloud documentation tells us to run a Linux command to list the available resources (properties), but I do not know which node numbers they correspond to. Thanks.

DiyaN_Intel
Moderator
2,569 Views

Hi,


We tried to reproduce the issue, and it looks like the problem is in your code.

To verify this, could you run it on a machine other than DevCloud, check whether you can generate the results, and get back to us?


Regards,

Diya



jasonlin316
Beginner
2,412 Views

Hi,

 

I tried running Advisor on my own machine, which has an Intel(R) Xeon(R) Platinum 8368 CPU, and I still ran into the same problem.

I also tried running another Python program:

import torch

m = 2048
k = 2048
j = 2048

a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)

but I still get the same errors. I have attached the complete terminal output for this simple Python program; hopefully that helps.

 

 

DiyaN_Intel
Moderator
2,403 Views

Hi, 

 

We tried to reproduce the issue on our end, and the same code ran properly for us.

DiyaN_Intel_0-1675410279081.png

DiyaN_Intel_3-1675410386951.png

One possible reason it is not working for you is that the environment variables are not set.

Please run the command below to set the environment variables (run as root/superuser):

 

source /opt/intel/oneapi/setvars.sh

 

Please run the command below and attach the logs (run as root/superuser):

 

python3 /opt/intel/oneapi/advisor/latest/bin64/adv_self_check.py

 

Please provide logs or screenshots of the exact error so that we can reproduce your issue from our end.
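
One way to collect the requested logs is to capture the output of the self-check in a file. This is a minimal sketch, assuming Python 3.7 or later; `run_and_log` is a hypothetical helper, not part of Advisor, and the script path in the comment is the one quoted above.

```python
import subprocess
import sys


def run_and_log(argv, log_path):
    """Run argv, write its combined stdout and stderr to log_path, return the exit code."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge errors into the same stream
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
    return result.returncode


# Example call (not executed here), using the path from this post:
# run_and_log(
#     [sys.executable, "/opt/intel/oneapi/advisor/latest/bin64/adv_self_check.py"],
#     "adv_self_check.log",
# )
```

The resulting log file can then be attached to a reply as is.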

 

Regards, 

Diya

 

jasonlin316
Beginner
2,387 Views

Hi,

 

I ran the self-check and it reports an error (part 1 of the figure). Note: I also ran it with "python3" and got the same result.

error.png

 

I also tried sourcing the setup script and running the self-check with sudo, but it reports an error as well (part 2 of the figure).

DiyaN_Intel
Moderator
2,327 Views

Hi,

 

As the screenshot shows, the self-checker is failing. Could you please try reinstalling the latest Intel Advisor or the Intel oneAPI Base Toolkit on your system?

Also, to help us understand the issue, please let us know which node you were using on Intel Developer Cloud.

 

Please share the operating system details as well.

 

Regards,

Diya

 

 

jasonlin316
Beginner
2,307 Views

Hi,

 

I believe I am already using the latest Advisor, since I installed it within the last two weeks.

I attached a screenshot including my Advisor version and OS details.

 

For Intel DevCloud, I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.

 

 

DiyaN_Intel
Moderator
2,286 Views

Hi, 

 

Please follow these steps and check whether they work for you. I have attached the program I am testing with:

1. Run this command to list the DevCloud nodes that have an Intel(R) Xeon(R) Gold 6128 CPU:

 

pbsnodes | grep -i "skl\|gold6128" -B4

 

DiyaN_Intel_1-1675926825749.png

2. Then request an interactive session on a node with "state = free" and "power_state = Running" using this command:

 

qsub -I -l nodes=<free_node>:ppn=2

 

DiyaN_Intel_2-1675927123155.png

3. Run the roofline analysis on the program I tested:

 

advixe-cl -collect roofline -project-dir MyResults -- python hello.py

 

DiyaN_Intel_3-1675927285398.png
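
The node selection in steps 1 and 2 can also be sketched in Python by parsing the text that `pbsnodes` prints. The sample text below is invented for illustration (hypothetical node names and property lists), and the parser assumes the usual layout of an unindented node name followed by indented `key = value` lines; real output has many more fields.

```python
def parse_pbsnodes(text):
    """Parse pbsnodes-style text into {node_name: {key: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node record
            continue
        if not line[0].isspace():
            current = nodes.setdefault(line.strip(), {})  # unindented: node name
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return nodes


def free_nodes_with(nodes, prop):
    """Return names of nodes that are free, running, and list the given property."""
    matches = []
    for name, attrs in nodes.items():
        props = attrs.get("properties", "").split(",")
        if (attrs.get("state") == "free"
                and attrs.get("power_state") == "Running"
                and prop in props):
            matches.append(name)
    return sorted(matches)


# Hypothetical sample, not real DevCloud output.
SAMPLE = """\
node-a
     state = free
     power_state = Running
     properties = xeon,skl,gold6128

node-b
     state = job-exclusive
     power_state = Running
     properties = xeon,skl,gold6128

node-c
     state = free
     power_state = Running
     properties = xeon,icx
"""

print(free_nodes_with(parse_pbsnodes(SAMPLE), "gold6128"))  # prints ['node-a']
```

On a real login node, the text would come from running `pbsnodes` and capturing its standard output instead of using `SAMPLE`.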

 

Regards,

Diya

 

 

DiyaN_Intel
Moderator
2,221 Views

Hi,


Good day to you.


We have not heard back from you. Could you let us know if your issue has been resolved with the above solution?


If this resolves your issue, please accept it as a solution. This would help others with a similar issue. Thank you!


Regards,

Diya



jasonlin316
Beginner
2,203 Views

 

 

I connected to node s001-n128.

Your code works with the analyzer. However, mine still doesn't. I guess the problem is program-dependent, but I would still like to know possible ways to fix it. I still cannot profile even this simple program:

 

import torch

m = 2048
k = 2048
j = 2048

a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)

 

 

I also ran the self-check.py you previously suggested, but it also fails on the DevCloud s001-n128 node (snapshot attached).

Thanks.

 

DiyaN_Intel
Moderator
2,158 Views

Hi,

We were able to reproduce the issue on our end. Running your code with the latest version, Intel Advisor 2023.0, we got a crash report from the roofline analysis on Windows 11 Enterprise, and the analysis hangs partway through on Linux on Intel Developer Cloud.

We are investigating this issue further and will get back to you with an update soon.

Screenshots are below:

In Windows environment:

DiyaN_Intel_0-1676894641387.png

 

DiyaN_Intel_2-1676894655128.png

 

In Linux environment:

DiyaN_Intel_3-1676894675145.png

 

Regards,

Diya 

 

DiyaN_Intel
Moderator
1,798 Views

Hi, 


Good day to you.

The issue we observed with the latest version, Intel Advisor 2023.0, will be fixed in a future release, Intel Advisor 2023.2.

Sorry for the inconvenience caused.

Can you please confirm whether we can go forward and close this case?

If this resolves your issue, make sure to accept this as a solution. 

This would help others with similar issues. Thank you!


Thanks and Regards,

Diya


DiyaN_Intel
Moderator
1,700 Views

Hi, 


We have not heard back from you.

Is your issue resolved with the above solution?

If this resolves your issue, make sure to accept this as a solution. 

This would help others with similar issues. Thank you!


Thanks and Regards, 

Diya


DiyaN_Intel
Moderator
1,617 Views

Hi, 


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks and Regards, 

Diya


