Analyzers
Support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

Fail to run Intel Advisor Roofline Analysis on DevCloud

jasonlin316
Beginner
953 Views

I am trying to run roofline analysis using Intel Advisor on DevCloud.

 

Background:

I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.

I sourced the environment setup script /opt/intel/inteloneapi/setvars.sh.

Commands I used:

advixe-cl -collect survey -project-dir MyResults -- python sage.py (this seems fine)

advixe-cl -collect tripcounts -flop -project-dir MyResults -- python sage.py

advixe-cl -collect roofline -project-dir MyResults -- python sage.py

 

The last two commands report the following errors:

(tripcount collect error)

advisor: Opening result 75 % Loading 'tripcounts_2265789_0.tcs' file
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs001/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).

(roofline collect error)

advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/advixe-runtrc.txt' ().
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).

 

I tried to generate the report anyway with the command:

advixe-cl --report=roofline --project-dir=./MyResults --report-output=./roofline.html

but it reports that the analyzer cannot find the tripcounts -flop data.

 

I have attached the tripcounts folder with some log files and trc files. I hope these help. Thanks.
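
For anyone reproducing this, here is a minimal sketch of running the three collections from the post in order and stopping at the first step that fails. The helper names (`advisor_commands`, `run_in_order`) are hypothetical, and it assumes that `advixe-cl` signals a failed collection through a non-zero exit code, which this thread does not confirm. If it does not, the Advisor output still has to be checked by eye.

```python
import shlex
import subprocess


def advisor_commands(project_dir, target):
    """Build the three collection commands from the post, in the order they were run."""
    def collect(*analysis):
        return ["advixe-cl", "-collect", *analysis, "-project-dir", project_dir, "--", *target]

    return [
        collect("survey"),
        collect("tripcounts", "-flop"),
        collect("roofline"),
    ]


def run_in_order(commands, runner=subprocess.run):
    """Run each command; stop after the first one that exits non-zero.

    Returns a list of (command line, exit code) pairs for the steps that ran.
    Assumption: a non-zero exit code means the collection failed.
    """
    results = []
    for cmd in commands:
        completed = runner(cmd)
        results.append((shlex.join(cmd), completed.returncode))
        if completed.returncode != 0:
            break
    return results


# Dry run: only print the command lines, nothing is executed here.
for cmd in advisor_commands("MyResults", ["python", "sage.py"]):
    print(shlex.join(cmd))
```

To actually run the steps, call `run_in_order(advisor_commands("MyResults", ["python", "sage.py"]))` from a shell where setvars.sh has already been sourced, since the child processes inherit that environment.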

 

 

 

 

12 Replies
DiyaN_Intel
Moderator
928 Views

Hi,

 

Thank you for posting in Intel Communities.

 

Please try running your Python code without Intel Advisor and check whether it works, since the error says that the data file is corrupted.

 

Please share the sample reproducer code so that we can further debug your issue.

 

We ran a sample Python program on DevCloud nodes s001-n007 and s001-n088, which have an Intel(R) Xeon(R) Gold 6128 CPU, using the same commands you tried, and it ran successfully.

 

Screenshots of these runs are attached:

advi.png

advi1.png

Regards,

Diya 

 

jasonlin316
Beginner
805 Views

Thanks! I ran the program without Advisor and it completed successfully.

I have attached a zip folder with the following:

  1. reproduce.txt: how I set up my conda environment
  2. sage.py: the program I ran
  3. output.txt: what shows on my terminal after I ran the program

Hope these help, thank you.

 

Also, is there a command or a website that shows which resources (CPU, GPU, etc.) each node (e.g., s001-n007) has? Currently, the DevCloud documentation tells us to enter a Linux command to list the available resources (properties), but that does not tell me which node they belong to. Thanks.

DiyaN_Intel
Moderator
766 Views

Hi,


We have tried to reproduce the issue and it looks like the issue is with your code. 

To verify this, could you try running it on a machine other than DevCloud, check whether you are able to generate the results, and get back to us?


Regards,

Diya



jasonlin316
Beginner
609 Views

Hi,

 

I tried running Advisor on my own machine, which has an Intel(R) Xeon(R) Platinum 8368 CPU, and I still run into the same problem.

 

I also tried running another Python program:

import torch

m = 2048
k = 2048
j = 2048

a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)

But I still get the same errors. I have attached the complete terminal output for this simple Python program; hopefully that helps.

 

 

DiyaN_Intel
Moderator
600 Views

Hi, 

 

We tried to reproduce the issue on our end, and the same code ran properly for us.

DiyaN_Intel_0-1675410279081.png

DiyaN_Intel_3-1675410386951.png

One possible reason it is not working for you is that the environment variables are not set.

Please run the command below to set the environment variables (run as root/superuser):

 

source /opt/intel/oneapi/setvars.sh

 

Please run the command below (as root/superuser) and attach the logs:

 

python3 /opt/intel/oneapi/advisor/latest/bin64/adv_self_check.py

 

Please provide logs or screenshots of the exact error so that we can reproduce your issue from our end.
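
One way to capture the self-check output as a log file that can be attached to a reply is sketched below. The helper names `run_and_log` and `self_check_command` are hypothetical and not part of Advisor. The script path is the one quoted above and is assumed to match your installation. The child process inherits the environment of the shell that launches Python, so setvars.sh has to be sourced in that same shell first.

```python
import subprocess
import sys
from pathlib import Path

# Path quoted in this reply; adjust it if oneAPI is installed elsewhere.
SELF_CHECK = "/opt/intel/oneapi/advisor/latest/bin64/adv_self_check.py"


def self_check_command(script=SELF_CHECK):
    """Command line that runs the self-check with the current Python interpreter."""
    return [sys.executable, script]


def run_and_log(cmd, log_path):
    """Run cmd, write its combined stdout and stderr to log_path, return the exit code."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave error messages with normal output
        text=True,
    )
    Path(log_path).write_text(completed.stdout)
    return completed.returncode


# Example call (not executed here):
#   exit_code = run_and_log(self_check_command(), "adv_self_check.log")
```

The resulting `adv_self_check.log` contains everything the script printed, including error messages, so it can be attached as-is.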

 

Regards, 

Diya

 

jasonlin316
Beginner
584 Views

Hi,

 

I ran the self-check and it reports some errors (part 1 of the figure). Note: I also ran it with "python3" and got the same result.

error.png

 

I also tried sourcing the setup script and running the self-check with sudo, but it reports errors as well (part 2 of the figure).

DiyaN_Intel
Moderator
524 Views

Hi,

 

As the screenshot shows, the self-check is failing. Could you please try reinstalling the latest Intel Advisor or the Intel oneAPI Base Toolkit on your system?

 

For Intel Developer Cloud, could you please also let us know which node you were using, so that we can understand the issue better?

 

Please share the operating system details as well.

 

Regards,

Diya

 

 

jasonlin316
Beginner
504 Views

Hi,

 

I believe I am already using the latest Advisor, as I installed it within the last two weeks.

I attached a screenshot including my Advisor version and OS details.

 

For Intel DevCloud, I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.

 

 

DiyaN_Intel
Moderator
483 Views

Hi, 

 

Please follow these steps and check whether it works for you. I have attached the program I am testing with:

 

1. Run this command to list the DevCloud nodes that have an Intel(R) Xeon(R) Gold 6128 CPU:

 

pbsnodes | grep -i "skl\|gold6128" -B4

 

DiyaN_Intel_1-1675926825749.png

2. Then request an interactive session on a node that shows "state = free" and "power_state = Running" with this command:

 

qsub -I -l nodes=<free_node>:ppn=2

 

DiyaN_Intel_2-1675927123155.png

3. Run the roofline analysis with the program I used:

 

advixe-cl -collect roofline -project-dir MyResults -- python hello.py

 

DiyaN_Intel_3-1675927285398.png
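
Steps 1 and 2 can also be scripted. The sketch below parses node listings in the usual `pbsnodes` layout (a node name at the start of a line, followed by indented `key = value` lines) and picks the nodes that are free, running, and carry a given property. The `SAMPLE` text is hypothetical and only illustrates the assumed layout; the property names mirror the grep pattern above, and the function names are made up for this example.

```python
# Hypothetical sample in the assumed pbsnodes layout; not real DevCloud output.
SAMPLE = """\
s001-n007
     state = free
     power_state = Running
     np = 2
     properties = xeon,skl,gold6128

s001-n088
     state = job-exclusive
     power_state = Running
     np = 2
     properties = xeon,skl,gold6128
"""


def parse_pbsnodes(text):
    """Return {node_name: {attribute: value}} from pbsnodes-style text."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line between node records
        if not line[0].isspace():
            current = line.strip()  # unindented line starts a new node record
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def free_nodes_with(nodes, prop):
    """Names of nodes that are free, powered on, and list prop in their properties."""
    return [
        name
        for name, attrs in nodes.items()
        if attrs.get("state") == "free"
        and attrs.get("power_state") == "Running"
        and prop in attrs.get("properties", "").split(",")
    ]


print(free_nodes_with(parse_pbsnodes(SAMPLE), "gold6128"))
```

On a login node, the real listing could be fed in with `subprocess.run(["pbsnodes"], capture_output=True, text=True).stdout` in place of `SAMPLE`. Because the parsed dictionary keeps the full properties string for each node, it also shows which resources belong to which node name.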

 

Regards,

Diya

 

 

DiyaN_Intel
Moderator
418 Views

Hi,


Good day to you.


We have not heard back from you. Could you let us know if your issue has been resolved with the above solution?


If this resolves your issue, please make sure to accept this as a solution. This will help others with a similar issue. Thank you!


Regards,

Diya



jasonlin316
Beginner
400 Views

 

 

I connected to node s001-n128.

 

Your code works with the analyzer. However, mine still doesn't, so I guess the problem depends on the program, but I would still like to know possible ways to fix it. Currently, I cannot profile even a program as simple as this:

 

import torch

m = 2048
k = 2048
j = 2048

a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)

 

 

I also ran the self-check script you suggested earlier, but it fails on the DevCloud s001-n128 node as well (screenshot attached).

Thanks.

 

DiyaN_Intel
Moderator
354 Views

Hi,

We were able to reproduce the issue on our end. When we ran your code with the latest version, Intel Advisor 2023.0, the roofline analysis produced a crash report on Windows 11 Enterprise and hung partway through in the Linux environment on Intel Developer Cloud.

We are investigating this issue further and will get back to you with an update soon.

Here are the screenshots:

In the Windows environment:

DiyaN_Intel_0-1676894641387.png

 

DiyaN_Intel_2-1676894655128.png

 

In the Linux environment:

DiyaN_Intel_3-1676894675145.png

 

Regards,

Diya 

 
