
Fail to run Intel Advisor Roofline Analysis on DevCloud

jasonlin316
Beginner

I am trying to run roofline analysis using Intel Advisor on DevCloud.

 

Background:

I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.

I source the environment setup script /opt/intel/inteloneapi/setvars.sh.

Commands I used:

advixe-cl -collect survey -project-dir MyResults -- python sage.py (this seems fine)

advixe-cl -collect tripcounts -flop -project-dir MyResults -- python sage.py

advixe-cl -collect roofline -project-dir MyResults -- python sage.py

 

The last two commands report the following errors:

(tripcount collect error)

advisor: Opening result 75 % Loading 'tripcounts_2265789_0.tcs' file
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs001/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).

(roofline collect error)

advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/advixe-runtrc.txt' ().
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).

 

I tried to generate the report anyway with the command:

advixe-cl --report=roofline --project-dir=./MyResults --report-output=./roofline.html

but it reports that the analyzer cannot find the tripcounts -flop data.
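
For anyone trying to reproduce this, the sequence above can be scripted so that it stops at the first failing step. This is only a sketch: the command lines are copied from this post, while the helper names `advisor_commands` and `run_all` are made up for illustration, and it assumes that `advixe-cl` signals failure with a non-zero exit status, which this thread does not confirm.

```python
import shlex
import subprocess


def advisor_commands(project_dir="MyResults", target=("python", "sage.py")):
    """Return the four command lines quoted in the post as argv lists."""
    target = list(target)
    return [
        ["advixe-cl", "-collect", "survey",
         "-project-dir", project_dir, "--", *target],
        ["advixe-cl", "-collect", "tripcounts", "-flop",
         "-project-dir", project_dir, "--", *target],
        ["advixe-cl", "-collect", "roofline",
         "-project-dir", project_dir, "--", *target],
        ["advixe-cl", "--report=roofline",
         f"--project-dir=./{project_dir}",
         "--report-output=./roofline.html"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order and return the first one that fails.

    Assumption: a non-zero return code means the step failed.
    Returns None if every command exits with status 0.
    """
    for argv in commands:
        print("running:", shlex.join(argv))
        result = runner(argv)
        if result.returncode != 0:
            return argv
    return None
```

Calling `run_all(advisor_commands())` in a shell where the setup script has already been sourced returns the first command that failed, or `None` if all four completed.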

 

I have attached the tripcounts folder, which contains some log files and trc files. I hope these help. Thanks.

 

 

 

 

DiyaN_Intel
Moderator

Hi,

 

Thank you for posting in Intel Communities.

 

Please try running the Python code you shared without Intel Advisor and check whether it works, since the error says that the data file is corrupted.

 

Please share the sample reproducer code so that we can further debug your issue.

 

We ran a sample Python program on DevCloud nodes s001-n007 and s001-n088, which have the Intel(R) Xeon(R) Gold 6128 CPU, using the same command you used, and it ran successfully.

 

Screenshots of the run are attached:

advi.png

advi1.png

Regards,

Diya 

 

jasonlin316
Beginner

Thanks! I have run the program without Advisor, and it completed successfully.

I have attached a zip folder with the following:

  1. reproduce.txt: how I set up my conda environment
  2. sage.py: the program I ran
  3. output.txt: what shows on my terminal after I ran the program

Hope these help, thank you.

 

Also, is there a command or a web page that shows which resources (CPU, GPU, etc.) each node (e.g., s001-n007) has? Currently, the DevCloud documentation tells us to run a Linux command that lists the available resources (properties), but I do not know which node numbers they correspond to. Thanks.
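
One way to get that mapping is to parse the scheduler's node listing. The sketch below assumes the command in question is the PBS `pbsnodes` utility and that its output puts each node name on an unindented line, followed by indented `key = value` lines that include a `properties` entry. The function name and the sample text (node names and properties) are invented for illustration and are not real DevCloud output.

```python
def parse_pbsnodes(text):
    """Map each node name to its list of properties.

    Assumed input format: an unindented node name, then indented
    'key = value' lines belonging to that node.
    """
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = []
        elif current is not None:
            key, sep, value = line.strip().partition(" = ")
            if sep and key == "properties":
                nodes[current] = value.split(",")
    return nodes


# Hypothetical sample, NOT real DevCloud output.
SAMPLE = """\
node-a
     state = free
     properties = xeon,gold6128

node-b
     state = job-exclusive
     properties = xeon,gpu
"""

for name, props in parse_pbsnodes(SAMPLE).items():
    print(name, props)
```

On a login node, the same function could be fed the real listing, for example the captured standard output of `pbsnodes`, if that command is available there.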

DiyaN_Intel
Moderator

Hi,


We have tried to reproduce the issue, and it looks like the problem is in your code.

To verify this, could you try running it on a machine other than DevCloud, check whether you are able to generate the results, and get back to us?


Regards,

Diya



jasonlin316
Beginner

Hi,

 

I tried running Advisor on my own machine, which has an Intel(R) Xeon(R) Platinum 8368 CPU, and I still run into the same problem.

I also tried running another Python program:

import torch

m = 2048
k = 2048
j = 2048

a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)

But I still get the same errors. I have attached the complete terminal output for this simple Python program; hopefully that helps.
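
As a sanity check for what a working roofline report should show for this test, here is a back-of-the-envelope estimate. It assumes the usual 2·m·k·j operation count for a dense matrix multiply, float32 elements (the default dtype of `torch.rand`), and that each matrix crosses the memory bus exactly once, which is an idealization. The function name is hypothetical.

```python
def matmul_estimate(m, k, j, bytes_per_element=4):
    """Return (flops, bytes_moved, arithmetic_intensity) for (m x k) @ (k x j).

    Assumptions: one multiply and one add per inner-product term,
    and each of the three matrices is read or written exactly once.
    """
    flops = 2 * m * k * j
    bytes_moved = bytes_per_element * (m * k + k * j + m * j)
    return flops, bytes_moved, flops / bytes_moved


flops, bytes_moved, intensity = matmul_estimate(2048, 2048, 2048)
print(f"FLOPs: {flops}")                                   # 17179869184
print(f"Bytes moved (ideal): {bytes_moved}")               # 50331648
print(f"Arithmetic intensity: {intensity:.1f} FLOP/byte")  # 341.3
```

A real run would be expected to show a lower intensity than this ideal, since caches are finite and data is typically moved more than once.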

 

 

DiyaN_Intel
Moderator

Hi, 

 

We tried to reproduce the issue on our end, and the same code ran properly for us.

DiyaN_Intel_0-1675410279081.png

DiyaN_Intel_3-1675410386951.png

One possible reason it is not working is that the environment variables are not set.

Please run the command below to set the environment variables (run as root/superuser):

 

source /opt/intel/oneapi/setvars.sh

 

Please also run the command below (as root/superuser) and attach the logs:

 

python3 /opt/intel/oneapi/advisor/latest/bin64/adv_self_check.py

 

Please provide logs or screenshots of the exact error so that we can reproduce your issue from our end.

 

Regards, 

Diya

 

jasonlin316
Beginner

Hi,

 

I have run the self-check, and it reports some errors (part 1 of the figure). Note: I also ran it with "python3" and got the same result.

error.png

 

I also tried sourcing the setup script and running the self-check with sudo, but it reports some errors as well (part 2 of the figure).

DiyaN_Intel
Moderator

Hi,

 

As the screenshot shows, the self-checker is failing. Could you please try reinstalling the latest Intel Advisor or the Intel oneAPI Base Toolkit on your system?

For Intel Developer Cloud, could you also let us know which node you were trying to access, so that we can better understand the issue?

 

Please share the operating system details as well.

 

Regards,

Diya

 

 

jasonlin316
Beginner

Hi,

 

I believe I am already using the latest Advisor, as I installed it within the last two weeks.

I attached a screenshot including my Advisor version and OS details.

 

For Intel DevCloud, I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.

 

 
