I am trying to run roofline analysis using Intel Advisor on DevCloud.
I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.
I sourced the setup script /opt/intel/inteloneapi/setvars.sh.
Commands I used:
advixe-cl -collect survey -project-dir MyResults -- python sage.py (this seems fine)
advixe-cl -collect tripcounts -flop -project-dir MyResults -- python sage.py
advixe-cl -collect roofline -project-dir MyResults -- python sage.py
The last two commands report the following errors:
(tripcount collect error)
advisor: Opening result 75 % Loading 'tripcounts_2265789_0.tcs' file
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs001/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).
(roofline collect error)
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/advixe-runtrc.txt' ().
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).
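When the terminal output is long, it can help to pull out just the data files that failed to load and the reason given for each. Below is a minimal sketch that does this with a regular expression. The pattern is inferred only from the error lines quoted above and is not an official Advisor log format; `find_load_errors` is a hypothetical helper name.

```python
import re

# Pattern inferred from the error lines quoted in this post (an assumption,
# not a documented format): advisor: Error: Cannot load data file `PATH' (REASON).
ERROR_RE = re.compile(
    r"advisor: Error: Cannot load data file `(?P<path>[^']+)' \((?P<reason>.*)\)\."
)

def find_load_errors(log_text):
    """Return a list of (path, reason) pairs, one per 'Cannot load data file' line."""
    return [(m.group("path"), m.group("reason")) for m in ERROR_RE.finditer(log_text)]

# The two roofline errors from the post, copied verbatim.
sample = (
    "advisor: Error: Cannot load data file "
    "`/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/advixe-runtrc.txt' ().\n"
    "advisor: Error: Cannot load data file "
    "`/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/tripcounts_2264713_0.tcs' "
    "(Data file is corrupted).\n"
)

errors = find_load_errors(sample)
for path, reason in errors:
    # Show only the file name; an empty reason is reported explicitly.
    print(path.rsplit("/", 1)[-1], "->", reason or "(no reason given)")
```

Note that the first roofline error carries an empty reason, which the sketch reports separately from the "Data file is corrupted" case.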
I tried to generate the report anyway with the command:
advixe-cl --report=roofline --project-dir=./MyResults --report-output=./roofline.html
but it reports that the analyzer cannot find the tripcount -flop data.
I have attached the tripcount folder with some log files and trc files. I hope these help. Thanks.
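For anyone trying to reproduce this, the three collection commands in the post can be scripted so that the first failing step is easy to identify. The sketch below only rebuilds the exact command lines quoted above; `build_commands` and `run_in_order` are hypothetical helper names, and the sketch assumes that advixe-cl signals a failed collection with a non-zero exit code, which this thread does not confirm.

```python
import subprocess

def build_commands(project_dir="MyResults", target=("python", "sage.py")):
    """Return the three advixe-cl invocations from the post as argument lists."""
    common = ["-project-dir", project_dir, "--", *target]
    return [
        ["advixe-cl", "-collect", "survey", *common],
        ["advixe-cl", "-collect", "tripcounts", "-flop", *common],
        ["advixe-cl", "-collect", "roofline", *common],
    ]

def run_in_order(commands, runner=subprocess.run):
    """Run each command in turn and stop at the first non-zero exit code.

    Returns the failing command, or None if every command succeeded.
    Assumption: a failed collection yields a non-zero exit code.
    """
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            return cmd
    return None
```

Calling `run_in_order(build_commands())` in a shell where setvars.sh has been sourced would run the survey, tripcounts and roofline steps one after another. The `runner` parameter exists only so the logic can be exercised without Advisor installed.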
Thank you for posting in Intel Communities.
The error says that the data file is corrupted, so please first run the Python code you shared without Intel Advisor and check whether it works correctly.
Please share the sample reproducer code so that we can further debug your issue.
We ran a sample Python code on the DevCloud nodes s001-n007 and s001-n088, which have Intel(R) Xeon(R) Gold 6128 CPUs, using the same commands you tried, and it ran successfully.
Sharing the screenshots of the same.
Thanks! I ran the program without Advisor and it completed successfully.
I have attached a zip folder with the following:
- reproduce.txt: how I set up my conda environment
- sage.py: the program I ran
- output.txt: what shows on my terminal after I ran the program
Hope these help, thank you.
Also, is there a way or a website showing which node (e.g., s001-n007) has what resources (CPU, GPU, etc.)? Currently, the DevCloud document tells us to enter a Linux command to list the available resources (properties), but I do not know the corresponding node number. Thanks.
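One possible way to get such a mapping is to parse the scheduler's node listing directly. The sketch below assumes the listing comes from a Torque/PBS-style `pbsnodes` command, whose output puts each node name on an unindented line followed by indented `key = value` lines. The thread does not confirm that this is the command the DevCloud documentation refers to. The property values in the sample are made up for illustration, and `parse_pbsnodes` is a hypothetical helper name.

```python
def parse_pbsnodes(text):
    """Map node name -> list of properties from pbsnodes-style text.

    Assumed layout: an unindented node name, then indented 'key = value' lines.
    """
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate node records
        if not line[0].isspace():
            # Unindented line: start of a new node record.
            current = line.strip()
            nodes[current] = []
        elif current is not None and line.strip().startswith("properties ="):
            value = line.split("=", 1)[1]
            nodes[current] = [p.strip() for p in value.split(",") if p.strip()]
    return nodes

# Illustrative input only: the node names appear in this thread, but the
# state and properties values are hypothetical.
sample = (
    "s001-n007\n"
    "     state = free\n"
    "     properties = xeon,gold6128\n"
    "\n"
    "s001-n088\n"
    "     state = free\n"
    "     properties = xeon,gold6128\n"
)

for name, props in parse_pbsnodes(sample).items():
    print(name, props)
```

On a real system the input text would come from capturing the command's output, for example with `subprocess.run(["pbsnodes"], capture_output=True, text=True).stdout`, again assuming that command is available on the node.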
We tried to reproduce the issue, and it looks like the problem is in your code.
To verify this, could you run it on a machine other than DevCloud, check whether you can generate the results, and get back to us?
I tried running Advisor on my own machine, which has an Intel(R) Xeon(R) Platinum 8368 CPU. I still ran into the same problem.
I also tried running another python program:
import torch
m = 2048
k = 2048
j = 2048
a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)
but I still get the same errors. I have attached the complete terminal output for this simple python program, hopefully that helps.
We tried to reproduce the issue on our end, and the same code ran properly for us.
One possible reason it is not working for you is that the environment variables are not set.
Please run the command below to source the environment variables (run as root/superuser):
Please run the command below and attach the logs (run as root/superuser):
Please provide logs or screenshots of the exact error so that we can reproduce your issue from our end.
I ran the self-check and it reports some errors (part 1 of the figure). Note: I also ran it with "python3" and got the same result.
I also tried sourcing the setup.sh and running the self-check with sudo, but it reports errors as well (part 2 of the figure).
As the screenshot shows, the self-checker is failing. Could you please try reinstalling the latest Intel Advisor toolkit or Intel oneAPI Base Toolkit on your system?
For Intel Developer Cloud, please let us know which node you were trying to access so that we can understand the issue better.
Please share the operating system details as well.