I am trying to run roofline analysis using Intel Advisor on DevCloud.
Background:
I am using the entry node, which has an Intel(R) Xeon(R) Gold 6128 CPU.
I source the environment setup script /opt/intel/inteloneapi/setvars.sh.
Commands I used:
advixe-cl -collect survey -project-dir MyResults -- python sage.py (this seems fine)
advixe-cl -collect tripcounts -flop -project-dir MyResults -- python sage.py
advixe-cl -collect roofline -project-dir MyResults -- python sage.py
The last two commands report the following errors:
(tripcount collect error)
advisor: Opening result 75 % Loading 'tripcounts_2265789_0.tcs' file 
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs001/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).
(roofline collect error)
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/advixe-runtrc.txt' ().
advisor: Error: Cannot load data file `/home/u177364/oneAPI_2023/Test/MyResults/e000/hs002/data.1/tripcounts_2264713_0.tcs' (Data file is corrupted).
I tried to generate the report anyway with the command:
advixe-cl --report=roofline --project-dir=./MyResults --report-output=./roofline.html
but it reports that the analyzer cannot find the tripcounts -flop data.
I have attached the tripcounts folder with some log and trc files. I hope these help. Thanks.
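For reference, the collection steps listed in this post can be scripted so that they always run in the same order against the same project directory. The following is a minimal sketch, not an official Advisor workflow: the helper names `advisor_commands` and `run_all` are hypothetical, the flags are copied from the commands above, and it assumes `advixe-cl` is on the PATH after sourcing setvars.sh. By default it only prints the commands.

```python
import subprocess


def advisor_commands(project_dir, target):
    """Build the three advixe-cl command lines used in the post above."""
    tail = ["-project-dir", project_dir, "--"] + list(target)
    return [
        ["advixe-cl", "-collect", "survey"] + tail,
        ["advixe-cl", "-collect", "tripcounts", "-flop"] + tail,
        ["advixe-cl", "-collect", "roofline"] + tail,
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    exit_codes = []
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Requires advixe-cl to be installed and on the PATH.
            exit_codes.append(subprocess.run(cmd).returncode)
    return exit_codes


if __name__ == "__main__":
    run_all(advisor_commands("MyResults", ["python", "sage.py"]))
```

Calling `run_all(..., dry_run=False)` returns one exit code per step, which makes it easier to see which collection fails first.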
Hi,
Thank you for posting in Intel Communities.
Please try running the Python code you shared without Intel Advisor and check whether it works, since the error indicates that the data file is corrupted.
Please also share the sample reproducer code so that we can debug your issue further.
We tried a sample Python program on nodes s001-n007 and s001-n088, which have Intel(R) Xeon(R) Gold 6128 CPUs, using the same commands you ran on DevCloud, and it ran successfully.
Screenshots are attached.
Regards,
Diya
Thanks! I have run the program without Advisor and it was completed successfully.
I have attached a zip folder with the following:
- reproduce.txt: how I set up my conda environment
- sage.py: the program I ran
- output.txt: what shows on my terminal after I ran the program
Hope these help, thank you.
Also, is there a way, or a website, to see which resources (CPU, GPU, etc.) each node (e.g., s001-n007) has? Currently, the DevCloud documentation tells us to run a Linux command to list the available resources (properties), but I do not know which node numbers they correspond to. Thanks.
Hi,
We have tried to reproduce the issue, and it looks like the issue is with your code.
To verify this, could you try running it on a machine other than DevCloud, check whether you are able to generate the results, and get back to us?
Regards,
Diya
Hi,
I tried running Advisor on my own machine, which has an Intel(R) Xeon(R) Platinum 8368 CPU, and I still run into the same problem.
I also tried running another Python program:
import torch
m = 2048
k = 2048
j = 2048
a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)
I still get the same errors with it. I have attached the complete terminal output for this simple Python program; hopefully that helps.
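As a rough sanity check for any roofline report that this reproducer eventually produces, the expected FLOP count and arithmetic intensity of the matrix multiply can be estimated by hand. This is a back-of-the-envelope sketch under assumptions that are not stated in the thread: float32 elements (the default dtype of `torch.rand`) and each matrix moving through memory exactly once. Real measurements will differ because of caching and blocking in the underlying math library.

```python
# Estimate for c = torch.matmul(a, b) with m = k = j = 2048.
# Assumptions: float32 elements, each matrix read or written exactly once.


def matmul_flops(m, k, j):
    # One multiply and one add for each term of every inner product.
    return 2 * m * k * j


def matmul_min_bytes(m, k, j, bytes_per_element=4):
    # Read a (m x k) and b (k x j), write c (m x j).
    return (m * k + k * j + m * j) * bytes_per_element


m = k = j = 2048
flops = matmul_flops(m, k, j)
traffic = matmul_min_bytes(m, k, j)
intensity = flops / traffic

print(f"FLOP: {flops}")
print(f"minimum bytes moved: {traffic}")
print(f"arithmetic intensity: {intensity:.1f} FLOP/byte")
```

Under these assumptions the kernel sits far to the right on a roofline chart, that is, it should be compute bound rather than memory bound.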
Hi,
We tried to reproduce the issue on our end, and the same code ran properly for us.
One possible reason it is not working for you is that the environment variables are not set.
Please run the command below to source the environment variables (run as root/superuser):
source /opt/intel/oneapi/setvars.sh
Please run the command below (as root/superuser) and attach the logs:
python3 /opt/intel/oneapi/advisor/latest/bin64/adv_self_check.py
Please provide logs or screenshots of the exact error so that we can reproduce your issue on our end.
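One way to capture the self-check output as a log file that can be attached to a reply is sketched below. The helper `run_and_log` and the log file name are hypothetical; the script path is the one given above and may differ on other installations. The sketch assumes the environment has already been set up in the same shell with setvars.sh.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, save combined stdout and stderr to log_path, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
    )
    with open(log_path, "w") as log:
        log.write(result.stdout)
    return result.returncode


if __name__ == "__main__":
    # Path taken from the reply above; adjust it if Advisor is installed elsewhere.
    self_check = "/opt/intel/oneapi/advisor/latest/bin64/adv_self_check.py"
    code = run_and_log([sys.executable, self_check], "adv_self_check.log")
    print("self-check exit code:", code)
```

The same helper can wrap the advixe-cl collection commands, so that the full terminal output of a failing run ends up in one file.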
Regards,
Diya
Hi,
I ran the self-check and it reports some errors (part 1 of the figure). Note: I also ran it with "python3" and got the same result.
I also tried sourcing the setup script and running the self-check with sudo, but it reports errors as well (part 2 of the figure).
Hi,
As the screenshot shows, the self-check is failing. Could you please try reinstalling the latest Intel Advisor or the Intel oneAPI Base Toolkit on your system?
Also, for Intel Developer Cloud, please let us know which node you were trying to access so that we can understand more about the issue.
Please share the operating system details as well.
Regards,
Diya
Hi,
Please follow these steps and check whether they work for you. I have attached the program I am using:
1. Run this command to list the DevCloud nodes that have an Intel(R) Xeon(R) Gold 6128 CPU:
pbsnodes | grep -i "skl\|gold6128" -B4
2. Then access one of the nodes whose "state = free" and "power_state = Running" with this command:
qsub -I -l nodes=<free_node>:ppn=2
3. Run the roofline analysis with the program I tried:
advixe-cl -collect roofline -project-dir MyResults -- python hello.py
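The node selection in steps 1 and 2 can also be done programmatically, which shows which node name goes with which properties. The sketch below parses `pbsnodes`-style text and lists nodes that are free, running, and carry a given property. The record layout (an unindented node name followed by indented `key = value` lines) is an assumption inferred from the grep command above, the helper names are hypothetical, and the sample text is made up for illustration.

```python
def parse_pbsnodes(text):
    """Parse pbsnodes-style output into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current record
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def free_nodes_with(nodes, wanted_property):
    """Names of nodes that are free, running, and list wanted_property."""
    matches = []
    for name, attrs in nodes.items():
        props = attrs.get("properties", "").split(",")
        if (attrs.get("state") == "free"
                and attrs.get("power_state") == "Running"
                and wanted_property in props):
            matches.append(name)
    return matches


# Made-up sample in the assumed layout; real output has more fields.
SAMPLE = """\
s001-n007
     state = free
     power_state = Running
     np = 2
     properties = xeon,skl,gold6128

s001-n128
     state = job-exclusive
     power_state = Running
     np = 2
     properties = xeon,skl,gold6128
"""

print(free_nodes_with(parse_pbsnodes(SAMPLE), "gold6128"))
```

On DevCloud, the real text could be obtained with `subprocess.run(["pbsnodes"], capture_output=True, text=True).stdout` and passed to `parse_pbsnodes` in place of `SAMPLE`.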
Regards,
Diya
Hi,
Good day to you.
We have not heard back from you. Could you let us know whether the steps above resolved your issue?
If so, please accept this as a solution. This would help others with similar issues. Thank you!
Regards,
Diya
I connected to node s001-n128.
Your code works with the analyzer. However, mine still doesn't; I guess it is a program-dependent problem, but I would still like to know possible ways to fix it. Currently, I still fail to profile even this simple program:
import torch
m = 2048
k = 2048
j = 2048
a = torch.rand(m, k)
b = torch.rand(k, j)
c = torch.matmul(a, b)
I also ran the self-check script you suggested previously, but it also fails on the DevCloud s001-n128 node (screenshot attached).
Thanks.
Hi,
We were able to reproduce the issue on our end. Running your code with the latest version, Intel Advisor 2023.0, we got a crash report from the Advisor roofline analysis on Windows 11 Enterprise, and the analysis hangs partway through in the Linux environment on Intel Developer Cloud.
We are investigating this issue further and will get back to you with an update soon.
Screenshots are below:
In Windows environment:
In Linux environment:
Regards,
Diya
Hi,
Good day to you.
The issue we are seeing in the latest version, Intel Advisor 2023.0, will be fixed in a future release, Intel Advisor 2023.2.
Sorry for the inconvenience.
Could you please confirm whether we can go ahead and close this case?
If this resolves your issue, make sure to accept this as a solution.
This would help others with similar issues. Thank you!
Thanks and Regards,
Diya
Hi,
We have not heard back from you.
Is your issue resolved with the above solution?
If this resolves your issue, make sure to accept this as a solution.
This would help others with similar issues. Thank you!
Thanks and Regards,
Diya
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Thanks and Regards,
Diya