Solved: Issue with Intel Developer Cloud Environment Pytorch GPU

Nitin_Mane · ‎02-13-2024

I'm facing an issue with the IDC environment: PyTorch is not able to load the XPU library. The failing code is:
# Import intel_extension_for_pytorch
import intel_extension_for_pytorch as ipex

The import fails with the following error:

ImportError: /opt/intel/oneapi/intelpython/latest/envs/pytorch-gpu/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

Can you provide a method to reset the environment without affecting the cloud credit?
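
An undefined-symbol error like the one above, raised from libintel-ext-pt-gpu.so for a torch:: symbol, often means the installed Intel Extension for PyTorch binary was built against a different PyTorch build than the one present in the environment. A minimal sketch for comparing the two installed versions without importing either package follows. The helper names are hypothetical, the distribution name intel_extension_for_pytorch is an assumption about how the package is registered, and a matching major.minor is only a rough indicator, not proof of binary compatibility.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """Extract (major, minor) from a version string such as '2.1.10+xpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(torch_version, ipex_version):
    """True when both version strings share the same major.minor prefix."""
    return major_minor(torch_version) == major_minor(ipex_version)


if __name__ == "__main__":
    # Reading package metadata does not load the shared libraries,
    # so this works even when the import itself fails.
    torch_v = installed_version("torch")
    ipex_v = installed_version("intel_extension_for_pytorch")
    print("torch:", torch_v)
    print("intel_extension_for_pytorch:", ipex_v)
    if torch_v and ipex_v:
        print("major.minor match:", versions_match(torch_v, ipex_v))
```

If the two versions disagree, that would be consistent with the import error, though it does not by itself identify which package changed.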

Faiz_Intel · ‎02-13-2024

Hi Nitin_Mane,

Thank you for reaching out to us.

For your information, all JupyterLab trainings under the Training and Workshops section in the Intel® Developer Cloud Console are free to use, and no payment or cloud credit is required.

To reset the environment, right-click on the file (e.g., image_to_image.ipynb) and select Delete. Next, click on the File tab at the top, and select Log Out. Finally, log back in, and the image_to_image.ipynb file should be restored to its default state.

Regards,

Faiz

Nitin_Mane · ‎02-14-2024

Thank you for the response. Deleting the file and logging back in to JupyterLab restored it.
However, I see no GPU device slot selected in the environment for my account. I have encountered this since 9 Feb 2024: I am able to work on the Xeon CPU, but I cannot run PyTorch with IPEX or any generative AI notebook.

Can you suggest a way to make this file run on the Intel Developer Cloud (https://console.cloud.intel.com/)?

Thank you

----------------------------------------------------------------------------------------------------------------------------------------------------

When I run the following code:
!echo "List of Intel GPUs available on the system:"
!xpu-smi discovery 2> /dev/null

it runs for about half an hour and Chrome freezes for some time.

Output:

List of Intel GPUs available on the system:

This has happened several times. In the meantime, I have discussed the issue with other developers in the Intel community, and their notebooks run the GenAI code without problems.

This is important for me because I will be running sessions in the upcoming weeks. I request that this be treated as a priority issue.

Faiz_Intel · ‎02-14-2024

Hi Nitin_Mane,

We apologize for the inconvenience. Our team is conducting further investigation into this matter and will provide you with an update soon. We greatly appreciate your patience.

In the meantime, could you please share your system ID? Click on the File tab, select Hub Control Panel, and your system ID will be shown at the top right of the window. You can also get the system ID from the training URL (.../user/[system ID]/lab/...).

Regards,

Faiz

Nitin_Mane · ‎02-14-2024

Sure, please find the following details

System ID - udc8678212e965c902b7658a65596387
link - https://idcbetabatch.eglb.intel.com/user/udc8678212e965c902b7658a65596387/lab/workspaces/auto-I/tree/Training/AI/GenAI/text_to_image.ipynb

Location - us-region-1
Tier - Standard

Thank you

Faiz_Intel · ‎02-14-2024

Hi Nitin_Mane,

Thank you for your patience. I have reproduced the issue on my end, experiencing the same prolonged code execution (more than 5 hours in total) without success and also noting that the GPU is not detected when running xpu-smi discovery.

Fortunately, the issue was resolved by restarting the server. The code now executes within a few seconds, allowing me to generate the images.

Please try restarting the server: navigate to the File tab, select Hub Control Panel, click Stop My Server, and then Start My Server. Please let me know if the issue persists.
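
After the restart, one way to confirm that the GPUs are visible again is to run xpu-smi discovery from Python and collect the reported device names. This is only a sketch: the function names are hypothetical, the parsing assumes the "Device Name:" table layout that xpu-smi prints in this environment, and the 60-second timeout is an arbitrary choice to avoid the long hang described earlier.

```python
import shutil
import subprocess


def parse_device_names(discovery_output):
    """Collect the values of 'Device Name:' rows from xpu-smi discovery output."""
    names = []
    for line in discovery_output.splitlines():
        if "Device Name:" in line:
            value = line.split("Device Name:", 1)[1]
            # Drop the table borders and padding around the value.
            names.append(value.strip(" |\t"))
    return names


def list_intel_gpus(timeout_seconds=60):
    """Return GPU names reported by xpu-smi, or an empty list if none are found."""
    if shutil.which("xpu-smi") is None:
        return []  # tool not on PATH in this environment
    try:
        result = subprocess.run(
            ["xpu-smi", "discovery"],
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return []  # treat a hang as "no GPUs visible"
    return parse_device_names(result.stdout)


if __name__ == "__main__":
    gpus = list_intel_gpus()
    print(f"{len(gpus)} Intel GPU(s) visible")
    for name in gpus:
        print(" -", name)
```

An empty list after a restart would suggest the server still has no GPU attached, which is worth reporting along with the system ID.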

Regards,

Faiz

Nitin_Mane · ‎02-16-2024

Hello Faiz,

Thank you for sharing the details.

I can see that all the libraries work fine in your environment. I have been testing for over 10 hours with the remaining files, following the same process, and I still have not succeeded in running any GenAI project file on Intel Developer Cloud.

Please look into this issue and guide me on the best way to resolve it.

I can now access the GPUs, which I could not before. Here are the details:

List of Intel GPUs available on the system:
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-0029-0000-002f0bda8086                                           |
|           | PCI BDF Address: 0000:29:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 1         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-003a-0000-002f0bda8086                                           |
|           | PCI BDF Address: 0000:3a:00.0                                                        |
|           | DRM Device: /dev/dri/card2                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 2         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-009a-0000-002f0bda8086                                           |
|           | PCI BDF Address: 0000:9a:00.0                                                        |
|           | DRM Device: /dev/dri/card3                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 3         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-00ca-0000-002f0bda8086                                           |
|           | PCI BDF Address: 0000:ca:00.0                                                        |
|           | DRM Device: /dev/dri/card4                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
Intel Xeon CPU used by this notebook:
Model name:                         Intel(R) Xeon(R) Platinum 8480+

Apart from this, I still face the same issue in the PyTorch GPU environment.
Please check the screenshot.

Issue -

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[4], line 28
     24 logging.getLogger("bigdl").setLevel(logging.ERROR)
     27 import torch
---> 28 import intel_extension_for_pytorch as ipex
     29 from datasets import load_dataset
     30 from datasets import Dataset

File ~/.local/lib/python3.9/site-packages/intel_extension_for_pytorch/__init__.py:94
     90                 raise err
     92     kernel32.SetErrorMode(prev_error_mode)
---> 94 from .utils._proxy_module import *
     95 from .utils.utils import has_cpu, has_xpu
     97 if has_cpu():

File ~/.local/lib/python3.9/site-packages/intel_extension_for_pytorch/utils/_proxy_module.py:2
      1 import torch
----> 2 import intel_extension_for_pytorch._C
      5 # utils function to define base object proxy
      6 def _proxy_module(name: str) -> type:

ImportError: libmkl_sycl_blas.so.4: cannot open shared object file: No such file or directory
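
Note that this traceback loads intel_extension_for_pytorch from ~/.local/lib/python3.9/site-packages, whereas the first error in this thread came from the pytorch-gpu environment under /opt/intel/oneapi. A user-level install can shadow the copy shipped with the environment, and a copy built against a different oneMKL runtime might explain the missing libmkl_sycl_blas.so.4. The sketch below, with hypothetical helper names, reports where a package would be loaded from without importing it. It is a diagnostic aid under those assumptions, not the fix Intel provided.

```python
import importlib.util
import site
from pathlib import Path


def package_origin(name):
    """Return the file a top-level package would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # package is not installed anywhere on sys.path
    return spec.origin


def in_user_site(path):
    """True when the given file lives under the per-user site-packages directory."""
    user_site = Path(site.getusersitepackages()).resolve()
    return user_site in Path(path).resolve().parents


if __name__ == "__main__":
    for name in ("torch", "intel_extension_for_pytorch"):
        origin = package_origin(name)
        if origin is None:
            print(f"{name}: not found")
        else:
            where = "user site" if in_user_site(origin) else "environment"
            print(f"{name}: {origin} ({where})")
```

If torch resolves to the environment while intel_extension_for_pytorch resolves to the user site, the two were probably installed separately and may not be a matched pair.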

Faiz_Intel · ‎02-18-2024

Hi Nitin_Mane,

We've informed the appropriate team for further investigation of this matter and will provide you with an update soon.

In the meantime, do we have your approval for the IDC Support team to access your system if needed? Also, is there any important data stored in your system?

Regards,

Faiz

Nitin_Mane · ‎02-18-2024

Hello Faiz,

Thank you for the information.

I approve access to my IDC account and the relevant files stored in the system.

Cheers,
Nitin Mane

Faiz_Intel · ‎02-18-2024

Hi Nitin_Mane,

Thank you for your approval. We will keep you updated once we receive feedback from the relevant team. We greatly appreciate your patience.

Regards,

Faiz

Faiz_Intel · ‎05-29-2024

This thread will no longer be monitored since we have provided a solution via email. If you need any additional information from Intel, please submit a new question.
