Hi,
I am the CIO at Aleph Innovations, and we work off of Intel's Developer Cloud to train and test our novel machine learning models. We work on 16 Intel Data Center GPU Max 1550s, which have a total FP32 performance of roughly 1000 TFLOPS. We have recently been trying to get our ITEX XPUs to run in parallel and distribute the load from one trial in our Optuna model-training function so we can get results quickly. We have been hitting the following errors and would like to solve this problem, as it has been going on for a while and is bottlenecking our whole product development pipeline. From what I can tell, we initialize the GPUs and set memory growth for each of them; then nothing happens after that point. I have plenty of telemetry in my code that should show up if it were running. Attached below are some screenshots of where the code stops and an error that comes up in our initial SDP platform for connecting to the Intel cloud. I am also including some code snippets. Please help if possible. If you think you can assist but require further information, please let me know and I will be happy to provide whatever is necessary.
Code Snippets:
## Dependencies, initializing XPUs, and defining our XPU strategy
import os
import logging

# ITEX / Level Zero environment configuration (set before TensorFlow is imported)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
os.environ["ITEX_TILE_AS_DEVICE"] = "0"
os.environ["ITEX_OMP_THREADPOOL"] = "0"
os.environ["ZE_FLAT_DEVICE_HIERARCHY"] = "FLAT"
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"
os.environ["ITEX_FP32_MATH_MODE"] = "BF32"
# os.environ["ITEX_VERBOSE"] = "2"  # more Intel telemetry

tf_logger = logging.getLogger("tensorflow")
tf_logger.setLevel(logging.ERROR)

import keras
import tensorflow as tf
import pandas as pd
import re
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    BatchNormalization,
    Conv2D,
    Flatten,
    Dense,
    Activation,
)
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
import sklearn
import sys
from sklearn.preprocessing import MultiLabelBinarizer
import optuna
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import layers
import intel_extension_for_tensorflow as itex
from optuna.pruners import HyperbandPruner
from optuna.samplers import TPESampler
from optuna.integration import DaskStorage
from dask.distributed import Client, LocalCluster
import dask.array as da
import dask.dataframe as dd
import tqdm

tf.get_logger().setLevel("ERROR")
tf.config.threading.set_intra_op_parallelism_threads(96)
tf.config.threading.set_inter_op_parallelism_threads(8)

# Set up logging
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

# Configure TensorFlow to use the Intel GPUs (exposed as XPU devices by ITEX)
physical_devices = tf.config.list_physical_devices("XPU")
num_XPUs = len(physical_devices)
print(f"Available XPU devices: {num_XPUs}")

# attempt to reduce memory pressure (unrahul)
for xpu in physical_devices:
    try:
        tf.config.experimental.set_memory_growth(xpu, True)
        print(f"Memory growth set for XPU: {xpu}")
    except RuntimeError as e:
        print(e)

if num_XPUs > 1:
    strategy = tf.distribute.MirroredStrategy(
        devices=[f"/XPU:{i}" for i in range(num_XPUs)]
    )  # rahul: nit fix
    logger.info(f"Using MirroredStrategy with {num_XPUs} XPUs")  # rahul: nit fix
else:
    strategy = tf.distribute.OneDeviceStrategy(device="/XPU:0")
    logger.info("Using OneDeviceStrategy with XPU:0")

## using our strategy for computation speed-up
with strategy.scope():
    ...  # whatever needs a speed-up
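For reference, here is a rough, simplified sketch of the kind of thing we run under strategy.scope() in the Optuna objective. The model architecture, hyperparameter ranges, data, and trial count below are placeholders rather than our real training code, and it reuses the strategy object defined above:

import numpy as np
import tensorflow as tf
import optuna
from optuna.samplers import TPESampler
from optuna.pruners import HyperbandPruner

# Placeholder data; the real pipeline loads the preprocessed arrays here.
x_train = np.random.rand(1024, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, 10, size=(1024,))

def objective(trial):
    # Hyperparameters explored by Optuna (names and ranges are illustrative).
    units = trial.suggest_int("units", 64, 256)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)

    # Creating and compiling the model inside strategy.scope() is what lets
    # MirroredStrategy replicate it across all visible XPUs.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(32, 32, 3)),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(units, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )

    history = model.fit(x_train, y_train, epochs=2, batch_size=256, verbose=0)
    return history.history["accuracy"][-1]

study = optuna.create_study(
    direction="maximize", sampler=TPESampler(), pruner=HyperbandPruner()
)
study.optimize(objective, n_trials=5)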
Hi Christopher, we would like to inform you that we are routing your query to the dedicated team for further assistance.
We have identified that the issue is caused by pandas, not by ITEX or TensorFlow.
Intel Python experts will help look into this pandas OOM issue on PVC.
Hi, as discussed on Slack, let me update here as well.
The following error is due to missing metadata in the Dask DataFrame transformation that uses a lambda function:

ValueError: Metadata inference failed in `lambda`.

You have supplied a custom function and Dask is unable to determine the type of output that that function returns.

To resolve this please provide a meta= keyword. The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
TypeError("'NAType' object is not iterable")

If we add metadata like below, it doesn't throw the error:
meta = ("SubBoard", "object")
df["SubBoard"] = df["SubBoard"].map_partitions(
    lambda part: part.apply(
        lambda x: [re.sub(r"[\n\t\r\v]", "", value) if pd.notna(value) else value for value in x]
    ),
    meta=meta,
)
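For anyone who hits the same error later, here is a small self-contained sketch of the same pattern; the column name and sample data are made up for illustration:

import re
import pandas as pd
import dask.dataframe as dd

# Made-up sample data: a column of string lists, some containing missing values.
pdf = pd.DataFrame({"SubBoard": [["a\tb", "c\nd"], ["e\rf", pd.NA]]})
ddf = dd.from_pandas(pdf, npartitions=2)

# Without meta=, Dask tries to infer the output type by evaluating the lambda
# itself, which is where the TypeError("'NAType' object is not iterable")
# surfaced in this thread. Supplying meta= skips that inference step.
ddf["SubBoard"] = ddf["SubBoard"].map_partitions(
    lambda part: part.apply(
        lambda x: [re.sub(r"[\n\t\r\v]", "", v) if pd.notna(v) else v for v in x]
    ),
    meta=("SubBoard", "object"),  # name and dtype of the resulting column
)

print(ddf.compute())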
Based on the previous comments, I can infer that before converting the NumPy array to .npz, you are trying to do preprocessing with a DataFrame.
We don't have support for pandas/Dask (for data preprocessing) on Intel GPUs.
They can, however, run very well on CPUs with Modin (Ray or Dask backend). https://modin.readthedocs.io/en/stable/getting_started/using_modin/using_modin_cluster.html
https://arunjose696.github.io/modin_perf_examples/gh_page_4.html
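As a rough illustration (not from the original code), moving the preprocessing to Modin usually only requires choosing an engine and swapping the import; the file path and column name below are placeholders:

import modin.config as cfg
cfg.Engine.put("dask")        # or "ray"; selects the execution backend

import modin.pandas as pd     # drop-in replacement for the pandas API

# Placeholder path and column name; the same pandas-style code now runs
# in parallel across CPU cores (or a multi-node cluster).
df = pd.read_csv("training_data.csv")
df["Text"] = df["Text"].str.replace(r"[\n\t\r\v]", "", regex=True)
print(df.head())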
Hi,
This particular Dask issue has been resolved after using meta with the Dask DataFrame operation.
pandas is not supported on Intel GPUs; however, the preprocessing can be scaled out on a multi-node CPU cluster using Modin.
Closing the ticket. Feel free to raise a separate ticket for further issues.