Hi,
I am the CIO at Aleph Innovations, and we work off of Intel's Developer Cloud to train and test our novel machine learning models. We work on 16 Intel Data Center GPU Max 1550s, which have a total FP32 performance of ~1000 TFLOPS. We have recently been trying to get our ITEX XPUs to run in parallel and distribute the load from one trial in our Optuna model-training function so we can get quick results. We have hit the following errors and would like to solve this problem, as it has been going on for a while and is bottlenecking our whole product development pipeline. From what I can tell, we initialize the GPUs and set memory growth for each of them; then nothing happens after that point. I have plenty of telemetry in my code that would be showing output if it were running. Attached below are some screenshots of where the code stops and an error that comes up in our initial SDP platform for connecting to the Intel cloud. I am also including some code snippets. Please help if possible. If you think you can assist but require further information, please let me know and I will be happy to provide whatever is necessary.
Code Snippets:
## Dependencies, initializing xpus, and defining our xpu strategy
import os
import logging
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
os.environ["ITEX_TILE_AS_DEVICE"] = "0"
os.environ["ITEX_OMP_THREADPOOL"] = "0"
os.environ["ZE_FLAT_DEVICE_HIERARCHY"] = "FLAT"
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"
os.environ["ITEX_FP32_MATH_MODE"] = "BF32"
# os.environ["ITEX_VERBOSE"] = "2" #more intel telemetry
tf_logger = logging.getLogger("tensorflow")
tf_logger.setLevel(logging.ERROR)
import keras
import tensorflow as tf
import pandas as pd
import re
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    BatchNormalization,
    Conv2D,
    Flatten,
    Dense,
    Activation,
)
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
import sklearn
import sys
from sklearn.preprocessing import MultiLabelBinarizer
import optuna
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import layers
import intel_extension_for_tensorflow as itex
from optuna.pruners import HyperbandPruner
from optuna.samplers import TPESampler
from optuna.integration import DaskStorage
from dask.distributed import Client, LocalCluster
import dask.array as da
import dask.dataframe as dd
import tqdm
tf.get_logger().setLevel("ERROR")
tf.config.threading.set_intra_op_parallelism_threads(96)
tf.config.threading.set_inter_op_parallelism_threads(8)
# Set up logging
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)
# Configure TensorFlow to use Intel GPUs
physical_devices = tf.config.list_physical_devices("XPU")
num_XPUs = len(physical_devices)
print(f"Avilable xpu devices : {num_XPUs}")
# attempt to reduce memory pressure (unrahul)
for xpu in physical_devices:
    try:
        tf.config.experimental.set_memory_growth(xpu, True)
        print(f"Memory growth set for XPU: {xpu}")
    except RuntimeError as e:
        print(e)
if num_XPUs > 1:
    strategy = tf.distribute.MirroredStrategy(
        devices=[f"/XPU:{i}" for i in range(num_XPUs)]
    )  # rahul: nitfix
    logger.info(f"Using MirroredStrategy with {num_XPUs} XPUs")  # rahul nit fix
else:
    strategy = tf.distribute.OneDeviceStrategy(device="/XPU:0")
    logger.info("Using OneDeviceStrategy with XPU:0")
## using our strategy for computation speed-up
with strategy.scope():
    ...  ## whatever needs a speed-up
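For context, here is a simplified sketch of the kind of thing we run inside the scope; the build_model name, layer sizes, and data shapes are placeholders rather than our real model, but the pattern is to build and compile the Keras model under strategy.scope() so its variables are mirrored across the XPUs (this reuses the imports from the snippet above):

def build_model():
    # Placeholder architecture -- the production model is larger.
    model = Sequential([
        Conv2D(32, (3, 3), input_shape=(64, 64, 3)),
        BatchNormalization(),
        Activation("relu"),
        Flatten(),
        Dense(10, activation="softmax"),
    ])
    return model

with strategy.scope():
    # Variables created here are mirrored across all visible XPUs.
    model = build_model()
    model.compile(
        optimizer=Adam(1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )

# fit() can be called outside the scope; the strategy shards each batch across replicas.
# model.fit(x_train, y_train, batch_size=256, callbacks=[EarlyStopping(patience=3)])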
Hi Christopher, we would like to inform you that we are routing your query to the dedicated team for further assistance.
We have identified that the issue appears to be caused by pandas and not by ITEX or TensorFlow.
Intel Python experts will help look into this pandas OOM issue on PVC.
Hi, as discussed on Slack, let me update here as well.
The following error is due to missing metadata in the Dask DataFrame transformation that uses a lambda function:

ValueError: Metadata inference failed in lambda.

You have supplied a custom function and Dask is unable to determine the type of output that that function returns.

To resolve this please provide a meta= keyword. The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
TypeError("'NAType' object is not iterable")

If we add metadata like below, it doesn't throw the error.
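The original snippet didn't carry over here, so this is an illustrative sketch rather than the exact code; the column name "labels", the sample data, and the split logic are placeholder assumptions, and the only point is the meta= keyword:

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"labels": ["a,b", None, "c"]})  # placeholder data
ddf = dd.from_pandas(pdf, npartitions=2)

# Without meta=, Dask tries to infer the output type by running the lambda on
# placeholder data, which is where the TypeError("'NAType' object is not iterable")
# above comes from. Passing meta= declares the output name and dtype up front.
ddf["labels"] = ddf["labels"].apply(
    lambda x: x.split(",") if isinstance(x, str) else [],
    meta=("labels", "object"),
)
print(ddf.compute())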
Based on the previous comments, I can infer that before converting the numpy array to .npz, you are trying to do preprocessing using a DataFrame.
We don't have support for Pandas/Dask (for data preprocessing) on Intel GPU.
It can, however, run very well on CPUs with Modin (Ray or Dask backend). https://modin.readthedocs.io/en/stable/getting_started/using_modin/using_modin_cluster.html
https://arunjose696.github.io/modin_perf_examples/gh_page_4.html
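As a rough illustration (the engine choice and file path here are placeholders, not taken from your workload), Modin is a drop-in replacement for the pandas API, so the switch is usually just picking a backend and changing the import:

import os
os.environ["MODIN_ENGINE"] = "ray"  # or "dask"; set before the first Modin import

import modin.pandas as pd  # drop-in replacement for the pandas API

# The same pandas-style preprocessing then runs unchanged, parallelized across CPU cores.
df = pd.read_csv("training_data.csv")  # hypothetical path
df = df.dropna()
print(df.shape)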
Hi,
This particular Dask issue has been resolved after using meta with the Dask DataFrame operation.
Pandas does not support Intel GPU; however, it can be scaled out on a multi-node CPU cluster using Modin.
Closing the ticket. Feel free to raise a separate ticket for further issues.