AI Tools from Intel

Memory issues while implementing Gridsearch-SVC, NuSVC algorithms using Sklearnex

Swatinairl
Beginner
1,711 Views

We are running into memory issues on a D4_v5 machine when doing hyperparameter tuning with GridSearchCV for SVC and NuSVC using sklearnex on a dataset with more than 400k rows. Please suggest a suitable solution.

5 Replies
AthiraM_Intel
Moderator
1,688 Views

Hi,


Thank you for posting in Intel Communities.


Could you please share the following details?


  1. Sample reproducer code
  2. Exact steps and the commands used
  3. OS details
  4. Dataset you used
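
For the OS and package details requested above, a minimal sketch like the following can gather them in one go. It uses only the Python standard library; the package list is an assumption (the PyPI distribution names for scikit-learn, sklearnex, pandas, and NumPy) and can be adjusted to match your environment.

```python
import platform
import sys
from importlib import metadata


def collect_environment(packages=("scikit-learn", "scikit-learn-intelex",
                                  "pandas", "numpy")):
    """Return OS, Python, and installed package versions as a dict of strings."""
    info = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            info[name] = "not installed"
    return info


env = collect_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

Pasting the printed lines into the reply covers the OS details and the library versions together.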



Thanks


Swatinairl
Beginner
1,653 Views

Hi,

Reference notebook: Network Intrusion Detection using Python | Kaggle

Below is the code snippet for Grid search where we are facing issues:

 

import logging
import time

import pandas as pd
# If sklearnex is used, patch_sklearn() would need to be called before the
# sklearn imports below (assumed; that call is not shown in this snippet).
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import NuSVC

start_time_data_prep = time.time()

train_original = pd.read_csv("data.csv")
train = train_original.head(500000)

# Attack class distribution
print(train['label'].value_counts())

# Scale numerical attributes to zero mean and unit variance
cols = train.select_dtypes(include=['float64', 'int64']).columns
scaler = StandardScaler()
sc_train = scaler.fit_transform(train[cols])

# Turn the result back into a dataframe
sc_traindf = pd.DataFrame(sc_train, columns=cols, index=train.index)

# Encode categorical attributes (each column is encoded independently)
encoder = LabelEncoder()
cattrain = train.select_dtypes(include=['object']).copy()
traincat = cattrain.apply(encoder.fit_transform)

# Separate the target column from the encoded features
enctrain = traincat.drop(['label'], axis=1)
train_x = pd.concat([sc_traindf, enctrain], axis=1)
train_y = train['label']

X_train, X_test, Y_train, Y_test = train_test_split(
    train_x, train_y, train_size=0.70, random_state=2)
print("data prep time is ---->", time.time() - start_time_data_prep)

logging.debug("Training with NuSVC")
tuned_parameters = [{"kernel": ["rbf", "poly"], "gamma": ["scale"]}]
score = "recall"
clf = GridSearchCV(NuSVC(nu=0.2), tuned_parameters, n_jobs=-1,
                   scoring="%s_macro" % score, cv=5, verbose=10)
start_time_nusvc = time.time()
clf.fit(X_train, Y_train)
print("best params", clf.best_params_)
print("best score ", clf.best_score_)

OS details:

Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal

The dataset (.csv file) is attached here.
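
For context on the scale of this run, here is a rough back-of-the-envelope sketch of what the grid search above asks for. The row count, split, fold count, and parameter grid come from the snippet; the feature count (40) and the worker count (4, assumed to be the vCPU count of a D4_v5 machine) are assumptions, as is dense float64 storage. The full kernel matrix figure is only an upper bound for comparison: scikit-learn's SVM estimators keep a bounded kernel cache (`cache_size`) rather than the full matrix, and this sketch makes no claim about how sklearnex manages memory internally.

```python
def grid_search_footprint(n_rows, n_features, train_size, cv,
                          n_candidates, n_workers, bytes_per_value=8):
    """Rough counts and sizes for one GridSearchCV run over dense data."""
    n_train = round(n_rows * train_size)      # rows kept by train_test_split
    rows_per_fit = n_train * (cv - 1) // cv   # rows seen by each CV fit
    cv_fits = n_candidates * cv
    total_fits = cv_fits + 1                  # +1 for the final refit (refit=True default)
    concurrent_fits = min(n_workers, cv_fits) # fits that may run at the same time
    input_bytes = rows_per_fit * n_features * bytes_per_value
    # Upper bound only: size of an n x n kernel matrix if it were fully stored.
    full_kernel_bytes = rows_per_fit ** 2 * bytes_per_value
    return {
        "rows_per_fit": rows_per_fit,
        "total_fits": total_fits,
        "concurrent_fits": concurrent_fits,
        "input_bytes_per_fit": input_bytes,
        "full_kernel_bytes_upper_bound": full_kernel_bytes,
    }


estimate = grid_search_footprint(
    n_rows=500_000,
    n_features=40,    # hypothetical; use train_x.shape[1] for the real value
    train_size=0.70,
    cv=5,
    n_candidates=2,   # kernel in {"rbf", "poly"} with gamma="scale"
    n_workers=4,      # assumption: vCPU count picked up by n_jobs=-1
)
for key, value in estimate.items():
    print(f"{key}: {value:,}")
```

Under these assumptions the input data for each fit is small compared with what a kernel method may need while solving, so the number of fits running concurrently (controlled by `n_jobs`) is one of the first things worth varying when narrowing down the issue.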

AthiraM_Intel
Moderator
1,622 Views

Hi,


We were able to run the sample code you shared without any issues on Ubuntu 18 (Intel DevCloud).


Could you please try running the same code on Intel DevCloud for oneAPI?

You can register for DevCloud using the link below:


https://www.intel.com/content/www/us/en/forms/idz/devcloud-enrollment/oneapi-request.html


Meanwhile, could you please share the hardware details of the machine you have already tried, so that we can try to reproduce your issue?
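
The hardware details can be collected with a short standard-library sketch such as the one below. The RAM lookup relies on `os.sysconf` names that are available on Linux (as on the Ubuntu machine mentioned in this thread); on platforms where they are missing it simply reports `None`.

```python
import os
import platform


def hardware_summary():
    """Return logical CPU count, architecture, and total RAM in GiB (None if unknown)."""
    total_ram_gib = None
    try:
        # Page size times number of physical pages gives total RAM in bytes.
        page_size = os.sysconf("SC_PAGE_SIZE")
        phys_pages = os.sysconf("SC_PHYS_PAGES")
        total_ram_gib = round(page_size * phys_pages / 2**30, 2)
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on this platform.
        pass
    return {
        "cpu_count": os.cpu_count(),
        "machine": platform.machine(),
        "total_ram_gib": total_ram_gib,
    }


hw = hardware_summary()
for key, value in hw.items():
    print(f"{key}: {value}")
```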



Thanks


AthiraM_Intel
Moderator
1,593 Views

Hi,


We have not heard back from you. Could you please give us an update?



Thanks


AthiraM_Intel
Moderator
1,562 Views

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.



Thanks

