We are facing memory issues on a D4_v5 machine while running hyperparameter tuning with GridSearchCV for SVC and NuSVC using sklearnex (Intel Extension for Scikit-learn) on a dataset with more than 400k rows. Please suggest a suitable solution.
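As a rough starting point, the dense feature matrix is sized as rows × columns × bytes per value. The sketch below does that arithmetic with a hypothetical helper `dense_matrix_bytes`. The 40-column figure is an assumption for illustration only, since the actual feature count isn't stated. Peak usage also depends on per-fit copies and solver working memory, multiplied by the number of fits GridSearchCV runs in parallel, so this figure is a lower bound rather than an estimate of peak usage.

```python
# Back-of-envelope size of a dense numeric feature matrix.
# N_COLS is an assumed feature count for illustration only.


def dense_matrix_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Bytes for an n_rows x n_cols dense array (itemsize 8 = float64, 4 = float32)."""
    return n_rows * n_cols * itemsize


N_ROWS = 400_000  # lower bound on the row count mentioned in the question
N_COLS = 40       # hypothetical feature count

f64 = dense_matrix_bytes(N_ROWS, N_COLS)              # float64 storage
f32 = dense_matrix_bytes(N_ROWS, N_COLS, itemsize=4)  # float32 storage

print(f"float64: {f64 / 2**20:.1f} MiB")
print(f"float32: {f32 / 2**20:.1f} MiB")
```

Storing features as float32 halves this figure; whether a given estimator keeps float32 or upcasts internally depends on its implementation.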
Hi,
Thank you for posting in Intel Communities.
Could you please share the following details?
- Sample reproducer code
- Exact steps and the commands used
- OS details
- Dataset you used
Thanks
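Most of these details can be collected with a short script. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and that the packages are installed under their usual PyPI distribution names `scikit-learn`, `scikit-learn-intelex` and `pandas`; `collect_env` is a hypothetical helper name.

```python
import platform
import sys
from importlib import metadata


def collect_env(packages=("scikit-learn", "scikit-learn-intelex", "pandas")):
    """Return OS, Python and package version info as a dict."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```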
Hi,
Reference notebook: Network Intrusion Detection using Python (Kaggle)
Below is the GridSearchCV code snippet where we are facing the issue:
# Intel Extension for Scikit-learn: patch before importing any sklearn estimators
from sklearnex import patch_sklearn
patch_sklearn()

import logging
import time

import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import NuSVC

start_time_data_prep = time.time()
train_original = pd.read_csv("data.csv")
train = train_original.head(500000)

# Attack class distribution
print(train['label'].value_counts())

# SCALING NUMERICAL ATTRIBUTES
scaler = StandardScaler()
# extract numerical attributes and scale them to zero mean and unit variance
cols = train.select_dtypes(include=['float64', 'int64']).columns
sc_train = scaler.fit_transform(train[cols])
# test set is not used here; if added, reuse the fitted scaler instead of refitting:
# sc_test = scaler.transform(test[cols])
# sc_testdf = pd.DataFrame(sc_test, columns=cols)
# turn the result back into a DataFrame (keep train's index so concat aligns rows)
sc_traindf = pd.DataFrame(sc_train, columns=cols, index=train.index)

# ENCODING CATEGORICAL ATTRIBUTES
encoder = LabelEncoder()
# extract categorical attributes
cattrain = train.select_dtypes(include=['object']).copy()
# encode each categorical column (fit_transform is refit per column)
traincat = cattrain.apply(encoder.fit_transform)
# separate the target column from the encoded features
enctrain = traincat.drop(['label'], axis=1)
cat_Ytrain = traincat[['label']].copy()

train_x = pd.concat([sc_traindf, enctrain], axis=1)
train_y = train['label']

X_train, X_test, Y_train, Y_test = train_test_split(
    train_x, train_y, train_size=0.70, random_state=2)
print("data prep time is ---->", time.time() - start_time_data_prep)

logging.debug("Training with NuSVC")
tuned_parameters = [
    {"kernel": ["rbf", "poly"], "gamma": ["scale"]}]
score = "recall"
clf = GridSearchCV(NuSVC(nu=0.2), tuned_parameters, n_jobs=-1,
                   scoring="%s_macro" % score, cv=5, verbose=10)
start_time_nusvc = time.time()
clf.fit(X_train, Y_train)
print("nusvc fit time is ---->", time.time() - start_time_nusvc)
print("best params", clf.best_params_)
print("best score ", clf.best_score_)
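It helps to count how many model fits this search performs. GridSearchCV fits every parameter combination once per CV fold, then (with the default refit=True) refits the best combination on all of X_train. The sketch below reproduces that count with the standard library for the `tuned_parameters` and `cv=5` in the snippet above; `sklearn.model_selection.ParameterGrid` gives the same number of combinations via `len()`. Two values are assumptions: `N_WORKERS = 4` assumes a 4-vCPU machine (check with `os.cpu_count()`), and `TRAIN_ROWS` assumes data.csv has at least 500,000 rows so that `head(500000)` keeps all of them.

```python
from math import prod

# Same grid as in the snippet: a list of dicts, each expanded
# to the Cartesian product of its value lists.
tuned_parameters = [{"kernel": ["rbf", "poly"], "gamma": ["scale"]}]
CV_FOLDS = 5
REFIT = True          # GridSearchCV default
N_WORKERS = 4         # assumed vCPU count; n_jobs=-1 uses all cores
TRAIN_ROWS = 350_000  # assumed: 70% of head(500000) via train_size=0.70


def n_candidates(grid):
    """Number of parameter combinations, expanding a list of dicts like GridSearchCV."""
    return sum(prod(len(v) for v in d.values()) for d in grid)


candidates = n_candidates(tuned_parameters)
cv_fits = candidates * CV_FOLDS
total_fits = cv_fits + (1 if REFIT else 0)
rows_per_fold_fit = TRAIN_ROWS * (CV_FOLDS - 1) // CV_FOLDS

print("candidates:", candidates)
print("CV fits:", cv_fits, "+ refit:", int(REFIT), "=", total_fits)
print("max concurrent CV fits:", min(N_WORKERS, cv_fits))
print("rows per CV training fold:", rows_per_fold_fit)
```

With n_jobs=-1, several of these fits, each on roughly 280k rows under the assumptions above, can be in memory at the same time, which is one reason peak usage grows well beyond the size of X_train.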
OS details:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
The dataset (.csv file) is attached here.
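One way to lower peak memory for a GridSearchCV run like the one in the snippet, sketched under assumptions: patch scikit-learn with sklearnex before importing estimators (falling back to stock scikit-learn if it isn't installed), cap parallel fits with a small n_jobs instead of -1, limit queued jobs with pre_dispatch, store features as float32, and run the search on a stratified subsample. The synthetic data from `make_classification` stands in for data.csv; `SUBSAMPLE = 0.2` and `n_jobs=2` are illustrative values, not tuned recommendations.

```python
try:
    # Intel Extension for Scikit-learn must patch before sklearn estimators are imported.
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # sklearnex not installed: fall back to stock scikit-learn

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import NuSVC

# Synthetic stand-in for the prepared features/labels (hypothetical sizes).
X, y = make_classification(n_samples=3_000, n_features=20, n_informative=10,
                           n_classes=3, random_state=2)
X = X.astype(np.float32)  # half the storage of float64

# Tune on a stratified subsample; SUBSAMPLE is an illustrative fraction.
SUBSAMPLE = 0.2
X_sub, _, y_sub, _ = train_test_split(X, y, train_size=SUBSAMPLE,
                                      stratify=y, random_state=2)

search = GridSearchCV(
    NuSVC(nu=0.2),
    [{"kernel": ["rbf", "poly"], "gamma": ["scale"]}],
    scoring="recall_macro",
    cv=5,
    n_jobs=2,               # at most 2 fits run at once instead of one per core
    pre_dispatch="n_jobs",  # do not queue extra jobs ahead of the workers
)
search.fit(X_sub, y_sub)
print("best params:", search.best_params_)
print("best score:", round(search.best_score_, 3))
```

After picking parameters on the subsample, a single NuSVC with `best_params_` can be refit on the full training set, which needs memory for one fit rather than several concurrent ones. Whether float32 is preserved end to end depends on the estimator implementation.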
Hi,
We were able to run the sample code you shared without any issues on Ubuntu 18 (Intel DevCloud).
Could you please try running the same code on Intel DevCloud for oneAPI?
You can register for DevCloud using the link below:
https://www.intel.com/content/www/us/en/forms/idz/devcloud-enrollment/oneapi-request.html
Meanwhile, could you please share the hardware details of the machine on which you ran it, so that we can try to reproduce your issue?
Thanks
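On Linux, the CPU count and installed RAM can be read with Python's standard library. A minimal sketch, assuming a Linux host (`os.sysconf` with `SC_PAGE_SIZE`/`SC_PHYS_PAGES` is not available on Windows); `hardware_summary` is a hypothetical helper. Running `lscpu` and `free -h` in a shell gives similar information.

```python
import os
import platform


def hardware_summary():
    """Return logical CPU count, total physical RAM in GiB (Linux), and machine info."""
    ram_gib = None
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        phys_pages = os.sysconf("SC_PHYS_PAGES")
        ram_gib = page_size * phys_pages / 2**30
    except (AttributeError, ValueError, OSError):
        pass  # os.sysconf or these names are unavailable on this platform
    return {
        "logical_cpus": os.cpu_count(),
        "ram_gib": ram_gib,
        "machine": platform.machine(),
        "processor": platform.processor(),
    }


if __name__ == "__main__":
    for key, value in hardware_summary().items():
        print(f"{key}: {value}")
```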
Hi,
We have not heard back from you. Could you please give us an update?
Thanks
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Thanks