I'm doing CV for a kNN classifier. I load data into a DataFrame, use KFold to split it, and run the classifier on the split data.
The Intel distribution gives me a huge speedup on the kNN classifier, but I quickly run out of memory. If I use a vanilla distribution, it works fine.
Here is a small test case to reproduce the problem:
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
import pandas as pd
X, y = make_classification(n_samples=14000, n_features=120, n_informative=20, n_classes=2)
train_data = pd.DataFrame.from_records(X)
train_targets = pd.Series(y)

for i in range(300):
    kf = KFold(n_splits=3)
    for train_index, test_index in kf.split(train_data):
        X_train, X_test = train_data.iloc[train_index], train_data.iloc[test_index]
        y_train, y_test = train_targets.iloc[train_index], train_targets.iloc[test_index]
        knn = KNeighborsClassifier(n_neighbors=120)
        knn.fit(X_train, y_train)
After 300 iterations, this code will use around 8 GB of memory, but only 120 MB with the first two lines removed.
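For anyone trying to reproduce the figures, one way to watch the growth from inside Python is the standard library's tracemalloc. This is only a hedged sketch, with a dummy retained allocation standing in for the classifier fit; it is not how the numbers above were measured:

```python
import tracemalloc

tracemalloc.start()

# Stand-in for one CV iteration: each pass retains a buffer,
# mimicking a leak that grows with the iteration count.
retained = []
for i in range(300):
    retained.append(bytearray(10_000))  # ~10 kB kept alive per iteration

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")
tracemalloc.stop()
```

In the real reproducer you would call `tracemalloc.get_traced_memory()` after the KFold loop instead of the dummy allocations.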
I'm running on Win 10, with the following conda environment:
bzip2 1.0.8 vc14h5832a3a_5 [vc14] intel
certifi 2020.6.20 py37hefe589e_1 intel
common_cmplr_lib_rt 2021.1.1 intel_191 intel
common_cmplr_lic_rt 2021.1.1 intel_191 intel
daal4py 2021.1 py37hf1d83b1_3 intel
dal 2021.1.1 intel_71 intel
dpcpp_cpp_rt 2021.1.1 intel_191 intel
fortran_rt 2021.1.1 intel_191 intel
icc_rt 2021.1.1 intel_191 intel
impi_rt 2021.1.1 intel_88 intel
intel-openmp 2021.1.1 intel_191 intel
intelpython 2021.1.1 1 intel
intelpython3_core 2021.1.1 0 intel
joblib 0.17.0 py37hbce671c_0 intel
mkl 2021.1.1 intel_52 intel
mkl-service 2.3.0 py37h939bbf7_6 intel
mkl_fft 1.2.0 py37h7c155fc_4 intel
mkl_random 1.2.0 py37h39757d5_4 intel
mkl_umath 0.1.0 py37h50e0a19_0 intel
numpy 1.19.2 py37h02626c5_0 intel
numpy-base 1.19.2 py37h141cca1_0 intel
opencl_rt 2021.1.1 intel_191 intel
openssl 1.1.1h vc14he774522_0 [vc14] intel
pandas 1.1.2 py37h19d3ef7_0 intel
pip 20.2.3 py37h56aae7b_1 intel
python 3.7.9 h64ef1ba_1 intel
python-dateutil 2.8.1 py37hd8ca5e9_2 intel
pytz 2020.1 py37h0699639_1 intel
pyyaml 5.3.1 py37h1acd8f6_0 intel
scikit-learn 0.23.2 py37hf1917a8_5 intel
scipy 1.5.2 py37h002189b_0 intel
setuptools 50.3.2 py37h4200cf5_0 intel
six 1.15.0 py37h65307dc_1 intel
sqlite 3.33.0 vc14h5832a3a_1 [vc14] intel
tbb 2021.1.1 vc14_intel_133 [vc14] intel
tbb4py 2021.1.1 py37_intel_133 [vc14] intel
tcl 8.6.9 vc14he774522_27 [vc14] intel
threadpoolctl 2.1.0 py37h6447541_2 intel
tk 8.6.9 vc14h57a849e_8 [vc14] intel
vc 14.1 h869be7e_15 intel
vs2015_runtime 14.16.27012 hf0eaf9b_15 intel
wheel 0.35.1 py37h4a4c509_1 intel
wincertstore 0.2 py37_4 intel
xz 5.2.5 hea85519_2 intel
yaml 0.1.7 hd09c893_7 intel
zlib 22.214.171.124 vc14ha0a531f_3 [vc14] intel
Any help much appreciated.
Thanks for reaching out to us.
We tried running the given reproducer code in an environment where Intel Python is installed through conda. We tried changing the number of iterations and ran up to 1000 iterations. We also checked two different versions of Intel Python, v2021.1.1 and v2019.5, but we couldn't observe any such out-of-memory issue. It would help if you could provide a few more details:
1. Can you confirm whether the Intel Python installation was done through a conda package or through the standalone installer?
2. Did you observe the same issue with any other samples?
Sorry for the late reply; I was away for New Year.
It's a Conda install:
conda config --add channels intel
conda create -n idp-test2 intelpython3_core python=3.7
conda activate idp-test2
conda install scikit-learn
conda install pandas
I have tried installing the full version as well, but got the same result.
I don't have another sample, but I will try to find a workaround, and perhaps that will produce another sample.
It is somehow a combination of KFold and kNN. If I change the last line to knn.fit(train_data, train_targets), ignoring the KFold data, it works as expected. And if I comment out the kNN fit but keep the KFold, it works as well.
Thanks for the response. We have checked the memory consumption for the sample use case that you gave. As you said, we observed that running the sample in Intel Python takes around 8 GB (with k-fold cross-validation, k=3), whereas in normal Python it took only 119-130 MB. We are forwarding this case to the Subject Matter Experts to check the issue; they will get back to you soon.
The problem consists of two parts:
1) a suboptimal choice of DataFrame constructor, from_records, which transforms the passed ndarray in a specific way, leading to
2) incorrect behavior of daal4py's internal data conversion, resulting in a memory leak.
The fix is simple: replace pd.DataFrame.from_records(X) with the default constructor, pd.DataFrame(X).
The memory profiler showed that memory consumption returned to normal with the fix.
Let me know if the problem does not disappear.
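For clarity, the change in the reproducer looks like this (a sketch with random data standing in for the original make_classification output):

```python
import numpy as np
import pandas as pd

X = np.random.rand(100, 12)

# Before (triggers the leak via daal4py's internal data conversion):
# train_data = pd.DataFrame.from_records(X)

# After: the default constructor, confirmed below to fix the sample
train_data = pd.DataFrame(X)

assert train_data.shape == (100, 12)
```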
Yes, I can confirm that using the default constructor fixes the problem in the sample.
In my real program, I load the data from a csv file. Something like:
train = pd.read_csv('myfile.csv')
train_targets = train['Target']
train_data = train[features]
This code, using read_csv(), also triggers a memory leak.
I tried to load the csv with numpy, and pass the ndarray to pandas like this:
from numpy import genfromtxt
tmp = genfromtxt('myfile.csv', delimiter=',')
train = pd.DataFrame(tmp)
but that also triggers a memory leak.
Do you see a workaround for this?
The problem in daal4py will be fixed by this pull request. With these changes, memory leaks were not observed on different input data formats (numpy ndarray/pandas DataFrame, Fortran/C data order).
As a temporary solution until the daal4py release with the fix, you can convert the input data to C order before passing it to the algorithm:
import numpy as np
data = np.ascontiguousarray(data)
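Applied to a DataFrame like the ones in the earlier posts, the workaround could look like this sketch (the Fortran-ordered array here is a made-up stand-in for the problematic input):

```python
import numpy as np

# Deliberately Fortran-ordered data, standing in for the
# non-C-contiguous values that trigger the leak.
raw = np.asfortranarray(np.random.rand(10, 4))
assert not raw.flags['C_CONTIGUOUS']

# Row-major copy; safe to pass to the algorithm until the fix ships.
data = np.ascontiguousarray(raw)
assert data.flags['C_CONTIGUOUS']
```

np.ascontiguousarray is a no-op (no copy) when the input is already C-ordered, so applying it unconditionally costs little.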
Kind regards, Alexander.