<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Re:Memory leak when using KNeighborsClassifier in Intel® Distribution for Python*</title>
    <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1242690#M1578</link>
    <description>Memory leak when using KNeighborsClassifier in Intel® Distribution for Python*</description>
    <pubDate>Sat, 02 Jan 2021 13:08:47 GMT</pubDate>
    <dc:creator>brian3</dc:creator>
    <dc:date>2021-01-02T13:08:47Z</dc:date>
    <item>
      <title>Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1240787#M1573</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I'm running cross-validation (CV) for a kNN classifier. I load the data into a DataFrame, use KFold to split it, and run the classifier on each split.&lt;/P&gt;
&lt;P&gt;The Intel distribution gives me a huge speedup on the kNN classifier, but I quickly run out of memory. If I use a vanilla distribution, it works fine.&lt;/P&gt;
&lt;P&gt;Here is a small test case to reproduce the problem:&lt;/P&gt;
&lt;PRE&gt;import daal4py.sklearn&lt;BR /&gt;daal4py.sklearn.patch_sklearn()&lt;BR /&gt;from sklearn.datasets import make_classification&lt;BR /&gt;from sklearn.neighbors import KNeighborsClassifier&lt;BR /&gt;from sklearn.model_selection import KFold&lt;BR /&gt;import pandas as pd&lt;BR /&gt;&lt;BR /&gt;X, y = make_classification(n_samples=14000, n_features=120, n_informative=20,n_classes=2)&lt;BR /&gt;&lt;BR /&gt;train_data = pd.DataFrame.from_records(X)&lt;BR /&gt;train_targets = pd.Series(y)&lt;BR /&gt;&lt;BR /&gt;for i in range(300):&lt;BR /&gt;    print(i)&lt;BR /&gt;    kf = KFold(n_splits=3)&lt;BR /&gt;  &lt;BR /&gt;    for train_index, test_index in kf.split(train_data):&lt;BR /&gt;        X_train, X_test = train_data.iloc[train_index], train_data.iloc[test_index]&lt;BR /&gt;        y_train, y_test = train_targets.iloc[train_index], train_targets.iloc[test_index] &lt;BR /&gt;&lt;BR /&gt;        knn = KNeighborsClassifier(n_neighbors=120)&lt;BR /&gt;        knn.fit(X_train, y_train)&lt;BR /&gt;&lt;BR /&gt;&lt;/PRE&gt;
&lt;P&gt;After 300 iterations, this code uses around 8 GB of memory, but only 120 MB with the first two lines removed.&lt;/P&gt;
&lt;P&gt;I'm running on Win 10, with the following conda environment:&lt;/P&gt;
&lt;PRE&gt;bzip2 1.0.8 vc14h5832a3a_5 [vc14] intel&lt;BR /&gt;certifi 2020.6.20 py37hefe589e_1 intel&lt;BR /&gt;common_cmplr_lib_rt 2021.1.1 intel_191 intel&lt;BR /&gt;common_cmplr_lic_rt 2021.1.1 intel_191 intel&lt;BR /&gt;daal4py 2021.1 py37hf1d83b1_3 intel&lt;BR /&gt;dal 2021.1.1 intel_71 intel&lt;BR /&gt;dpcpp_cpp_rt 2021.1.1 intel_191 intel&lt;BR /&gt;fortran_rt 2021.1.1 intel_191 intel&lt;BR /&gt;icc_rt 2021.1.1 intel_191 intel&lt;BR /&gt;impi_rt 2021.1.1 intel_88 intel&lt;BR /&gt;intel-openmp 2021.1.1 intel_191 intel&lt;BR /&gt;intelpython 2021.1.1 1 intel&lt;BR /&gt;intelpython3_core 2021.1.1 0 intel&lt;BR /&gt;joblib 0.17.0 py37hbce671c_0 intel&lt;BR /&gt;mkl 2021.1.1 intel_52 intel&lt;BR /&gt;mkl-service 2.3.0 py37h939bbf7_6 intel&lt;BR /&gt;mkl_fft 1.2.0 py37h7c155fc_4 intel&lt;BR /&gt;mkl_random 1.2.0 py37h39757d5_4 intel&lt;BR /&gt;mkl_umath 0.1.0 py37h50e0a19_0 intel&lt;BR /&gt;numpy 1.19.2 py37h02626c5_0 intel&lt;BR /&gt;numpy-base 1.19.2 py37h141cca1_0 intel&lt;BR /&gt;opencl_rt 2021.1.1 intel_191 intel&lt;BR /&gt;openssl 1.1.1h vc14he774522_0 [vc14] intel&lt;BR /&gt;pandas 1.1.2 py37h19d3ef7_0 intel&lt;BR /&gt;pip 20.2.3 py37h56aae7b_1 intel&lt;BR /&gt;python 3.7.9 h64ef1ba_1 intel&lt;BR /&gt;python-dateutil 2.8.1 py37hd8ca5e9_2 intel&lt;BR /&gt;pytz 2020.1 py37h0699639_1 intel&lt;BR /&gt;pyyaml 5.3.1 py37h1acd8f6_0 intel&lt;BR /&gt;scikit-learn 0.23.2 py37hf1917a8_5 intel&lt;BR /&gt;scipy 1.5.2 py37h002189b_0 intel&lt;BR /&gt;setuptools 50.3.2 py37h4200cf5_0 intel&lt;BR /&gt;six 1.15.0 py37h65307dc_1 intel&lt;BR /&gt;sqlite 3.33.0 vc14h5832a3a_1 [vc14] intel&lt;BR /&gt;tbb 2021.1.1 vc14_intel_133 [vc14] intel&lt;BR /&gt;tbb4py 2021.1.1 py37_intel_133 [vc14] intel&lt;BR /&gt;tcl 8.6.9 vc14he774522_27 [vc14] intel&lt;BR /&gt;threadpoolctl 2.1.0 py37h6447541_2 intel&lt;BR /&gt;tk 8.6.9 vc14h57a849e_8 [vc14] intel&lt;BR /&gt;vc 14.1 h869be7e_15 intel&lt;BR /&gt;vs2015_runtime 14.16.27012 hf0eaf9b_15 intel&lt;BR /&gt;wheel 0.35.1 py37h4a4c509_1 intel&lt;BR /&gt;wincertstore 0.2 py37_4 intel&lt;BR /&gt;xz 5.2.5 hea85519_2 intel&lt;BR /&gt;yaml 0.1.7 hd09c893_7 intel&lt;BR /&gt;zlib 1.2.11.1 vc14ha0a531f_3 [vc14] intel&lt;/PRE&gt;
&lt;P&gt;Any help much appreciated.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Brian&lt;/P&gt;</description>
      <pubDate>Fri, 25 Dec 2020 14:32:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1240787#M1573</guid>
      <dc:creator>brian3</dc:creator>
      <dc:date>2020-12-25T14:32:47Z</dc:date>
    </item>
    <item>
      <title>Re:Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1241226#M1574</link>
      <description>&lt;P&gt;Hi Brian,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us.&lt;/P&gt;&lt;P&gt;We ran your reproducer in an environment where Intel Python is installed through conda. We varied the number of iterations, running up to 1000, and tested two versions of Intel Python, v2021.1.1 and v2019.5, but we could not reproduce the out-of-memory issue. Could you give us a few more details:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;1. Can you confirm how Intel Python was installed: through the conda package or through the standalone installer?&lt;/P&gt;&lt;P&gt;2. Did you observe the same issue with any other samples?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Chithra&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 28 Dec 2020 14:37:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1241226#M1574</guid>
      <dc:creator>ChithraJ_Intel</dc:creator>
      <dc:date>2020-12-28T14:37:27Z</dc:date>
    </item>
    <item>
      <title>Re:Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1242569#M1577</link>
      <description>&lt;P&gt;Hi Brian,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please give us an update on your issue?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Chithra&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 01 Jan 2021 12:50:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1242569#M1577</guid>
      <dc:creator>ChithraJ_Intel</dc:creator>
      <dc:date>2021-01-01T12:50:03Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1242690#M1578</link>
      <description>&lt;P&gt;Hi &lt;SPAN&gt;Chithra,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Sorry for the late reply - I was away for New Year.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1)&lt;BR /&gt;It's a Conda install:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN&gt;conda config --add channels intel&lt;BR /&gt;conda create -n idp-test2 intelpython3_core python=3.7&lt;BR /&gt;conda activate idp-test2&lt;BR /&gt;conda install scikit-learn&lt;BR /&gt;conda install pandas&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;I have tried installing the full version as well, but got the same result.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;2)&lt;BR /&gt;I don't have another sample, but I will try to find a workaround, and perhaps that will produce another sample.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The leak seems to need the combination of KFold and kNN. If I change the last line to knn.fit(train_data, train_targets), ignoring the KFold data, then it works as expected. And if I comment out the kNN fit but keep the KFold, it works as well.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kr,&lt;BR /&gt;Brian&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 02 Jan 2021 13:08:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1242690#M1578</guid>
      <dc:creator>brian3</dc:creator>
      <dc:date>2021-01-02T13:08:47Z</dc:date>
    </item>
    <item>
      <title>Re:Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1243386#M1579</link>
      <description>&lt;P&gt;Hi Brian,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for the response. We checked the memory consumption for the sample you provided. As you reported, running the sample with Intel Python takes around 8 GB (with k-fold cross-validation, where k=3), whereas with standard Python it took only 119-130 MB. We are forwarding this case to our subject matter experts to investigate the issue. They will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Chithra&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 05 Jan 2021 10:08:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1243386#M1579</guid>
      <dc:creator>ChithraJ_Intel</dc:creator>
      <dc:date>2021-01-05T10:08:11Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1243996#M1580</link>
      <description>Thanks Chithra
/b</description>
      <pubDate>Wed, 06 Jan 2021 23:37:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1243996#M1580</guid>
      <dc:creator>brian3</dc:creator>
      <dc:date>2021-01-06T23:37:28Z</dc:date>
    </item>
    <item>
      <title>Re: Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1247240#M1585</link>
      <description>&lt;P&gt;Hi Brian,&lt;/P&gt;
&lt;P&gt;The problem consists of two parts:&lt;BR /&gt;1) a suboptimal choice of DataFrame constructor, 'from_records', which transforms the passed ndarray in a specific way and leads to&lt;BR /&gt;2) incorrect behavior in daal4py's internal data conversion, resulting in a memory leak&lt;/P&gt;
&lt;P&gt;The fix is simple: replace &lt;BR /&gt;`pd.DataFrame.from_records(X)`&lt;BR /&gt;with&lt;BR /&gt;`pd.DataFrame(X)`&lt;/P&gt;
&lt;P&gt;The memory profiler showed that memory consumption returned to normal with the fix.&lt;BR /&gt;Let me know if the problem persists.&lt;/P&gt;
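&lt;P&gt;If you want to see how the two constructors differ on your side, here is a minimal sketch that prints whether the array behind each DataFrame is C-contiguous. The small array X and the helper is_c_order are only illustrations (not part of daal4py or of your script), and the layout you get may depend on your pandas version, so I am not stating the expected output here.&lt;/P&gt;
&lt;PRE&gt;
```python
import numpy as np
import pandas as pd

# Small stand-in for the X returned by make_classification (assumption).
X = np.arange(12, dtype=np.float64).reshape(4, 3)


def is_c_order(frame):
    """Report whether the 2-D array behind a DataFrame is C-contiguous."""
    return bool(frame.to_numpy().flags["C_CONTIGUOUS"])


# Build the same data with both constructors and compare the layouts.
from_records_layout = is_c_order(pd.DataFrame.from_records(X))
default_layout = is_c_order(pd.DataFrame(X))

print("from_records C-contiguous:", from_records_layout)
print("DataFrame()  C-contiguous:", default_layout)
```
&lt;/PRE&gt;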
&lt;P&gt;Kind regards,&lt;BR /&gt;Alexander&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2021 08:43:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1247240#M1585</guid>
      <dc:creator>AlexanderAndreev</dc:creator>
      <dc:date>2021-01-18T08:43:51Z</dc:date>
    </item>
    <item>
      <title>Re: Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1247416#M1587</link>
      <description>&lt;P&gt;Hi Alexander,&lt;/P&gt;
&lt;P&gt;Yes, I can confirm that using the default constructor fixes the problem in the sample.&lt;/P&gt;
&lt;P&gt;In my real program, I load the data from a csv file. Something like:&lt;/P&gt;
&lt;PRE&gt;train = pd.read_csv('myfile.csv')&lt;BR /&gt;train_targets = train['Target']&lt;BR /&gt;train_data = train[features]&lt;/PRE&gt;
&lt;P&gt;This code, using read_csv(), also triggers a memory leak.&lt;/P&gt;
&lt;P&gt;I tried to load the csv with numpy, and pass the ndarray to pandas like this:&lt;/P&gt;
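&lt;PRE&gt;import numpy as np&lt;BR /&gt;import pandas as pd&lt;BR /&gt;tmp = np.genfromtxt('myfile.csv', delimiter=',', skip_header=1)  # assumes a comma-separated file with one header row&lt;BR /&gt;train = pd.DataFrame(tmp)&lt;/PRE&gt;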
&lt;P&gt;but that also triggers a memory leak.&lt;/P&gt;
&lt;P&gt;Do you see a workaround for this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards,&lt;BR /&gt;Brian&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2021 18:23:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1247416#M1587</guid>
      <dc:creator>brian3</dc:creator>
      <dc:date>2021-01-18T18:23:35Z</dc:date>
    </item>
    <item>
      <title>Re: Memory leak when using KNeighborsClassifier</title>
      <link>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1248754#M1594</link>
      <description>&lt;P&gt;The problem in daal4py will be fixed by &lt;A href="https://github.com/IntelPython/daal4py/pull/464" target="_self"&gt;this pull request&lt;/A&gt;. With these changes, no memory leaks were observed across different input data formats (numpy ndarray/pandas DataFrame, Fortran/C data order).&lt;/P&gt;
&lt;P&gt;As a temporary workaround until a daal4py release includes the fix, you can convert the input data to C order before passing it to the algorithm:&lt;/P&gt;
&lt;PRE&gt;data = np.ascontiguousarray(data)&lt;/PRE&gt;
&lt;P&gt;&lt;A href="https://numpy.org/doc/stable/reference/generated/numpy.ascontiguousarray.html" target="_self"&gt;Link to function in numpy docs&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Kind regards, Alexander.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2021 22:07:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-for-Python/Memory-leak-when-using-KNeighborsClassifier/m-p/1248754#M1594</guid>
      <dc:creator>AlexanderAndreev</dc:creator>
      <dc:date>2021-01-21T22:07:35Z</dc:date>
    </item>
  </channel>
</rss>

