Hi everyone,
My test example shows that Intel Distribution for Python 2018 is actually slower than Anaconda 5.0.0. Why is that? Is there anything I can do to fix the speed?
How I installed the Python distributions:
1. Anaconda 5.0.0 Python 3.6.2
// Installation Instructions
Just download and install from https://repo.continuum.io/archive/Anaconda3-5.0.0-Windows-x86_64.exe
2. Intel Distribution for Python 3.6.2
// Installation Instructions (after completing step 1)
conda config --add channels intel
conda create --name intelpy3 intelpython3_full python=3 statsmodels
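Before comparing timings, it may be worth confirming which interpreter and NumPy build each environment actually runs. The following is a minimal sketch, not part of the original benchmark; `describe_environment` is a hypothetical helper name, and `np.show_config()` simply prints whatever build information NumPy recorded (for example which BLAS/LAPACK it was linked against).

```python
import sys

import numpy as np


def describe_environment():
    """Return the interpreter and NumPy versions of the active environment."""
    return {"python": sys.version, "numpy": np.__version__}


if __name__ == "__main__":
    for name, value in describe_environment().items():
        print("{}: {}".format(name, value))
    # Prints NumPy's build configuration, including linked math libraries.
    np.show_config()
```

Running this once in each environment makes it easier to tell whether both timings really come from different builds.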
Machine Setup and Results (Test1 time, Test2 time)
1. Intel Xeon E5-2673v4 (32 cores) 2.3 GHz, 128 GB DDR3, Windows Server 2016 Datacenter x64 VM (Azure)
Anaconda: 38 s, 95 s
Intel: 42 s, 108 s
2. Intel Core i5-3350P (4 cores) 3.5 GHz, 24 GB DDR3, Windows 7 x64 VM (VirtualBox)
Anaconda: 72 s, 130 s
Intel: 82 s, 165 s
Source Code
import sys
import time

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def test1():
    cols = 13
    rows = 10000000
    raw_data = np.random.randint(2, size=cols * rows).reshape(rows, cols)
    col_names = ['v01', 'v02', 'v03', 'v04', 'v05', 'v06', 'v07',
                 'v08', 'v09', 'v10', 'v11', 'v12', 'outcome']
    df = pd.DataFrame(raw_data, columns=col_names)
    df['v11'] = df['v03'].apply(
        lambda x: ['t1', 't2', 't3', 't4'][np.random.randint(4)])
    df['v12'] = df['v03'].apply(lambda x: ['p1', 'p2'][np.random.randint(2)])
    return df


def test2(df):
    logit_formula = ('outcome ~ v01 + v02 + v03 + v04 + v05 + v06 + v07 '
                     '+ v08 + v09 + v10 + C(v11) + C(v12)')
    logit_model = smf.logit(formula=logit_formula, data=df).fit()
    print(logit_model.summary())


start_time = time.time()
df = test1()
t1 = time.time() - start_time
start_time = time.time()
test2(df)
t2 = time.time() - start_time
print(sys.version, "\nTest1: {}sec, Test2: {}sec".format(t1, t2))
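A side note on measurement: a single `time.time()` run can be noisy, especially inside a VM. The sketch below shows one possible way to repeat a measurement and keep the best time; `best_of` is a hypothetical helper that is not part of the benchmark above, and the workload in the usage lines is only a stand-in.

```python
import time


def best_of(fn, repeats=3):
    """Call fn() `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


if __name__ == "__main__":
    # Stand-in workload; in the benchmark this would be test1 or test2.
    seconds, total = best_of(lambda: sum(range(1000000)))
    print("best of 3: {:.4f} sec (result {})".format(seconds, total))
```

Taking the minimum of several runs tends to reduce the influence of background activity, at the cost of a longer total run time.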
Hi,
Thanks for the reproducer. We can reproduce your observation and are looking into what makes IDP slower in this case. While we are looking at this, I'd like to point out that creation of the DataFrame can be done about 10 times faster as follows:
def test1a():
    cols = 13
    rows = 10000000
    raw_data = np.random.randint(2, size=(rows, cols))
    col_names = ['v01', 'v02', 'v03', 'v04', 'v05', 'v06', 'v07',
                 'v08', 'v09', 'v10', 'v11', 'v12', 'outcome']
    df = pd.DataFrame(raw_data, columns=col_names)
    df['v11'] = np.take(
        np.array(['t1', 't2', 't3', 't4'], dtype=object),
        np.random.randint(4, size=rows))
    df['v12'] = np.take(
        np.array(['p1', 'p2'], dtype=object),
        np.random.randint(2, size=rows))
    return df
While execution of test1() takes about 30 seconds, execution of test1a() only
takes about 3 seconds.
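The difference comes from replacing one Python-level call per row with a single vectorized call. The sketch below isolates just that step using plain NumPy (no pandas), so it is an illustration of the idea rather than the benchmark itself; `labels_per_row`, `labels_vectorized`, and the row count are hypothetical choices made for the example.

```python
import time

import numpy as np

LABELS = ['t1', 't2', 't3', 't4']


def labels_per_row(rows):
    # One np.random.randint call per row, similar in spirit to the lambda in test1().
    return [LABELS[np.random.randint(len(LABELS))] for _ in range(rows)]


def labels_vectorized(rows):
    # One call draws every index; np.take maps indices to labels, as in test1a().
    idx = np.random.randint(len(LABELS), size=rows)
    return np.take(np.array(LABELS, dtype=object), idx)


if __name__ == "__main__":
    rows = 200000  # assumed size, far smaller than the 10,000,000 rows above
    for fn in (labels_per_row, labels_vectorized):
        start = time.perf_counter()
        out = fn(rows)
        print("{}: {:.3f} sec for {} rows".format(
            fn.__name__, time.perf_counter() - start, len(out)))
```

The exact ratio will depend on the machine and the NumPy build, so the timings printed here should not be expected to match the 30 seconds versus 3 seconds quoted above.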
Wow, 10x improvement on test1() is huge! Thank you very much Oleksandr.
On test1a() alone, Anaconda Python on my machine is still slightly faster than Intel Python. Did you also see Intel Python being slower on your system?
Anaconda: 2.265 s vs. Intel: 2.882 s
Yes, test1a() runs faster in Anaconda than in IDP, but we have not gotten to the bottom of the issue yet.
Has this been explained or fixed in the latest versions?
