jeff_c_
Beginner
198 Views

Why does my test show that Intel Python is slower than Anaconda Python?

Hi everyone,

My test example shows that Intel Distribution for Python 2018 is slower than Anaconda 5.0.0 Python. Why is that? Is there anything I can do to fix the speed?

How I installed the Python distributions:

1. Anaconda 5.0.0 Python 3.6.2
    // Installation Instruction
    Just download and install from https://repo.continuum.io/archive/Anaconda3-5.0.0-Windows-x86_64.exe

2. Intel Distribution for Python 3.6.2   
    // Installation Instruction (after completed step 1)
    conda config --add channels intel 
    conda create --name intelpy3 intelpython3_full python=3 statsmodels

 

Machine Setup and Results (Test1, Test2)

1. Intel Xeon E5-2673 v4 (32 cores) 2.3 GHz, 128 GB DDR3, Windows Server 2016 Datacenter x64 VM (Azure)
    Anaconda : 38s, 95s
    Intel    : 42s, 108s

2. Intel Core i5-3350P (4 cores) 3.5 GHz, 24 GB DDR3, Win7 x64 VM (VirtualBox)
    Anaconda : 72s, 130s
    Intel    : 82s, 165s

 

Source Code

import sys
import time
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def test1():
    cols = 13
    rows = 10000000
    raw_data = np.random.randint(2, size=cols * rows).reshape(rows, cols)
    col_names = ['v01', 'v02', 'v03', 'v04', 'v05', 'v06', 'v07',
                 'v08', 'v09', 'v10', 'v11', 'v12', 'outcome']
    df = pd.DataFrame(raw_data, columns=col_names)
    df['v11'] = df['v03'].apply(
        lambda x: ['t1', 't2', 't3', 't4'][np.random.randint(4)])
    df['v12'] = df['v03'].apply(lambda x: ['p1', 'p2'][np.random.randint(2)])
    return df


def test2(df):
    logit_formula = 'outcome ~ v01 + v02 + v03 + v04 + v05 + v06 + v07 + v08 + v09 + v10 + C(v11) + C(v12)'
    logit_model = smf.logit(formula=logit_formula, data=df).fit()
    print(logit_model.summary())


start_time = time.time()
df = test1()
t1 = time.time() - start_time

start_time = time.time()
test2(df)
t2 = time.time() - start_time

print(sys.version, "\nTest1: {}sec, Test2: {}sec".format(t1, t2))

 

4 Replies

Hi, 

Thanks for the reproducer. We can reproduce your observation and are looking into what makes IDP (Intel Distribution for Python) slower in this case. While we look into this, I'd like to point out that the DataFrame can be created about 10 times faster as follows:

def test1a():
    cols = 13
    rows = 10000000
    raw_data = np.random.randint(2, size=(rows,cols))
    col_names = ['v01', 'v02', 'v03', 'v04', 'v05', 'v06', 'v07',
                 'v08', 'v09', 'v10', 'v11', 'v12', 'outcome']
    df = pd.DataFrame(raw_data, columns=col_names)
    df['v11'] = np.take(
        np.array(['t1', 't2', 't3', 't4'], dtype=object),
        np.random.randint(4, size=rows))
    df['v12'] = np.take(
        np.array(['p1', 'p2'], dtype=object),
        np.random.randint(2, size=rows))
    return df

While execution of test1() takes about 30 seconds, execution of test1a() takes only about 3 seconds.
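To see where the difference comes from, here is a minimal sketch that times the per-row `apply` approach against the `np.take` approach on a single label column. The row count is reduced to 200,000 so it runs quickly (an assumption for illustration; the thread uses 10,000,000), and the helper names `labels_apply`, `labels_take`, and `timed` are hypothetical. Actual timings will vary by machine and distribution.

```python
import time

import numpy as np
import pandas as pd

ROWS = 200_000  # assumption: reduced from the thread's 10,000,000 rows
LABELS = np.array(['t1', 't2', 't3', 't4'], dtype=object)


def labels_apply(df):
    # Original approach: one Python-level lambda call (and one RNG call) per row.
    return df['v03'].apply(lambda x: LABELS[np.random.randint(4)])


def labels_take(rows):
    # Vectorized approach: draw all indices at once, then look them up in one call.
    return np.take(LABELS, np.random.randint(4, size=rows))


def timed(fn, *args):
    # Return the function result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


df = pd.DataFrame({'v03': np.random.randint(2, size=ROWS)})
slow, t_apply = timed(labels_apply, df)
fast, t_take = timed(labels_take, ROWS)

print("apply: {:.3f}s, take: {:.3f}s".format(t_apply, t_take))
```

Both versions produce a column of the same length drawn from the same four labels; only the number of Python-level calls differs.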

jeff_c_

Wow, 10x improvement on test1() is huge! Thank you very much Oleksandr.

On test1() alone, Anaconda Python on my machine is still slightly faster than Intel Python. Did you also see Intel Python being slower on your system?

2.265s (Anaconda) vs 2.882s (Intel)

198 Views

Yes, test1a() runs faster in Anaconda than in IDP, but we have not gotten to the bottom of the issue yet.
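One way a reader could narrow this down is to time the stages of test1a() separately in each environment and compare the numbers. The sketch below is only an illustration, not the diagnosis used in this thread: the stage functions and the `best_of` helper are hypothetical names, and the row count is reduced so it runs quickly.

```python
import timeit

import numpy as np
import pandas as pd

ROWS, COLS = 200_000, 13  # assumption: reduced from the thread's 10,000,000 rows
RAW = np.random.randint(2, size=(ROWS, COLS))  # reused by the DataFrame stage


def stage_randint():
    # Random integer generation only.
    return np.random.randint(2, size=(ROWS, COLS))


def stage_take():
    # Label column creation via np.take only.
    labels = np.array(['t1', 't2', 't3', 't4'], dtype=object)
    return np.take(labels, np.random.randint(4, size=ROWS))


def stage_dataframe():
    # DataFrame construction from an already generated array.
    return pd.DataFrame(RAW)


def best_of(fn, repeat=3):
    # The minimum over several runs is less sensitive to background noise.
    return min(timeit.repeat(fn, number=1, repeat=repeat))


timings = {fn.__name__: best_of(fn)
           for fn in (stage_randint, stage_take, stage_dataframe)}
for name, seconds in timings.items():
    print("{:<16} {:.4f}s".format(name, seconds))
```

Running the same script under both interpreters and comparing the per-stage times would show which stage accounts for the gap.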

Benahmed__Yacine

Has this been explained or fixed in the latest versions?