Intel® Distribution for Python*

Why does my test show that Intel Python is actually slower than Anaconda Python?

jeff_c_
Beginner

Hi everyone,

My test example shows that Intel Distribution for Python 2018 is actually slower than Anaconda Python 5.0.0. Why is that? Is there anything I can do to fix the speed?

How I installed the Python distributions:

1. Anaconda 5.0.0 Python 3.6.2
    // Installation instructions
    Download and install from https://repo.continuum.io/archive/Anaconda3-5.0.0-Windows-x86_64.exe

2. Intel Distribution for Python 3.6.2   
    // Installation instructions (after completing step 1)
    conda config --add channels intel 
    conda create --name intelpy3 intelpython3_full python=3 statsmodels
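
To confirm which interpreter and NumPy build a given environment actually uses, a quick check like the following can be run in each one (a minimal sketch; the environment name intelpy3 comes from the conda command above):

import sys
import numpy as np

# The Intel build identifies itself in the interpreter banner.
print(sys.version)

# Show the BLAS/LAPACK libraries NumPy was built against (MKL in the Intel build).
np.show_config()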

 

Machine Setup and Results (each line shows the Test1 and Test2 times)

1. Intel Xeon E5-2673 v4 (32 cores), 2.3 GHz, 128 GB DDR3, Windows Server 2016 Datacenter x64 VM (Azure)
    Anaconda : 38s, 95s
    Intel    : 42s, 108s

2. Intel Core i5-3350P (4 cores), 3.5 GHz, 24 GB DDR3, Windows 7 x64 VM (VirtualBox)
    Anaconda : 72s, 130s
    Intel    : 82s, 165s

 

Source Code

import sys
import time
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def test1():
    # Test1: build a 10M x 13 DataFrame of random 0/1 ints, then replace
    # two columns with random categorical labels via row-wise .apply calls.
    cols = 13
    rows = 10000000
    raw_data = np.random.randint(2, size=cols * rows).reshape(rows, cols)
    col_names = ['v01', 'v02', 'v03', 'v04', 'v05', 'v06', 'v07',
                 'v08', 'v09', 'v10', 'v11', 'v12', 'outcome']
    df = pd.DataFrame(raw_data, columns=col_names)
    # The lambdas ignore their argument: each row just draws a fresh label.
    df['v11'] = df['v03'].apply(
        lambda x: ['t1', 't2', 't3', 't4'][np.random.randint(4)])
    df['v12'] = df['v03'].apply(lambda x: ['p1', 'p2'][np.random.randint(2)])
    return df


def test2(df):
    # Test2: fit a logistic regression with two categorical terms and
    # print the fitted model summary.
    logit_formula = 'outcome ~ v01 + v02 + v03 + v04 + v05 + v06 + v07 + v08 + v09 + v10 + C(v11) + C(v12)'
    logit_model = smf.logit(formula=logit_formula, data=df).fit()
    print(logit_model.summary())


# Time DataFrame construction (Test1) and the model fit (Test2) separately.
start_time = time.time()
df = test1()
t1 = time.time() - start_time

start_time = time.time()
test2(df)
t2 = time.time() - start_time

print(sys.version, "\nTest1: {}sec, Test2: {}sec".format(t1, t2))
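
As a side note on timing methodology: time.perf_counter() is the standard-library clock intended for measuring intervals, with higher resolution than time.time(). A minimal variant of the driver section above using it (same script, only the clock changes):

start_time = time.perf_counter()
df = test1()
t1 = time.perf_counter() - start_time

start_time = time.perf_counter()
test2(df)
t2 = time.perf_counter() - start_time

print(sys.version, "\nTest1: {:.3f}sec, Test2: {:.3f}sec".format(t1, t2))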

 

Oleksandr_P_Intel

Hi, 

Thanks for the reproducer. We can reproduce your observation and are looking into what makes IDP slower in this case. In the meantime, I'd like to point out that creation of the DataFrame can be done about 10 times faster, as follows:

def test1a():
    cols = 13
    rows = 10000000
    # Draw the whole 0/1 matrix in one call, already shaped (rows, cols).
    raw_data = np.random.randint(2, size=(rows, cols))
    col_names = ['v01', 'v02', 'v03', 'v04', 'v05', 'v06', 'v07',
                 'v08', 'v09', 'v10', 'v11', 'v12', 'outcome']
    df = pd.DataFrame(raw_data, columns=col_names)
    # Vectorized replacement for the row-wise .apply: draw all row indices
    # in one call, then gather the labels with np.take.
    df['v11'] = np.take(
        np.array(['t1', 't2', 't3', 't4'], dtype=object),
        np.random.randint(4, size=rows))
    df['v12'] = np.take(
        np.array(['p1', 'p2'], dtype=object),
        np.random.randint(2, size=rows))
    return df

While execution of test1() takes about 30 seconds, execution of test1a() takes only about 3 seconds.
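
The gain comes from replacing ten million Python-level lambda (and RNG) calls with one vectorized draw per column plus a fancy-indexing gather. A rough way to see this on a smaller problem, as a minimal sketch with hypothetical helper names that isolates just the column construction:

import time
import numpy as np
import pandas as pd

labels = np.array(['t1', 't2', 't3', 't4'], dtype=object)

def column_apply(rows):
    # One Python-level lambda call (and one RNG call) per row.
    s = pd.Series(np.zeros(rows, dtype=int))
    return s.apply(lambda x: labels[np.random.randint(4)])

def column_take(rows):
    # One vectorized RNG call for the whole column, then a single gather.
    return np.take(labels, np.random.randint(4, size=rows))

for fn in (column_apply, column_take):
    t0 = time.time()
    fn(1000000)
    print(fn.__name__, time.time() - t0, 'sec')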

jeff_c_
Beginner

Wow, a 10x improvement on test1() is huge! Thank you very much, Oleksandr.

On the new test1a() alone, Anaconda Python on my machine is still slightly faster than Intel Python. Did you see the same result on your system, with Intel Python being the slower one?

Anaconda: 2.265s vs. Intel: 2.882s

Oleksandr_P_Intel

Yes, test1a() runs faster in Anaconda than in IDP, but we have not gotten to the bottom of the issue yet.

Benahmed__Yacine
Beginner

Has this been elucidated/fixed in the latest versions?
