Intel® oneAPI Data Analytics Library

Why is the daal4py model 8.02 GB while the same LightGBM model is 426 KB?

Adarsh2
Beginner

I am using Intel DevCloud and have created a LightGBM model for binary classification using this code:

 

import lightgbm as lgb

# params, train_data, valid_data and callback are defined earlier (not shown in this post)
bst = lgb.train(params, train_data, num_boost_round=10000, valid_sets=[valid_data], callbacks=[callback])

 

I read this article, which explains how daal4py makes LightGBM inference faster, so I converted the LightGBM model into a daal4py model for inference using this code:

 

import daal4py as d4p
daal_model = d4p.get_gbt_model_from_lightgbm(bst)

 

After this, I tried to save this "daal_model" on my PC using pickle.

 

import pickle

with open('model.pkl','wb') as out:
    pickle.dump(daal_model, out)

 

Issue

The model.pkl file is 8.02 GB. How is this possible, and why is it happening? When I save my original model "bst" using pickle, it is just 426 KB. Why is the daal4py model so big?
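For anyone who wants to compare the two sizes without writing files, the pickled byte length can be measured directly. This is only a minimal sketch: the helper name pickled_size is made up for illustration, and the two lists below are stand-ins, not the real models.

```python
import pickle


def pickled_size(obj):
    """Return the number of bytes pickle produces for obj (default protocol)."""
    return len(pickle.dumps(obj))


# Stand-in objects; in this thread they would be `bst` and `daal_model`.
small = list(range(10))
large = list(range(10_000))

print("small:", pickled_size(small), "bytes")
print("large:", pickled_size(large), "bytes")
```

Note that pickle.dumps builds the whole byte string in memory, so for a multi-gigabyte object it is safer to write to a file as above and check the file size with os.path.getsize.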

 

How to Reproduce -

1 Solution
Huiyan_C_Intel
Moderator

Hi @Adarsh2,

By design, daal4py trees expect a dense node structure and allocate memory accordingly. The provided example creates extremely sparse trees and is unsuitable for running in daal4py.
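To illustrate why a dense layout can blow up on deep, sparse trees, here is a small arithmetic sketch. It assumes "dense" means one slot reserved for every position of a complete binary tree down to the deepest leaf; that is an assumption made for illustration and not a description of daal4py's actual memory layout. The tree shape used (31 leaves in a chain 30 levels deep) is hypothetical.

```python
def dense_node_slots(depth):
    """Slots in a complete binary tree whose deepest leaf is `depth` edges below the root."""
    return 2 ** (depth + 1) - 1


def actual_nodes(num_leaves):
    """Nodes in a binary tree where every split has exactly two children."""
    return 2 * num_leaves - 1


# Hypothetical worst case: a 31-leaf tree grown leaf-wise into a chain 30 levels deep.
num_leaves, depth = 31, 30
print(actual_nodes(num_leaves))  # 61
print(dense_node_slots(depth))   # 2147483647
print(dense_node_slots(8))       # 511
```

Under this assumption, the number of reserved slots depends on the depth only, so a tree with a few dozen real nodes can still reserve billions of slots, while capping the depth at 8 keeps each tree at 511 slots or fewer.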


We will add support for these scenarios. For the time being, you can train the model with a reasonable maximum depth, for example params['max_depth'] = 8. It provides similar accuracy, and the resulting model dump is only 2.9 MB.
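To check how deep the existing trees actually are before picking a max_depth, the dictionary returned by bst.dump_model() can be walked. The sketch below assumes that dictionary has a "tree_info" list whose entries hold a nested "tree_structure" with "left_child" and "right_child" keys; please verify this against your LightGBM version. The helper names and the toy dump are made up for illustration.

```python
def tree_depth(root):
    """Depth (in edges) of a nested tree dict; leaves have no child keys."""
    deepest = 0
    stack = [(root, 0)]  # iterative walk, so very deep trees do not hit the recursion limit
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for key in ("left_child", "right_child"):
            if key in node:
                stack.append((node[key], depth + 1))
    return deepest


def max_model_depth(dump):
    """Largest tree depth across all trees in a dump_model()-style dict."""
    return max(tree_depth(tree["tree_structure"]) for tree in dump["tree_info"])


# Toy dump with the assumed shape: one single-leaf tree and one tree of depth 2.
toy_dump = {"tree_info": [
    {"tree_structure": {"leaf_value": 0.1}},
    {"tree_structure": {
        "left_child": {"leaf_value": -0.2},
        "right_child": {
            "left_child": {"leaf_value": 0.3},
            "right_child": {"leaf_value": 0.4},
        },
    }},
]}

print(max_model_depth(toy_dump))  # 2
```

With the real model this would be called as max_model_depth(bst.dump_model()); a result far above 8 would be consistent with the sparse, deep trees described above.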


4 Replies
AthiraM_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.


We have observed the same issue with daal4py (version 2023.2.1) when running your code in DevCloud for oneAPI.


We are checking on this internally and will get back to you with an update.


Meanwhile, can you share the version of daal4py you are using?



Thanks


Adarsh2
Beginner

I am using 2023.1.1

d4p._get__version__()
>> '(2023, 1, 1, \'"P"\')'
Huiyan_C_Intel
Moderator

Thanks for reporting this issue, we are looking into it.

