DAAL model created from XGBoost model doesn't match

DK0 · ‎02-15-2022

I trained an XGBoost model with default hyperparameters on a trivial data set (see attached code) and then used daal4py.get_gbt_model_from_xgboost to create a DAAL model.

My XGBoost model worked fine, but the DAAL model was useless! Its residuals against y_train average about 43, versus roughly zero for XGBoost (summary below):

	y_train	xgb - y_train	daal - y_train
count	101	101	101
mean	50	-0.00012	42.90
std	29.300171	0.03422	26.94
min	0	-0.07770	-0.02
25%	25	-0.01358	19.96
50%	50	0.00177	42.05
75%	75	0.01322	65.98
max	100	0.07726	89.98

Any idea what is wrong with the (short) attached code (or the environment)?

Steps to reproduce:

1. Install the Intel Distribution for Python on Windows 10.
2. Install the Intel® oneAPI Data Analytics Library (oneDAL) toolkit.
3. Run: conda install xgboost
4. Run the attached (short) Python script.

DK0 · ‎02-16-2022

Solved!

It turns out that the DAAL model expects the features in a permuted order: if I train XGBoost on ["X1", "X2"], I need to feed the DAAL model ["X2", "X1"]. I figured it out after coming across this thread, which also suggests:

Additionally, calling .dump_model() on both your Python and C++ model objects will yield the same Decision Trees, but the Python one will have all the feature names and the C++ one will likely have f0, f1, f2, .... You can compare these two to get your actual column ordering, and then your predictions will match across languages (not entirely, because of rounding).
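
For anyone hitting the same issue, here is a minimal sketch of that dump comparison. It assumes both dumps are in XGBoost's plain-text format, where a split node looks like 0:[X2<50.5] yes=1,no=2,missing=1, and that the two dumps describe the same trees node for node. The helper names (split_nodes, infer_feature_mapping) and the two sample dump strings are hypothetical, made up for illustration; they are not taken from the attached script or from daal4py.

```python
import re

# Matches a split node in a text dump, e.g. "0:[X2<50.5] yes=1,no=2,missing=1".
# Leaf lines such as "3:leaf=1.0" do not match.
_SPLIT = re.compile(r"(\d+):\[([^<\]]+)<([^\]]+)\]")


def split_nodes(tree_dump):
    """Return {node_id: feature} for every split node of one dumped tree."""
    return {int(node): feat for node, feat, _thr in _SPLIT.findall(tree_dump)}


def infer_feature_mapping(named_dump, indexed_dump):
    """Map positional names (f0, f1, ...) to real feature names.

    Both arguments are lists of per-tree dump strings. Nodes are paired by
    node id only, so thresholds may differ slightly because of rounding.
    """
    mapping = {}
    for named_tree, indexed_tree in zip(named_dump, indexed_dump):
        named = split_nodes(named_tree)
        indexed = split_nodes(indexed_tree)
        for node_id, name in named.items():
            if node_id in indexed:
                mapping.setdefault(indexed[node_id], name)
    return mapping


# Hypothetical dumps of the same single tree: one with feature names,
# one with positional names.
named_dump = [
    "0:[X2<50.5] yes=1,no=2,missing=1\n"
    "\t1:[X1<25.5] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=1.0\n\t\t4:leaf=2.0\n"
    "\t2:leaf=3.0\n"
]
indexed_dump = [
    "0:[f0<50.5] yes=1,no=2,missing=1\n"
    "\t1:[f1<25.5] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=1.0\n\t\t4:leaf=2.0\n"
    "\t2:leaf=3.0\n"
]

mapping = infer_feature_mapping(named_dump, indexed_dump)
# Column order the positional model expects: f0 first, then f1, ...
column_order = [mapping[f"f{i}"] for i in range(len(mapping))]
```

With these sample dumps, column_order is the order in which to pass the columns to the converted model. Features that never appear in a split will not show up in the mapping, so on a real model check that every training column was recovered before reordering.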

JaideepK_Intel · ‎02-16-2022

Hi,

Thank you for posting in Intel Communities.

Glad to know that your issue is resolved. Thanks for sharing the solution with us. Can we close this case?

Thanks,

Jaideep

JaideepK_Intel · ‎02-27-2022

Hi,

We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.

Thanks,

Jaideep