Intel® oneAPI Data Analytics Library
Community support for building compute-intensive applications that run fast on Intel® architecture.

Serialization of DataSourceDictionary

Harvey_S_
Beginner

Hi

I'm using a StringDataSource to import categorical data (text labels) into NumericTables for SVM modelling. The idea is to import the data, let the DataSource work out the category labels, create the model, and then save both the DataSourceDictionary and the model for later prediction.

I can serialize/deserialize the DataSourceDictionary OK, but the contained DataSourceFeatures don't seem to serialize their CategoricalFeatureDictionary, which means I lose the data labels. Should this work, or have I missed something?

Kind Regards

Ilya_B_Intel
Employee

Thank you, Harvey, for your report.

That is indeed an issue; we will fix it at the nearest release opportunity.

Harvey_S_
Beginner

OK, thanks. As a temporary workaround I'll probably tokenize the data myself and build up the NumericTable by hand. I see that the StringDataSource also builds up statistics for the columns; can you tell me whether this is necessary for a NumericTable bound for SVM training?
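
For anyone following the same route, here is a rough sketch of what that hand-rolled workaround might look like. It is plain Python and makes no DAAL calls; `LabelDictionary` and `encode_rows` are hypothetical names invented for this illustration, and JSON is just one assumed way to persist the labels. The idea is to keep one label-to-code mapping per categorical column, save it next to the model, and reload it at prediction time so the same text label always maps to the same number.

```python
import json


class LabelDictionary:
    """Hypothetical helper: maps text labels to integer codes in first-seen order."""

    def __init__(self, labels=None):
        self.labels = list(labels or [])
        self.index = {label: i for i, label in enumerate(self.labels)}

    def encode(self, label, grow=True):
        # With grow=True an unseen label gets the next free code (training time).
        # With grow=False an unseen label is an error (prediction time).
        if label not in self.index:
            if not grow:
                raise KeyError(f"unseen category: {label!r}")
            self.index[label] = len(self.labels)
            self.labels.append(label)
        return self.index[label]

    def decode(self, code):
        return self.labels[code]

    def dumps(self):
        # The ordered label list is enough to rebuild the whole mapping.
        return json.dumps(self.labels)

    @classmethod
    def loads(cls, text):
        return cls(json.loads(text))


def encode_rows(rows, dictionaries, grow=True):
    """Convert rows of text labels to rows of float codes, one dictionary per column."""
    return [
        [float(d.encode(value, grow)) for d, value in zip(dictionaries, row)]
        for row in rows
    ]


# Training time: build the dictionaries while encoding, then save them.
rows = [["red", "small"], ["blue", "large"], ["red", "large"]]
dicts = [LabelDictionary(), LabelDictionary()]
train = encode_rows(rows, dicts)
saved = [d.dumps() for d in dicts]

# Prediction time: restore the dictionaries and encode without growing them.
restored = [LabelDictionary.loads(s) for s in saved]
query = encode_rows([["blue", "small"]], restored, grow=False)
```

The rows of floats in `train` and `query` are what would then be copied into the NumericTable by hand. Refusing to grow the dictionary at prediction time is a deliberate choice in this sketch: a label the model never saw has no meaningful code.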

Ilya_B_Intel
Employee

No, SVM does not use the statistics from the NumericTable.

Harvey_S_
Beginner

Good. Thanks for your help.