Intel® oneAPI Data Analytics Library

How to create a custom datasource?

Anwar_Ludin
Beginner

I need to access data stored in XML files and use DAAL for data analysis. My understanding is that besides providing a custom DataSource/Dictionary pair for that, you also need to provide a feature manager. I read the following docs but I am still not sure how everything plugs together: https://software.intel.com/en-us/node/564582

Could you please indicate which classes in the framework I should extend (including the methods to override) in order to plug my custom XML file-based datasource into the framework?

Thanks!

Ilya_B_Intel
Employee

Anwar,

Technically, you are not required to implement a DataSource for XML files. A DataSource is only used to load data into a NumericTable; the algorithm objects do not use it themselves. So if you can create a NumericTable of any kind in some other way, you can do that instead.

Generic XML represents hierarchical data, while a NumericTable holds only 2-D data. You will therefore need to place some restrictions on the XML files, and those restrictions will be enforced in the DataSource.
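To illustrate what such a restriction could look like, here is a minimal sketch in plain C++ that does not use DAAL at all. It assumes a hypothetical flat layout: a sequence of <row> elements, each containing only <value> children, with every row the same width. The names FlatTable, elementsOf, and flattenXml are invented for this example, and the string search is only a stand-in for a real XML parser.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Result of flattening a restricted XML document: row-major values plus shape.
struct FlatTable {
    std::vector<double> values;  // nRows * nCols entries, row-major
    std::size_t nRows = 0;
    std::size_t nCols = 0;
};

// Returns the text between every <tag>...</tag> pair found in `text`.
// Hypothetical helper: no attributes, no nesting of the same tag, no CDATA.
std::vector<std::string> elementsOf(const std::string& text, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<std::string> out;
    std::size_t pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        const std::size_t begin = pos + open.size();
        const std::size_t end = text.find(close, begin);
        if (end == std::string::npos) throw std::runtime_error("unclosed <" + tag + ">");
        out.push_back(text.substr(begin, end - begin));
        pos = end + close.size();
    }
    return out;
}

// Enforces the 2-D restriction: every <row> must hold the same number of <value> cells.
FlatTable flattenXml(const std::string& xml) {
    FlatTable t;
    for (const std::string& row : elementsOf(xml, "row")) {
        const std::vector<std::string> cells = elementsOf(row, "value");
        if (t.nRows == 0) {
            t.nCols = cells.size();  // the first row fixes the table width
        } else if (cells.size() != t.nCols) {
            throw std::runtime_error("ragged row");
        }
        for (const std::string& cell : cells) t.values.push_back(std::stod(cell));
        ++t.nRows;
    }
    assert(t.values.size() == t.nRows * t.nCols);
    return t;
}
```

The resulting row-major buffer, together with nRows and nCols, is the kind of 2-D data a NumericTable expects. Handing it over to DAAL is not shown here.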

To implement an XML DataSource, you can use include\data_management\data_source\file_data_source.h as an example. You will need to implement the same functions that you see there.

The minimum you must implement:

  • one of the loadDataBlock functions, whichever one you will use
  • createDictionaryFromContext, which describes the columns of the NumericTable.
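The division of work between these two functions can be sketched without DAAL. In the stand-alone example below, only the two method names are borrowed from the list above. The class SketchXmlSource, the Record type, the signatures, and the choice to fill a missing field with 0.0 are all assumptions made for illustration and are not DAAL's actual interface; file_data_source.h remains the reference for the real one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One record as it might come out of an XML row: (column name, value) pairs.
using Record = std::vector<std::pair<std::string, double>>;

// Hypothetical stand-in for a custom data source. Signatures are invented
// for this sketch and do NOT match the DAAL headers.
class SketchXmlSource {
public:
    explicit SketchXmlSource(std::vector<Record> records) : records_(std::move(records)) {}

    // Describes the columns: here, the field names found in the first record.
    // Returns the number of columns.
    std::size_t createDictionaryFromContext() {
        columns_.clear();
        if (!records_.empty()) {
            for (const auto& field : records_.front()) columns_.push_back(field.first);
        }
        return columns_.size();
    }

    // Appends up to maxRows rows to `out` (row-major, in dictionary column order)
    // and returns how many rows were actually loaded.
    std::size_t loadDataBlock(std::size_t maxRows, std::vector<double>& out) {
        assert(!columns_.empty() && "call createDictionaryFromContext first");
        std::size_t loaded = 0;
        while (loaded < maxRows && next_ < records_.size()) {
            const Record& rec = records_[next_++];
            for (const std::string& name : columns_) {
                const auto it = std::find_if(
                    rec.begin(), rec.end(),
                    [&](const std::pair<std::string, double>& f) { return f.first == name; });
                // Assumption of this sketch: a missing field becomes 0.0.
                out.push_back(it != rec.end() ? it->second : 0.0);
            }
            ++loaded;
        }
        return loaded;
    }

    const std::vector<std::string>& columns() const { return columns_; }

private:
    std::vector<Record> records_;
    std::vector<std::string> columns_;
    std::size_t next_ = 0;  // index of the next record to load
};
```

Calling createDictionaryFromContext once and then loadDataBlock repeatedly until it returns 0 walks through the whole data set in blocks, with every row laid out in the column order the dictionary fixed.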

You are not required to use the FeatureManager class, but we use it as a helper concept, with the roles split as follows:

  • DataSource reads data from the medium one row at a time.
  • FeatureManager converts each row from its string representation into a numeric representation and stores it in the NumericTable.
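This split of roles can be sketched in plain C++ as well. In the example below, LineReader plays the reading role and Converter plays the conversion role. Both classes are hypothetical and unrelated to the DAAL classes, the medium is a string with one whitespace-separated record per line standing in for XML records, and the rule that non-numeric fields become integer category codes in order of first appearance is an assumption of the sketch.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reading role (hypothetical): yields one record at a time as raw strings.
class LineReader {
public:
    explicit LineReader(const std::string& text) : in_(text) {}

    // Fills `fields` with the next record; returns false when the medium is exhausted.
    bool next(std::vector<std::string>& fields) {
        std::string line;
        if (!std::getline(in_, line)) return false;
        fields.clear();
        std::istringstream cells(line);
        for (std::string cell; cells >> cell;) fields.push_back(cell);
        return true;
    }

private:
    std::istringstream in_;
};

// Conversion role (hypothetical): turns raw strings into doubles and appends
// them to a row-major table. Fields that parse fully as numbers are kept as is;
// anything else is treated as a category and replaced by an integer code.
// Codes are shared across all columns in this sketch.
class Converter {
public:
    void appendRow(const std::vector<std::string>& fields, std::vector<double>& table) {
        for (const std::string& field : fields) table.push_back(toNumber(field));
    }

private:
    double toNumber(const std::string& field) {
        try {
            std::size_t used = 0;
            const double value = std::stod(field, &used);
            if (used == field.size()) return value;
        } catch (const std::exception&) {
            // Not numeric: fall through to the categorical handling below.
        }
        // emplace leaves an existing code untouched and adds a new one otherwise.
        const auto it = codes_.emplace(field, codes_.size()).first;
        return static_cast<double>(it->second);
    }

    std::map<std::string, std::size_t> codes_;
};
```

A loader would call next in a loop and pass each record to appendRow. Keeping the two roles apart means the reader can be swapped for one that walks XML elements without touching the conversion rules.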

Feel free to ask any further questions about this; we will try to help.
