I need to access data stored in XML files and use DAAL for data analysis. My understanding is that besides providing a custom DataSource/Dictionary pair for that, you also need to provide a feature manager. I read the following docs but I am still not sure how everything plugs together: https://software.intel.com/en-us/node/564582
Could you please indicate which classes in the framework I should extend (including the methods to override) in order to plug my custom XML file-based data source into the framework?
Strictly speaking, you are not required to implement a DataSource for XML files. A DataSource is only used to load data into a NumericTable; algorithm objects do not use it afterwards. So if you can create a NumericTable of any kind by some other means, you can do that instead.
Generic XML describes hierarchical data, while a NumericTable holds only 2-d data. You will therefore need to restrict the structure of the XML file, and those restrictions are enforced in the DataSource.
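A minimal sketch of what such a restriction might look like, assuming a hypothetical flat layout in which each record is a `<row>` element holding only `<f>` field elements (no attributes, nesting, comments, CDATA or entities). It uses plain string scanning instead of an XML parser purely for illustration, and rejects files whose rows differ in width, which is the 2-d restriction mentioned above:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Collect the text between each "<tag>" and the next "</tag>".
// Deliberately naive: no attributes, same-tag nesting, comments, CDATA or entities.
std::vector<std::string> extractElements(const std::string& text, const std::string& tag) {
    std::vector<std::string> out;
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::size_t pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        const std::size_t start = pos + open.size();
        const std::size_t end = text.find(close, start);
        if (end == std::string::npos) break;  // unterminated element: stop scanning
        out.push_back(text.substr(start, end - start));
        pos = end + close.size();
    }
    return out;
}

// Flatten the restricted XML into rows of string fields and enforce the 2-d
// restriction: every <row> must contain the same number of <f> fields.
std::vector<Row> parseRestrictedXml(const std::string& xml) {
    std::vector<Row> rows;
    for (const std::string& rowText : extractElements(xml, "row")) {
        Row fields = extractElements(rowText, "f");
        if (!rows.empty() && fields.size() != rows.front().size())
            throw std::runtime_error("rows have different numbers of fields");
        rows.push_back(std::move(fields));
    }
    return rows;
}
```

For example, `<table><row><f>1.5</f><f>2</f></row><row><f>3</f><f>4</f></row></table>` yields two rows of two string fields each. A real implementation would use a proper XML parser; the point is only that the DataSource decides which XML shapes map onto rows and columns.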
To implement an XML DataSource, you can use include\data_management\data_source\file_data_source.h as an example. You will need to implement the same functions you see there.
The minimum required set is:
- one of the loadDataBlock functions (whichever one you will call);
- createDictionaryFromContext, which describes the columns of the NumericTable.
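To show how these two functions relate, here is a self-contained sketch that deliberately uses plain standard-library types instead of the DAAL base classes. `XmlDataSourceSketch`, `FeatureInfo`, `Dictionary` and `Table` are hypothetical stand-ins; only the method names mirror the ones listed above, and their signatures are not DAAL's (check file_data_source.h for the real ones). It assumes the rows have already been extracted from the XML as strings and that column names are known, for example from a header element:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for DAAL's dictionary and NumericTable (NOT the DAAL API).
struct FeatureInfo {
    std::string name;
};
using Dictionary = std::vector<FeatureInfo>;  // one entry per column

struct Table {
    std::size_t nColumns = 0;
    std::vector<double> data;  // row-major values
    std::size_t nRows() const { return nColumns == 0 ? 0 : data.size() / nColumns; }
};

class XmlDataSourceSketch {
public:
    // rows: string fields already extracted from the restricted XML, one vector per record.
    XmlDataSourceSketch(std::vector<std::string> columnNames,
                        std::vector<std::vector<std::string>> rows)
        : columnNames_(std::move(columnNames)), rows_(std::move(rows)) {}

    // Counterpart of createDictionaryFromContext: describe the columns before loading data.
    void createDictionaryFromContext() {
        dict_.clear();
        for (const std::string& name : columnNames_) dict_.push_back(FeatureInfo{name});
    }

    // Counterpart of a loadDataBlock overload: fill `table` with up to maxRows rows,
    // continuing where the previous call stopped; returns the number of rows loaded.
    std::size_t loadDataBlock(std::size_t maxRows, Table& table) {
        if (dict_.empty()) throw std::logic_error("call createDictionaryFromContext first");
        table.nColumns = dict_.size();
        table.data.clear();
        std::size_t loaded = 0;
        while (loaded < maxRows && next_ < rows_.size()) {
            const std::vector<std::string>& row = rows_[next_++];
            if (row.size() != dict_.size())
                throw std::runtime_error("row width does not match dictionary");
            for (const std::string& field : row) table.data.push_back(std::stod(field));
            ++loaded;
        }
        return loaded;
    }

    const Dictionary& dictionary() const { return dict_; }

private:
    std::vector<std::string> columnNames_;
    std::vector<std::vector<std::string>> rows_;
    Dictionary dict_;
    std::size_t next_ = 0;
};
```

The ordering is the important part: the dictionary is built first so that each block load knows how many columns to expect and can reject malformed rows.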
You are not required to use the FeatureManager class, but we use it as a helper concept with the following roles:
- DataSource reads data from the storage medium one row at a time.
- FeatureManager converts a row from its string representation into a numeric representation and stores it in the NumericTable.
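The split of responsibilities above can be sketched as follows. `FeatureManagerSketch` and `loadAll` are hypothetical helpers, not the DAAL classes: the loop plays the DataSource role by handing over one row of string fields at a time, and the feature manager converts that row to numbers and appends it to a row-major buffer standing in for the NumericTable:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical FeatureManager-style helper (not the DAAL class): turns one row of
// string fields into numbers and appends them to a row-major numeric buffer.
class FeatureManagerSketch {
public:
    explicit FeatureManagerSketch(std::size_t nFeatures) : nFeatures_(nFeatures) {}

    void appendRow(const std::vector<std::string>& fields, std::vector<double>& table) const {
        if (fields.size() != nFeatures_) throw std::runtime_error("unexpected number of fields");
        std::vector<double> converted;  // convert first so a bad row leaves `table` untouched
        converted.reserve(nFeatures_);
        for (const std::string& f : fields) converted.push_back(toDouble(f));
        table.insert(table.end(), converted.begin(), converted.end());
    }

private:
    static double toDouble(const std::string& s) {
        std::size_t used = 0;
        const double v = std::stod(s, &used);  // throws std::invalid_argument if no number
        if (used != s.size()) throw std::invalid_argument("trailing characters in field: " + s);
        return v;
    }
    std::size_t nFeatures_;
};

// Hypothetical DataSource-style loop: fetch rows one at a time and pass each to the feature manager.
std::vector<double> loadAll(const std::vector<std::vector<std::string>>& rawRows,
                            const FeatureManagerSketch& fm) {
    std::vector<double> table;
    for (const std::vector<std::string>& row : rawRows) fm.appendRow(row, table);
    return table;
}
```

Keeping the string-to-number conversion in its own class means the XML-specific reading code never has to know how fields are parsed, and the same conversion logic could be reused for another storage format.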
Feel free to ask any further questions about this; we will try to help.