Intel® Data Analytics Acceleration Library (Intel® DAAL) introduces Data Sources, which define interfaces for accessing and managing data in raw format. The library implements several popular types of Data Sources, including Open Database Connectivity (ODBC). Below we describe how to use the library to access data stored in a database (DB) that supports ODBC connectivity, specifying the necessary components and describing the typical steps. For demonstration we use a MySQL* DB as the example; a similar flow applies to other DBs that support ODBC.
Before you use Intel DAAL to access data in a MySQL* DB, you should have the respective ODBC driver manager (e.g., unixODBC) and ODBC connector (e.g., the MySQL* connector) installed on your system.
As a first step, construct an object of the ODBCDataSource type. Like other Data Sources implemented in the library, this class takes a feature manager class as a template parameter. The feature manager is responsible for parsing the data, converting it into numeric format, and other related operations. Use MySQLFeatureManager when working with a MySQL* DB. The other parameters you provide to the ODBCDataSource constructor are the data source name (DSN), table name, username, and password.
The example below shows the construction step for an ODBC Data Source:
ODBCDataSource<MySQLFeatureManager> dataSource(
    "UserDataSourceName", "UserTableName",
    "Username", "UserPassword",
    DataSource::doAllocateNumericTable,
    DataSource::doDictionaryFromContext);
The doAllocateNumericTable and doDictionaryFromContext parameters are service flags: the first instructs the constructor to allocate memory for the Numeric Table that will hold the DB data in numeric format (by default, a Homogeneous Numeric Table storing the data in double precision is created), while the second instructs the library to automatically create and initialize the Data Dictionary that describes the data in the DB.
After the object is successfully created, you can use it to access the data in the DB.
If you want to load n data rows from the DB into the NumericTable allocated by the constructor above, use the loadDataBlock method:
The method returns the number of rows that were actually read.
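A minimal sketch of such a call, assuming the dataSource object constructed above and a reachable database (the row count is illustrative):

```cpp
/* Load up to 100 rows from the DB into the internally allocated
   numeric table; the return value is the number of rows actually
   read, which may be smaller than requested. */
size_t nRead = dataSource.loadDataBlock(100);
```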
If you want to get the table with the data loaded earlier from the database, call the getNumericTable method:
This method returns a Shared Pointer to the respective Numeric Table. To operate with the shared pointer, use the class member access operator (->) or the dereference operator (*).
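For example, assuming the dataSource object from the construction step and DAAL's NumericTablePtr shared-pointer typedef for numeric tables:

```cpp
/* Retrieve the table that holds the previously loaded data */
NumericTablePtr table = dataSource.getNumericTable();

/* Access the table through the shared pointer */
size_t nRows = table->getNumberOfRows();
size_t nCols = table->getNumberOfColumns();
```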
If you want the MySQL* Data Source to load the data into Numeric Table allocated on your side, use another version of the loadDataBlock method as shown below:
double array[nFeatures * n];                                      /* user-allocated buffer (nFeatures and n are compile-time constants here) */
HomogenNumericTable<double> myNumericTable(array, nFeatures, n);  /* wrap the buffer without copying */
dataSource.loadDataBlock(n, &myNumericTable);                     /* read n rows into the user's table */
Note that if you plan to use your own Numeric Table, you can instruct the constructor not to allocate memory by passing the DataSource::notAllocateNumericTable parameter.
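A sketch of such a constructor call, with the same illustrative DSN and credentials as in the construction example above:

```cpp
/* The data source will not allocate a numeric table itself;
   loadDataBlock(n, &myNumericTable) is then used with a
   user-allocated table, as shown above. */
ODBCDataSource<MySQLFeatureManager> dataSource(
    "UserDataSourceName", "UserTableName",
    "Username", "UserPassword",
    DataSource::notAllocateNumericTable,
    DataSource::doDictionaryFromContext);
```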
Data can be read from a table or from a view. If you want to read data from several database tables into one numeric table, create a view in the DB and pass its name as the table name parameter of the ODBCDataSource constructor.
When you complete your work with the DB, call the freeHandles method to free the ODBC connection handles:
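For example, assuming the dataSource object used throughout this section:

```cpp
/* Release the ODBC handles held by the data source */
dataSource.freeHandles();
```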
Note that the library provides a C++ sample that demonstrates the typical steps described above for accessing data in a MySQL* DB.
Thank you for your comment. As described above, Intel® Data Analytics Acceleration Library (Intel® DAAL) introduces the Data Source, a generic component that defines interfaces for accessing and managing data in raw format. The library implements this interface for several popular source types such as CSV and SQL, and it also includes samples that show how to use it in the Spark* environment. Obviously, it is impossible to cover all known data sources in the library. If you have a specific data source that is not available in the library, you can derive from the Data Source class, define the respective methods, and use it in your application with the library's algorithms in the usual way. On our side, we understand the variety of possible data source scenarios and the overheads related to data transfer, and we continue to analyze approaches for bringing the algorithms closer to the data source to minimize those overheads.