How do I use this library with a SQL database?

Gennady_F_Intel · ‎04-23-2015

How do I use this library with a SQL database? any examples?

thanks

Olga_R_Intel · ‎04-23-2015

Intel® Data Analytics Acceleration Library (Intel DAAL) introduces Data Sources which define interfaces for access and management of data in raw format. Some popular types of Data Sources such as Open Database Connectivity (ODBC) are implemented in the library. Below we describe how to use the library for accessing data stored in a database (DB) which supports ODBC connectivity, by specifying necessary components and describing typical steps. For demonstration we use MySQL* DB as example, however, we assume that similar flow is applied to other DB supporting ODBC.

Before you use Intel DAAL to access data in MySQL* DB you should have respective ODBC driver (e.g. unixODBC) and ODBC connector (e.g., MySQL* connector) installed on your system.

As a first step, construct the object of the ODBCDataSource type. As other Data Sources implemented in the library, this class takes a feature manager class as a template parameter. Feature manager is responsible for parsing, converting the data into numeric format and other related operations on data. Use MySQLFeatureManager when working with MySQL* DB. The other parameters you should provide into ODBCDataSource constructor include data source name (DSN), table name, username, password.

Example below shows construction step for ODBC Data Base:

ODBCDataSource<MySQLFeatureManager> dataSource(
	“UserDataSourceName”,
	“UserTableName”,
	"Username",
	"UserPassword",
	DataSource::doAllocateNumericTable,
	DataSource::doDictionaryFromContext
);

Parameters doAllocateNumericTable and doDictionaryFromContext are service: the first flag instruct the constructor to allocate memory for Numeric Table which will hold the DB data in the numeric format (by default, Homogeneous Numeric Table for storing the data in double precision is created), while the second flag requires the library to automatically create and initialize Data Dictionary describing the data in the DB.

After the object is successfully created, you can use it to access the data in the DB.

If you want to load n data rows from the DB to the NumericTable allocated by the constructor above, use method:

dataSource.loadDataBlock(n);

It returns number of rows that were read.

If you want to get the table with data loaded earlier from the data base call the method:

dataSource.getNumericTable();

This method returns Shared Pointer to the respective Numeric Table. To operate with shared pointer use the class member access operator (->) or the dereference operator (*).

If you want the MySQL* Data Source to load the data into Numeric Table allocated on your side, use another version of the loadDataBlock method as shown below:

double array[nFeatures * n];
HomogenNumericTable<double> myNumericTable(array, nFeatures, n);
dataSource.loadDataBlock(n, &myNumericTable);

Note, that if you plan to use your Numeric Table, you can instruct the constructor not to allocate the memory by providing the parameter DataSource::notAllocateNumericTable.

Data can be read from table or from view. If you want to read data from several database tables to one numeric table, you should create a view and set its name as parameter in DataSource constructor.

When you complete work with the DB, call the method to free handles of ODBC connection:

dataSource.freeHandles();

Note that the library provides C++ sample which demonstrates the typical steps described above for accessing the data in MySQL* DB.

SergeyKostrov · ‎08-13-2015

Even if the post is almost 4 months old THIS IS a Fundamental Subject because Big Data Sets do not exist in some abstract space and, in many cases, they already stored in some relational databases!. Let me provide some technical details... >>How do I use this library with a SQL database? any examples? This is a really good question! However, after reading Olga's response I clearly see that Intel's DAAL team is at least 10 years behind of what I've done working on a project related to high-performance processing ( geo-coding and see ( *** ) ) of some really big real estate data sets stored in Microsoft SQL Server database. In 2005 I wrote two articles which could give you a generic idea on how I solved the problem: www.mp2kmag.com/a122--c-c++.automation.ole.mappoint.html Part 1 www.mp2kmag.com/a123--visual.c++.sqlserver.mappoint.html Part 2 ( *** ) - My task could be described as follows: For a given set of property addresses calculate their geographic latitudes and longitudes with accuracy +/-25 meters, or better, on the Server side of Microsoft SQL Server. An ODBC-based solution on a Client side is absolutely Not efficient when it comes to high-performance processing of real Big Data sets. A company I was working for between 2004 - 2006 years went through this ( I mean Client side processing ) and a decision was made to move ALL processing on the Server side with a very complex COM-based integration of Microsoft SQL Server ( database normalized to 4th Normal Form ), Satori software ( provided verification, validation and correction of mailing addresses of ALL properties ), and Microsoft MapPoint ( versions 2004 / 2006 to calculate geographic latitudes and longitudes of a property ) using a set of Extended Stored Procedures supported by Microsoft SQL Server implemented in C/C++.

Andrey_N_Intel · ‎08-14-2015

Hello Sergey,

thank you for your comment. As described above, Intel(R) Data Analytics Acceleration Library (Intel(R) DAAL) introduces Data Source, the generic component which defines interfaces for access and management of data in raw format. The library provides implementation of the interface for some popular types of sources such as CSV or SQL. You can also find the samples in the library that show how to use it in the Spark* environment. It is obvious, that it is impossible to cover all known data sources in the library. If a user has a specific data source that is not available in the library, he or she can derive from the Data Source class, define all respective methods and use it in the application with the algorithms of the library in the normal way. On our side, we understand the variety of possible data source use scenarios and the problem of data transfer related overheads, and continue analysis of the approaches on how to get the algorithms closer to data source to minimize those overheads.

Andrey

Four_28 · ‎10-02-2015

i interested to learn more about Intel DAAL. anyone can tell step by step to learn Intel DAAL?

Gennady_F_Intel · ‎10-05-2015

I am not sure understand what exactly you want to learn? We may recommend you to look at the many of example shows how to use DAAL's functionality, how to link and etc..

SergeyKostrov · ‎10-12-2015

This is a short follow up. >>... >>www.mp2kmag.com/a122--c-c++.automation.ole.mappoint.html Part 1 >>www.mp2kmag.com/a123--visual.c++.sqlserver.mappoint.html Part 2 >>... By some unexplained reason www.mp2kmag.com website is no longer accessible. So, I've attached an MS Word version of my article I wrote in 2005.

SergeyKostrov · ‎10-12-2015

This is a short follow up ( Part 2 ). An MS PowerPoint presentation file with agenda attached as well.

Andrey_N_Intel · ‎10-13-2015

Thanks, Sergey