I'm wondering whether the DAAL regression libraries offer more precision and accuracy than the Spark MLLib libraries. Or is the main difference solely performance?
The Intel DAAL library supports two methods for training a linear regression model.
Both methods are direct; that is, they compute the coefficients of the model in a finite sequence of operations.
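To illustrate what "direct" means here, the following is a minimal NumPy sketch of two well-known direct approaches to least squares: solving the normal equations and using a QR decomposition. It is not the DAAL API. The function names, the synthetic data, and the choice of these two approaches are assumptions made for illustration only.

```python
import numpy as np

# Synthetic, noise-free data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X1 = np.hstack([np.ones((100, 1)), X])       # prepend an intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X1 @ beta_true

def fit_normal_equations(A, b):
    """Solve (A^T A) beta = A^T b with a single linear solve."""
    return np.linalg.solve(A.T @ A, A.T @ b)

def fit_qr(A, b):
    """Factor A = Q R (reduced QR), then solve R beta = Q^T b."""
    Q, R = np.linalg.qr(A)
    return np.linalg.solve(R, Q.T @ b)

beta_ne = fit_normal_equations(X1, y)
beta_qr = fit_qr(X1, y)
print(beta_ne)
print(beta_qr)
```

Both functions finish after a fixed number of matrix operations, with no iteration count or convergence tolerance to tune.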
The Intel DAAL version of the linear regression algorithm supports computations in both double and single precision, and you can supply the input data and store the trained model in a precision different from the one used for the computations.
For example, you can provide a single-precision data set, choose double precision for the intermediate computations, and store the results in single precision again.
Please refer to the DAAL Programming Guide for details: https://software.intel.com/en-us/node/564682
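The precision options described above can be sketched in plain NumPy. This is not DAAL code: the function `fit_mixed_precision` and its parameters `compute_dtype` and `store_dtype` are hypothetical names chosen for this example. It takes single-precision input, does the intermediate computations in double precision, and stores the coefficients in single precision again.

```python
import numpy as np

def fit_mixed_precision(X, y, compute_dtype=np.float64, store_dtype=np.float32):
    """Least-squares fit with separate compute and storage precisions."""
    # Promote the inputs to the precision used for intermediate computations.
    A = np.asarray(X, dtype=compute_dtype)
    b = np.asarray(y, dtype=compute_dtype)
    # Direct solve of the normal equations in the compute precision.
    beta = np.linalg.solve(A.T @ A, A.T @ b)
    # Store the trained coefficients in the requested precision.
    return beta.astype(store_dtype)

# Single-precision synthetic data set (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X32 = rng.normal(size=(1000, 3)).astype(np.float32)
w_true = np.array([2.0, -1.0, 0.5], dtype=np.float32)
y32 = X32 @ w_true

beta32 = fit_mixed_precision(X32, y32)
print(beta32.dtype, beta32)
```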
MLLib* provides a different, iterative SGD-based algorithm for training a linear regression model. The accuracy of its result depends on the number of iterations the algorithm runs.
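A small NumPy sketch can show how the accuracy of an SGD-style solver depends on the iteration count. This is not MLLib's implementation: the function `sgd_linear_regression`, the mini-batch size, the decaying step size, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def sgd_linear_regression(X, y, num_iterations, step_size=0.3, batch_size=20, seed=0):
    """Mini-batch SGD on the squared-error loss, starting from zero weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, num_iterations + 1):
        # Sample a mini-batch and compute the gradient of the mean squared error on it.
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch_size
        # Step size decays with the iteration number (an assumed schedule).
        w -= (step_size / np.sqrt(t)) * grad
    return w

# Noise-free synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

err_few = np.linalg.norm(sgd_linear_regression(X, y, 10) - w_true)
err_many = np.linalg.norm(sgd_linear_regression(X, y, 500) - w_true)
print(f"error after 10 iterations:  {err_few:.2e}")
print(f"error after 500 iterations: {err_many:.2e}")
```

In this sketch the coefficient error shrinks as the iteration count grows, whereas a direct method returns its answer after a fixed amount of work.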
Thus, a performance comparison of the linear regression algorithms in the two solutions is not apples-to-apples.
That said, the chart available at https://software.intel.com/en-us/daal gives an idea of the performance of Intel DAAL and MLLib*.