- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hello,

Can anyone please help and verify the Lipschitz constant formula at the bottom of the page https://software.intel.com/en-us/daal-programming-guide-logistic-loss ?

Currently, in the Developer Guide document the upper bound for logistic loss Lipschitz constant is specified as max||x_{i}||_{2} + λ_{2}/n

However, it could be shown that logistic loss Hessian spectral norm upper bound (rough) is rather max||x_{i}||^{2}_{2} + λ_{2 }

Your clarification is much appreciated.

Thanks,

Dmitry

- Tags:
- Bug

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi,

We are forwarding this case to technical experts.They will assist you shortly.

Thanks

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hello, Dmitry

Thanks for reporting this statement! Seems instead of max||x_{i}||_{2} we should use max||x_{i}||^{2}_{2} , it was misprint.

λ_{2} scaling was done to align sklearn regularization penalty value. (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/_sag.py#L78)

Note: Lipschitz constant is used only for SAGA algorithm, for calculation optimal step size.

https://hal.inria.fr/hal-00860051/document

Best regards,

Kirill

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi Kirill,

Thanks for getting back to me.

Couple of thoughts about Lipschitz constant:

- Please note, that in the formula for logistic loss (https://software.intel.com/en-us/daal-programming-guide-logistic-loss) the regularization term is not divided by n, as the preceding sum term. As a result, if you plug in zero design matrix into the formula(with no intercept), you will get Hessian exactly 2*λ*E and norm of the Hessian is 2*λ - it is a sharp upper bound. Obviously, λ/n is less than 2* λ. (In my first post I omitted multiplier 2 since normally quadratic term is divided by 2)
- In the original paper https://arxiv.org/pdf/1407.0202.pdf the L
_{2}regularization term is a part of sum, as it is a smooth component and therefore divided by n. That could be the source of λ/n term. - The max||x||
^{2}is a pretty much rough upper bound for the Lipschitz constant. As a result, the step size is small and the convergence rate may not be optimal. In my experiments decreasing Lipschitz constant may improve SAGA algorithm performance.

I would like to know your opinion, since I am implementing a custom objective function for the SAGA algorithm and looking for high performance computing with large data sets.

Best regards,

Dmitry.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hello, Dmitry

Currently, algorithm supports parameter 'learningRateSequence' (https://software.intel.com/en-us/daal-programming-guide-computation-13), if we see some performance improvements with another step size, DAAL library provides options to use this advantages. For reason of conformance with sklearn library we can`t change this automatic step selection, however 'learningRateSequence' parameter is provided.

About source of scaling λ term by n I think it make sense.

Best regards,

Kirill

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page