Search Results

Now showing 1 - 3 of 3
Loading...
Thumbnail Image
Item

Hyperfast second-order local solvers for efficient statistically preconditioned distributed optimization

2022, Dvurechensky, Pavel, Kamzolov, Dmitry, Lukashevich, Aleksandr, Lee, Soomin, Ordentlich, Erik, Uribe, César A., Gasnikov, Alexander

Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameter by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution for computationally intensive auxiliary optimization problems at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) and explicitly characterize the accuracy by which the corresponding auxiliary subproblem needs to be solved to guarantee the same convergence rate as the exact method. We build our results by first developing an inexact adaptive accelerated Bregman proximal gradient method for general optimization problems under relative smoothness and strong convexity assumptions, which may be of independent interest. Moreover, we explore the properties of the auxiliary problem in the InSPAG algorithm assuming Lipschitz third-order derivatives and strong convexity. For such problem class, we develop a linearly convergent Hyperfast second-order method and estimate the total complexity of the InSPAG method with hyperfast auxiliary problem solver. Finally, we illustrate the proposed method's practical efficiency by performing large-scale numerical experiments on logistic regression models. To the best of our knowledge, these are the first empirical results on implementing high-order methods on large-scale problems, as we work with data where the dimension is of the order of 3 million, and the number of samples is 700 million.

Loading...
Thumbnail Image
Item

Accelerated variance-reduced methods for saddle-point problems

2022, Borodich, Ekaterina, Tominin, Vladislav, Tominin, Yaroslav, Kovalev, Dmitry, Gasnikov, Alexander, Dvurechensky, Pavel

We consider composite minimax optimization problems where the goal is to find a saddle-point of a large sum of non-bilinear objective functions augmented by simple composite regularizers for the primal and dual variables. For such problems, under the average-smoothness assumption, we propose accelerated stochastic variance-reduced algorithms with optimal up to logarithmic factors complexity bounds. In particular, we consider strongly-convex-strongly-concave, convex-strongly-concave, and convex-concave objectives. To the best of our knowledge, these are the first nearly-optimal algorithms for this setting.

Loading...
Thumbnail Image
Item

Generalized self-concordant analysis of Frank–Wolfe algorithms

2022, Dvurechensky, Pavel, Safin, Kamil, Shtern, Shimrit, Staudigl, Mathias

Projection-free optimization via different variants of the Frank–Wolfe method has become one of the cornerstones of large scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordance like properties. Such generalized self-concordant functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex, making them a challenging class of functions for first-order methods. Indeed, in a number of applications, such as inverse covariance estimation or distance-weighted discrimination problems in binary classification, the loss is given by a generalized self-concordant function having potentially unbounded curvature. For such problems projection-free minimization methods have no theoretical convergence guarantee. This paper closes this apparent gap in the literature by developing provably convergent Frank–Wolfe algorithms with standard O(1/k) convergence rate guarantees. Based on these new insights, we show how these sublinearly convergent methods can be accelerated to yield linearly convergent projection-free methods, by either relying on the availability of a local liner minimization oracle, or a suitable modification of the away-step Frank–Wolfe method.