Search Results

Now showing 1 - 10 of 20
  • Item
    First-Order Methods for Convex Optimization
    (Amsterdam : Elsevier, 2021) Dvurechensky, Pavel; Shtern, Shimrit; Staudigl, Mathias
    First-order methods for solving convex optimization problems have been at the forefront of mathematical optimization in the last 20 years. The rapid development of this important class of algorithms is motivated by the success stories reported in various applications, including most importantly machine learning, signal processing, imaging and control theory. First-order methods have the potential to provide low accuracy solutions at low computational complexity which makes them an attractive set of tools in large-scale optimization problems. In this survey, we cover a number of key developments in gradient-based optimization methods. This includes non-Euclidean extensions of the classical proximal gradient method, and its accelerated versions. Additionally we survey recent developments within the class of projection-free methods, and proximal versions of primal-dual schemes. We give complete proofs for various key results, and highlight the unifying aspects of several optimization algorithms.
  • Item
    Accelerated variance-reduced methods for saddle-point problems
    (Amsterdam : Elsevier, 2022) Borodich, Ekaterina; Tominin, Vladislav; Tominin, Yaroslav; Kovalev, Dmitry; Gasnikov, Alexander; Dvurechensky, Pavel
    We consider composite minimax optimization problems where the goal is to find a saddle-point of a large sum of non-bilinear objective functions augmented by simple composite regularizers for the primal and dual variables. For such problems, under the average-smoothness assumption, we propose accelerated stochastic variance-reduced algorithms with optimal up to logarithmic factors complexity bounds. In particular, we consider strongly-convex-strongly-concave, convex-strongly-concave, and convex-concave objectives. To the best of our knowledge, these are the first nearly-optimal algorithms for this setting.
  • Item
    Hyperfast second-order local solvers for efficient statistically preconditioned distributed optimization
    (Amsterdam : Elsevier, 2022) Dvurechensky, Pavel; Kamzolov, Dmitry; Lukashevich, Aleksandr; Lee, Soomin; Ordentlich, Erik; Uribe, César A.; Gasnikov, Alexander
    Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameter by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution for computationally intensive auxiliary optimization problems at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) and explicitly characterize the accuracy by which the corresponding auxiliary subproblem needs to be solved to guarantee the same convergence rate as the exact method. We build our results by first developing an inexact adaptive accelerated Bregman proximal gradient method for general optimization problems under relative smoothness and strong convexity assumptions, which may be of independent interest. Moreover, we explore the properties of the auxiliary problem in the InSPAG algorithm assuming Lipschitz third-order derivatives and strong convexity. For such problem class, we develop a linearly convergent Hyperfast second-order method and estimate the total complexity of the InSPAG method with hyperfast auxiliary problem solver. Finally, we illustrate the proposed method's practical efficiency by performing large-scale numerical experiments on logistic regression models. To the best of our knowledge, these are the first empirical results on implementing high-order methods on large-scale problems, as we work with data where the dimension is of the order of 3 million, and the number of samples is 700 million.
  • Item
    Generalized self-concordant analysis of Frank–Wolfe algorithms
    (Berlin ; Heidelberg : Springer, 2022) Dvurechensky, Pavel; Safin, Kamil; Shtern, Shimrit; Staudigl, Mathias
    Projection-free optimization via different variants of the Frank–Wolfe method has become one of the cornerstones of large scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordance like properties. Such generalized self-concordant functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex, making them a challenging class of functions for first-order methods. Indeed, in a number of applications, such as inverse covariance estimation or distance-weighted discrimination problems in binary classification, the loss is given by a generalized self-concordant function having potentially unbounded curvature. For such problems projection-free minimization methods have no theoretical convergence guarantee. This paper closes this apparent gap in the literature by developing provably convergent Frank–Wolfe algorithms with standard O(1/k) convergence rate guarantees. Based on these new insights, we show how these sublinearly convergent methods can be accelerated to yield linearly convergent projection-free methods, by either relying on the availability of a local liner minimization oracle, or a suitable modification of the away-step Frank–Wolfe method.
  • Item
    Alternating minimization methods for strongly convex optimization
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Tupitsa, Nazarii; Dvurechensky, Pavel; Gasnikov, Alexander
    We consider alternating minimization procedures for convex optimization problems with variable divided in many block, each block being amenable for minimization with respect to its variable with freezed other variables blocks. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under Polyak-Łojasiewicz condition, which can be seen as a relaxation of the strong convexity assumption. Under strong convexity assumption in many-blocks setting we provide an accelerated alternating minimization procedure with linear rate depending on the square root of the condition number as opposed to condition number for the non-accelerated method.
  • Item
    Tensor methods for strongly convex strongly concave saddle point problems and strongly monotone variational inequalities
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2021) Ostroukhov, Petr; Kamalov, Rinat; Dvurechensky, Pavel; Gasnikov, Alexander
    In this paper we propose three tensor methods for strongly-convex-strongly-concave saddle point problems (SPP). The first method is based on the assumption of higher-order smoothness (the derivative of the order higher than 2 is Lipschitz-continuous) and achieves linear convergence rate. Under additional assumptions of first and second order smoothness of the objective we connect the first method with a locally superlinear converging algorithm in the literature and develop a second method with global convergence and local superlinear convergence. The third method is a modified version of the second method, but with the focus on making the gradient of the objective small. Since we treat SPP as a particular case of variational inequalities, we also propose two methods for strongly monotone variational inequalities with the same complexity as the described above.
  • Item
    Oracle complexity separation in convex optimization
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Ivanova, Anastasiya; Gasnikov, Alexander; Dvurechensky, Pavel; Dvinskikh, Darina; Tyurin, Alexander; Vorontsova, Evgeniya; Pasechnyuk, Dmitry
    Ubiquitous in machine learning regularized empirical risk minimization problems are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e. for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part.
  • Item
    On the optimal combination of tensor optimization methods
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Kamzolov, Dmitry; Gasnikov, Alexander; Dvurechensky, Pavel
    We consider the minimization problem of a sum of a number of functions having Lipshitz p -th order derivatives with different Lipschitz constants. In this case, to accelerate optimization, we propose a general framework allowing to obtain near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with lower Lipschitz constant is called a smaller number of times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of a different order. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of a different order.
  • Item
    On the complexity of approximating Wasserstein barycenter
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2019) Kroshnin, Alexey; Dvinskikh, Darina; Dvurechensky, Pavel; Gasnikov, Alexander; Tupitsa, Nazarii; Uribe, César A.
    We study the complexity of approximating Wassertein barycenter of discrete measures, or histograms by contrasting two alternative approaches, both using entropic regularization. We provide a novel analysis for our approach based on the Iterative Bregman Projections (IBP) algorithm to approximate the original non-regularized barycenter. We also get the complexity bound for alternative accelerated-gradient-descent-based approach and compare it with the bound obtained for IBP. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to ", which causes instability of both algorithms when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method, which uses IBP on each iteration to make a proximal step. We also consider the question of scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.
  • Item
    On accelerated alternating minimization
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Guminov, Sergey; Dvurechensky, Pavel; Gasnikov, Alexander
    Alternating minimization (AM) optimization algorithms have been known for a long time and are of importance in machine learning problems, among which we are mostly motivated by approximating optimal transport distances. AM algorithms assume that the decision variable is divided into several blocks and minimization in each block can be done explicitly or cheaply with high accuracy. The ubiquitous Sinkhorn's algorithm can be seen as an alternating minimization algorithm for the dual to the entropy-regularized optimal transport problem. We introduce an accelerated alternating minimization method with a $1/k^2$ convergence rate, where $k$ is the iteration counter. This improves over known bound $1/k$ for general AM methods and for the Sinkhorn's algorithm. Moreover, our algorithm converges faster than gradient-type methods in practice as it is free of the choice of the step-size and is adaptive to the local smoothness of the problem. We show that the proposed method is primal-dual, meaning that if we apply it to a dual problem, we can reconstruct the solution of the primal problem with the same convergence rate. We apply our method to the entropy regularized optimal transport problem and show experimentally, that it outperforms Sinkhorn's algorithm.