Browsing by Author "Gasnikov, Alexander"
Now showing 1 - 20 of 22
Results Per Page
Sort Options
- ItemAccelerated variance-reduced methods for saddle-point problems(Amsterdam : Elsevier, 2022) Borodich, Ekaterina; Tominin, Vladislav; Tominin, Yaroslav; Kovalev, Dmitry; Gasnikov, Alexander; Dvurechensky, PavelWe consider composite minimax optimization problems where the goal is to find a saddle-point of a large sum of non-bilinear objective functions augmented by simple composite regularizers for the primal and dual variables. For such problems, under the average-smoothness assumption, we propose accelerated stochastic variance-reduced algorithms with optimal up to logarithmic factors complexity bounds. In particular, we consider strongly-convex-strongly-concave, convex-strongly-concave, and convex-concave objectives. To the best of our knowledge, these are the first nearly-optimal algorithms for this setting.
- ItemAdaptive gradient descent for convex and non-convex stochastic optimization(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2019) Ogaltsov, Aleksandr; Dvinskikh, Darina; Dvurechensky, Pavel; Gasnikov, Alexander; Spokoiny, VladimirIn this paper we propose several adaptive gradient methods for stochastic optimization. Our methods are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and variance of the stochastic approximation for the gradient. We consider an accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments we demonstrate superiority of our methods to existing adaptive methods, e.g. AdaGrad and Adam.
- ItemAdvances in low-memory subgradient optimization(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Dvurechensky, Pavel; Gasnikov, Alexander; Nurminski, Evgeni; Stonyakin, FedorOne of the main goals in the development of non-smooth optimization is to cope with high dimensional problems by decomposition, duality or Lagrangian relaxation which greatly reduces the number of variables at the cost of worsening differentiability of objective or constraints. Small or medium dimensionality of resulting non-smooth problems allows to use bundle-type algorithms to achieve higher rates of convergence and obtain higher accuracy, which of course came at the cost of additional memory requirements, typically of the order of n2, where n is the number of variables of non-smooth problem. However with the rapid development of more and more sophisticated models in industry, economy, finance, et all such memory requirements are becoming too hard to satisfy. It raised the interest in subgradient-based low-memory algorithms and later developments in this area significantly improved over their early variants still preserving O(n) memory requirements. To review these developments this chapter is devoted to the black-box subgradient algorithms with the minimal requirements for the storage of auxiliary results, which are necessary to execute these algorithms. To provide historical perspective this survey starts with the original result of N.Z. Shor which opened this field with the application to the classical transportation problem. The theoretical complexity bounds for smooth and non-smooth convex and quasi-convex optimization problems are briefly exposed in what follows to introduce to the relevant fundamentals of non-smooth optimization. Special attention in this section is given to the adaptive step-size policy which aims to attain lowest complexity bounds. Unfortunately the non-differentiability of objective function in convex optimization essentially slows down the theoretical low bounds for the rate of convergence in subgradient optimization compared to the smooth case but there are different modern techniques that allow to solve non-smooth convex optimization problems faster then dictate lower complexity bounds. In this work the particular attention is given to Nesterov smoothing technique, Nesterov Universal approach, and Legendre (saddle point) representation approach. The new results on Universal Mirror Prox algorithms represent the original parts of the survey. To demonstrate application of non-smooth convex optimization algorithms for solution of huge-scale extremal problems we consider convex optimization problems with non-smooth functional constraints and propose two adaptive Mirror Descent methods. The first method is of primal-dual variety and proved to be optimal in terms of lower oracle bounds for the class of Lipschitz-continuous convex objective and constraints. The advantages of application of this method to sparse Truss Topology Design problem are discussed in certain details. The second method can be applied for solution of convex and quasi-convex optimization problems and is optimal in a sense of complexity bounds. The conclusion part of the survey contains the important references that characterize recent developments of non-smooth convex optimization.
- ItemAlternating minimization methods for strongly convex optimization(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Tupitsa, Nazarii; Dvurechensky, Pavel; Gasnikov, AlexanderWe consider alternating minimization procedures for convex optimization problems with variable divided in many block, each block being amenable for minimization with respect to its variable with freezed other variables blocks. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under Polyak-Łojasiewicz condition, which can be seen as a relaxation of the strong convexity assumption. Under strong convexity assumption in many-blocks setting we provide an accelerated alternating minimization procedure with linear rate depending on the square root of the condition number as opposed to condition number for the non-accelerated method.
- ItemConvex optimization with inexact gradients in Hilbert space and applications to elliptic inverse problems(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2021) Matyukhin, Vladislav; Kabanikhin, Sergey; Shishlenin, Maxim; Novikov, Nikita; Vasin, Artem; Gasnikov, AlexanderIn this paper we propose the gradient descent type methods to solve convex optimization problems in Hilbert space. We apply it to solve ill-posed Cauchy problem for Poisson equation and make a comparative analysis with Landweber iteration and steepest descent method. The theoretical novelty of the paper consists in the developing of new stopping rule for accelerated gradient methods with inexact gradient (additive noise). Note that up to the moment of stopping the method ``doesn't feel the noise''. But after this moment the noise start to accumulate and the quality of the solution becomes worse for further iterations.
- ItemDistributed optimization with quantization for computing Wasserstein barycenters(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Krawchenko, Roman; Uribe, César A.; Gasnikov, Alexander; Dvurechensky, PavelWe study the problem of the decentralized computation of entropy-regularized semi-discrete Wasserstein barycenters over a network. Building upon recent primal-dual approaches, we propose a sampling gradient quantization scheme that allows efficient communication and computation of approximate barycenters where the factor distributions are stored distributedly on arbitrary networks. The communication and algorithmic complexity of the proposed algorithm are shown, with explicit dependency on the size of the support, the number of distributions, and the desired accuracy. Numerical results validate our algorithmic analysis.
- ItemFlexible modification of Gauss--Newton method and its stochastic extension(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2021) Yudin, Nikita; Gasnikov, AlexanderThis work presents a novel version of recently developed Gauss--Newton method for solving systems of nonlinear equations, based on upper bound of solution residual and quadratic regularization ideas. We obtained for such method global convergence bounds and under natural non-degeneracy assumptions we present local quadratic convergence results. We developed stochastic optimization algorithms for presented Gauss--Newton method and justified sub-linear and linear convergence rates for these algorithms using weak growth condition (WGC) and Polyak--Lojasiewicz (PL) inequality. We show that Gauss--Newton method in stochastic setting can effectively find solution under WGC and PL condition matching convergence rate of the deterministic optimization method. The suggested method unifies most practically used Gauss--Newton method modifications and can easily interpolate between them providing flexible and convenient method easily implementable using standard techniques of convex optimization.
- ItemGradient methods for problems with inexact model of the objective(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Stonyakin, Fedor; Dvinskikh, Darina; Dvurechensky, Pavel; Kroshnin, Alexey; Kuznetsova, Olesya; Agafonov, Artem; Gasnikov, Alexander; Tyurin, Alexander; Uribe, Cesar A.; Pasechnyuk, Dmitry; Artamonov, SergeiWe consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce inexact model of the objective, which as a particular cases includes inexact oracle [19] and relative smoothness condition [43]. We analyze gradient method which uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework we consider three particular problems. The first one is clustering by electorial model introduced in [49]. The second one is approximating optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third one is devoted to approximating optimal transport barycenter and we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.
- ItemHyperfast second-order local solvers for efficient statistically preconditioned distributed optimization(Amsterdam : Elsevier, 2022) Dvurechensky, Pavel; Kamzolov, Dmitry; Lukashevich, Aleksandr; Lee, Soomin; Ordentlich, Erik; Uribe, César A.; Gasnikov, AlexanderStatistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameter by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution for computationally intensive auxiliary optimization problems at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) and explicitly characterize the accuracy by which the corresponding auxiliary subproblem needs to be solved to guarantee the same convergence rate as the exact method. We build our results by first developing an inexact adaptive accelerated Bregman proximal gradient method for general optimization problems under relative smoothness and strong convexity assumptions, which may be of independent interest. Moreover, we explore the properties of the auxiliary problem in the InSPAG algorithm assuming Lipschitz third-order derivatives and strong convexity. For such problem class, we develop a linearly convergent Hyperfast second-order method and estimate the total complexity of the InSPAG method with hyperfast auxiliary problem solver. Finally, we illustrate the proposed method's practical efficiency by performing large-scale numerical experiments on logistic regression models. To the best of our knowledge, these are the first empirical results on implementing high-order methods on large-scale problems, as we work with data where the dimension is of the order of 3 million, and the number of samples is 700 million.
- ItemInexact model: A framework for optimization and variational inequalities(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Stonyakin, Fedor; Gasnikov, Alexander; Tyurin, Alexander; Pasechnyuk, Dmitry; Agafonov, Artem; Dvurechensky, Pavel; Dvinskikh, Darina; Piskunova, VictoryaIn this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework allows to obtain many known methods as a special case, the list including accelerated gradient method, composite optimization methods, level-set methods, proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. objective function in optimization or operator in variational inequalities. Besides reproducing known results, our framework allows to construct new methods, which we illustrate by constructing a universal method for variational inequalities with composite structure. This method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem smoothness. We also generalize our framework for strongly convex objectives and strongly monotone variational inequalities.
- ItemInexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Stonyakin, Fedor; Gasnikov, Alexander; Tyurin, Alexander; Pasechnyuk, Dmitry; Agafonov, Artem; Dvurechensky, Pavel; Dvinskikh, Darina; Artamonov, Sergei; Piskunova, VictoryaIn this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework allows to obtain many known methods as a special case, the list including accelerated gradient method, composite optimization methods, level-set methods, Bregman proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. objective function in optimization or operator in variational inequalities. Besides reproducing known results, our framework allows to construct new methods, which we illustrate by constructing a universal conditional gradient method and universal method for variational inequalities with composite structure. These method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem smoothness. As a particular case of our general framework, we introduce relative smoothness for operators and propose an algorithm for VIs with such operator. We also generalize our framework for relatively strongly convex objectives and strongly monotone variational inequalities.
- ItemInexact tensor methods and their application to stochastic convex optimization(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2021) Agafonov, Artem; Kamzolov, Dmitry; Dvurechensky, Pavel; Gasnikov, AlexanderWe propose a general non-accelerated tensor method under inexact information on higher- order derivatives, analyze its convergence rate, and provide sufficient conditions for this method to have similar complexity as the exact tensor method. As a corollary, we propose the first stochastic tensor method for convex optimization and obtain sufficient mini-batch sizes for each derivative.
- ItemNear-optimal tensor methods for minimizing gradient norm(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Dvurechensky, Pavel; Gasnikov, Alexander; Ostroukhov, Petr; Uribe, A. Cesar; Ivanova, AnastasiyaMotivated by convex problems with linear constraints and, in particular, by entropy-regularized optimal transport, we consider the problem of finding approximate stationary points, i.e. points with the norm of the objective gradient less than small error, of convex functions with Lipschitz p-th order derivatives. Lower complexity bounds for this problem were recently proposed in [Grapiglia and Nesterov, arXiv:1907.07053]. However, the methods presented in the same paper do not have optimal complexity bounds. We propose two optimal up to logarithmic factors methods with complexity bounds with respect to the initial objective residual and the distance between the starting point and solution respectively
- ItemOn accelerated alternating minimization(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Guminov, Sergey; Dvurechensky, Pavel; Gasnikov, AlexanderAlternating minimization (AM) optimization algorithms have been known for a long time and are of importance in machine learning problems, among which we are mostly motivated by approximating optimal transport distances. AM algorithms assume that the decision variable is divided into several blocks and minimization in each block can be done explicitly or cheaply with high accuracy. The ubiquitous Sinkhorn's algorithm can be seen as an alternating minimization algorithm for the dual to the entropy-regularized optimal transport problem. We introduce an accelerated alternating minimization method with a $1/k^2$ convergence rate, where $k$ is the iteration counter. This improves over known bound $1/k$ for general AM methods and for the Sinkhorn's algorithm. Moreover, our algorithm converges faster than gradient-type methods in practice as it is free of the choice of the step-size and is adaptive to the local smoothness of the problem. We show that the proposed method is primal-dual, meaning that if we apply it to a dual problem, we can reconstruct the solution of the primal problem with the same convergence rate. We apply our method to the entropy regularized optimal transport problem and show experimentally, that it outperforms Sinkhorn's algorithm.
- ItemOn primal and dual approaches for distributed stochastic convex optimization over networks(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Dvinskikh, Darina; Gorbunov, Eduard; Gasnikov, Alexander; Dvurechensky, Alexander; Uribe, César A.We introduce a primal-dual stochastic gradient oracle method for distributed convex optimization problems over networks. We show that the proposed method is optimal in terms of communication steps. Additionally, we propose a new analysis method for the rate of convergence in terms of duality gap and probability of large deviations. This analysis is based on a new technique that allows to bound the distance between the iteration sequence and the optimal point. By the proper choice of batch size, we can guarantee that this distance equals (up to a constant) to the distance between the starting point and the solution.
- ItemOn the complexity of approximating Wasserstein barycenter(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2019) Kroshnin, Alexey; Dvinskikh, Darina; Dvurechensky, Pavel; Gasnikov, Alexander; Tupitsa, Nazarii; Uribe, César A.We study the complexity of approximating Wassertein barycenter of discrete measures, or histograms by contrasting two alternative approaches, both using entropic regularization. We provide a novel analysis for our approach based on the Iterative Bregman Projections (IBP) algorithm to approximate the original non-regularized barycenter. We also get the complexity bound for alternative accelerated-gradient-descent-based approach and compare it with the bound obtained for IBP. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to ", which causes instability of both algorithms when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method, which uses IBP on each iteration to make a proximal step. We also consider the question of scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.
- ItemOn the optimal combination of tensor optimization methods(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Kamzolov, Dmitry; Gasnikov, Alexander; Dvurechensky, PavelWe consider the minimization problem of a sum of a number of functions having Lipshitz p -th order derivatives with different Lipschitz constants. In this case, to accelerate optimization, we propose a general framework allowing to obtain near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with lower Lipschitz constant is called a smaller number of times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of a different order. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of a different order.
- ItemOptimal decentralized distributed algorithms for stochastic convex optimization(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Gorbunov, Eduard; Dvinskikh, Darina; Gasnikov, AlexanderWe consider stochastic convex optimization problems with affine constraints and develop several methods using either primal or dual approach to solve it. In the primal case we use special penalization technique to make the initial problem more convenient for using optimization methods. We propose algorithms to solve it based on Similar Triangles Method with Inexact Proximal Step for the convex smooth and strongly convex smooth objective functions and methods based on Gradient Sliding algorithm to solve the same problems in the non-smooth case. We prove the convergence guarantees in smooth convex case with deterministic first-order oracle. We propose and analyze three novel methods to handle stochastic convex optimization problems with affine constraints: SPDSTM, R-RRMA-AC-SA and SSTM_sc. All methods use stochastic dual oracle. SPDSTM is the stochastic primal-dual modification of STM and it is applied for the dual problem when the primal functional is strongly convex and Lipschitz continuous on some ball. R-RRMA-AC-SA is an accelerated stochastic method based on the restarts of RRMA-AC-SA and SSTM_sc is just stochastic STM for strongly convex problems. Both methods are applied to the dual problem when the primal functional is strongly convex, smooth and Lipschitz continuous on some ball and use stochastic dual first-order oracle. We develop convergence analysis for these methods for the unbiased and biased oracles respectively. Finally, we apply all aforementioned results and approaches to solve decentralized distributed optimization problem and discuss optimality of the obtained results in terms of communication rounds and number of oracle calls per node.
- ItemOracle complexity separation in convex optimization(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2020) Ivanova, Anastasiya; Gasnikov, Alexander; Dvurechensky, Pavel; Dvinskikh, Darina; Tyurin, Alexander; Vorontsova, Evgeniya; Pasechnyuk, DmitryUbiquitous in machine learning regularized empirical risk minimization problems are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e. for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part.
- ItemStopping rules for accelerated gradient methods with additive noise in gradient(Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2021) Vasin, Artem; Gasnikov, Alexander; Spokoiny, VladimirIn this article, we investigate an accelerated first-order method, namely, the method of similar triangles, which is optimal in the class of convex (strongly convex) problems with a Lipschitz gradient. The paper considers a model of additive noise in a gradient and a Euclidean prox- structure for not necessarily bounded sets. Convergence estimates are obtained in the case of strong convexity and its absence, and a stopping criterion is proposed for not strongly convex problems.