Search Results

Now showing 1 - 2 of 2

Oracle complexity separation in convex optimization

2020, Ivanova, Anastasiya, Gasnikov, Alexander, Dvurechensky, Pavel, Dvinskikh, Darina, Tyurin, Alexander, Vorontsova, Evgeniya, Pasechnyuk, Dmitry

Regularized empirical risk minimization problems, ubiquitous in machine learning, are often composed of several blocks that can be treated using different types of oracles, e.g., full gradient, stochastic gradient, or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e., for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle, our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part.


Near-optimal tensor methods for minimizing gradient norm

2020, Dvurechensky, Pavel, Gasnikov, Alexander, Ostroukhov, Petr, Uribe, A. Cesar, Ivanova, Anastasiya

Motivated by convex problems with linear constraints and, in particular, by entropy-regularized optimal transport, we consider the problem of finding approximate stationary points, i.e., points at which the norm of the objective gradient is smaller than a prescribed small error, of convex functions with Lipschitz p-th order derivatives. Lower complexity bounds for this problem were recently proposed in [Grapiglia and Nesterov, arXiv:1907.07053]. However, the methods presented in the same paper do not have optimal complexity bounds. We propose two methods that are optimal up to logarithmic factors, with complexity bounds stated with respect to the initial objective residual and the distance between the starting point and the solution, respectively.