Search Results

Now showing 1 - 2 of 2

In search of non-Gaussian components of a high-dimensional distribution

2006, Blanchard, Gilles, Kawanabe, Motoaki, Sugiyama, Masashi, Spokoiny, Vladimir, Müller, Klaus-Robert

Finding non-Gaussian components of high-dimensional data is an important preprocessing step for efficient information processing. This article proposes a new linear method to identify the "non-Gaussian subspace" within a very general semi-parametric framework. Our proposed method, called NGCA (Non-Gaussian Component Analysis), is essentially based on the fact that we can construct a linear operator which, to any arbitrary nonlinear (smooth) function, associates a vector which belongs to the low-dimensional non-Gaussian target subspace up to an estimation error. By applying this operator to a family of different nonlinear functions, one obtains a family of different vectors lying in a vicinity of the target space. As a final step, the target space itself is estimated by applying PCA to this family of vectors. We show that this procedure is consistent in the sense that the estimation error tends to zero at a parametric rate, uniformly over the family. Numerical examples demonstrate the usefulness of our method.
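
The following is a minimal numerical sketch of the procedure the abstract outlines, not the authors' implementation: it assumes whitened data, uses random tanh projections as the family of smooth test functions, and relies on a Stein-type identity under which the vector mean(x·h(x)) − mean(∇h(x)) has no component along purely Gaussian directions. All names and parameter choices here are illustrative.

```python
import numpy as np

def ngca_sketch(X, n_components=2, n_funcs=200, seed=0):
    """Toy sketch of the NGCA idea: map many nonlinear test functions h
    to vectors beta(h) lying near the non-Gaussian subspace, then run
    PCA on that family of vectors to estimate the subspace itself."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Whiten: subtract the mean and decorrelate via a Cholesky factor,
    # so the Gaussian component is (approximately) standard normal.
    L = np.linalg.cholesky(np.cov(X.T))
    Xw = (X - X.mean(axis=0)) @ np.linalg.inv(L.T)
    betas = np.empty((n_funcs, d))
    for k in range(n_funcs):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        s = rng.uniform(0.5, 3.0)      # random scale of the test function
        p = Xw @ w                     # projections w.x, shape (n,)
        h = np.tanh(s * p)             # h(x) = tanh(s * w.x)
        dh = s * (1.0 - h ** 2)        # scalar part of grad h = h'(w.x) * w
        # beta(h) = mean(x * h(x)) - mean(grad h(x)), near the target subspace
        betas[k] = (Xw * h[:, None]).mean(axis=0) - dh.mean() * w
    # PCA step: dominant right singular vectors span the estimated subspace.
    _, _, Vt = np.linalg.svd(betas, full_matrices=False)
    return Vt[:n_components]           # rows: estimated non-Gaussian directions
```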

The degrees of freedom of partial least squares regression

2010, Krämer, Nicole, Sugiyama, Masashi

The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares regression. Our contribution is an unbiased estimate of its Degrees of Freedom, defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: the lower the collinearity, the higher the Degrees of Freedom. In particular, they are typically higher than the naive estimate that equates the Degrees of Freedom with the number of components. Further, we illustrate that the Degrees of Freedom are useful for model selection. Our experiments indicate that the model complexity based on the Degrees of Freedom estimate is lower than that of the naive approach. In terms of prediction accuracy, both methods obtain the same accuracy as cross-validation.
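
As a rough illustration of the definition used in the abstract (Degrees of Freedom as the trace of the derivative of the fitted values with respect to the response), the sketch below approximates that trace by finite differences. `fit_predict` is a hypothetical callable returning in-sample fitted values; because the PLS components themselves depend on y, the fit is nonlinear in y, which is why a Jacobian trace rather than a fixed hat-matrix trace is needed.

```python
import numpy as np

def dof_estimate(fit_predict, X, y, eps=1e-4):
    """Finite-difference illustration of  DoF = trace(d yhat / d y).

    Perturbing each response coordinate and re-fitting approximates one
    diagonal entry of the Jacobian of the fitted values; the sum over
    all coordinates approximates its trace.  O(n) re-fits: a naive
    illustration of the definition, not the paper's unbiased estimator.
    """
    yhat = fit_predict(X, y)
    trace = 0.0
    for i in range(len(y)):
        y_pert = y.astype(float).copy()
        y_pert[i] += eps
        trace += (fit_predict(X, y_pert)[i] - yhat[i]) / eps
    return trace
```

For example, one could pass `fit_predict = lambda X, y: PLSRegression(n_components=3).fit(X, y).predict(X).ravel()` using scikit-learn's PLSRegression. With an ordinary least-squares fit the same estimator returns the trace of the hat matrix, i.e. the number of predictors; the abstract's observation is that for PLS the value typically exceeds the number of components.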