Semi-supervised novelty detection

dc.bibliographicCitation.seriesTitle: WIAS Preprints [eng]
dc.bibliographicCitation.volume: 1471
dc.contributor.author: Blanchard, Gilles
dc.contributor.author: Lee, Gyemin
dc.contributor.author: Scott, Clayton
dc.date.accessioned: 2016-03-24T17:38:34Z
dc.date.available: 2019-06-28T08:04:40Z
dc.date.issued: 2009
dc.description.abstract: A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well-known problems in hypothesis testing. First, our results provide a general solution to the general two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard $p$-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded $p$-values, the general SSND framework allows for adaptation to arbitrary alternative distributions.
dc.description.version: publishedVersion [eng]
dc.format: application/pdf
dc.identifier.issn: 0946-8633
dc.identifier.uri: https://doi.org/10.34657/3147
dc.identifier.uri: https://oa.tib.eu/renate/handle/123456789/2217
dc.language.iso: eng [eng]
dc.publisher: Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik
dc.relation.issn: 0946-8633 [eng]
dc.rights.license: Dieses Dokument darf im Rahmen von § 53 UrhG zum eigenen Gebrauch kostenfrei heruntergeladen, gelesen, gespeichert und ausgedruckt, aber nicht im Internet bereitgestellt oder an Außenstehende weitergegeben werden. [ger]
dc.rights.license: This document may be downloaded, read, stored and printed for your own use within the limits of § 53 UrhG but it may not be distributed via the internet or passed on to external parties. [eng]
dc.subject.ddc: 510
dc.subject.other: Semi-supervised learning [eng]
dc.subject.other: novelty detection [eng]
dc.subject.other: Neyman-Pearson classification [eng]
dc.subject.other: learning reduction [eng]
dc.subject.other: two-sample problem [eng]
dc.subject.other: multiple testing [eng]
dc.title: Semi-supervised novelty detection
dc.type: Report [eng]
dc.type: Text [eng]
tib.accessRights: openAccess [eng]
wgl.contributor: WIAS [eng]
wgl.subject: Mathematik [eng]
wgl.type: Report / Forschungsbericht / Arbeitspapier [eng]
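
Illustrative sketch (not part of the record and not taken from the report): the abstract describes SSND as a reduction to binary classification of the nominal sample against the unlabeled sample, with the false positive rate controlled in the Neyman-Pearson sense. The Python below shows one simple way such a reduction could look on one-dimensional data. The k-nearest-neighbour scorer, the hold-out calibration of the threshold, and every name in it (knn_score, fit_ssnd_detector, alpha, k) are assumptions made for illustration; the abstract does not specify the classifier or the calibration procedure the authors use.

```python
import math
import random


def knn_score(x, pooled, k=15):
    """Fraction of unlabeled points among the k nearest neighbours of x.

    `pooled` is a list of (value, label) pairs: label 0 marks the nominal
    sample, label 1 marks the unlabeled (possibly contaminated) sample.
    """
    nearest = sorted(pooled, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def fit_ssnd_detector(nominal, unlabeled, alpha=0.1, k=15):
    """Return (detector, threshold) aiming at false positive rate alpha.

    Assumption: half of the nominal sample is held out to calibrate the
    threshold; this is an illustrative choice, not the report's procedure.
    """
    half = len(nominal) // 2
    train_nominal, calib_nominal = nominal[:half], nominal[half:]

    # Binary classification problem: nominal (0) versus unlabeled (1).
    pooled = [(x, 0) for x in train_nominal] + [(x, 1) for x in unlabeled]

    # Threshold = empirical (1 - alpha) quantile of held-out nominal scores,
    # so at most a fraction alpha of the held-out nominal points exceed it.
    calib_scores = sorted(knn_score(x, pooled, k) for x in calib_nominal)
    rank = math.ceil((1 - alpha) * len(calib_scores))
    rank = min(len(calib_scores), max(1, rank))
    threshold = calib_scores[rank - 1]

    def detector(x):
        # Flag x as a novelty when its score is strictly above the threshold.
        return knn_score(x, pooled, k) > threshold

    return detector, threshold


# Synthetic data (hypothetical): nominal ~ N(0, 1); the unlabeled sample
# mixes 80% nominal points with 20% novelties drawn from N(4, 0.5).
rng = random.Random(0)
nominal = [rng.gauss(0.0, 1.0) for _ in range(400)]
unlabeled = ([rng.gauss(0.0, 1.0) for _ in range(160)]
             + [rng.gauss(4.0, 0.5) for _ in range(40)])

detector, threshold = fit_ssnd_detector(nominal, unlabeled, alpha=0.1)
print("threshold:", threshold)
print("x = 4.0 flagged:", detector(4.0))
print("x = 0.0 flagged:", detector(0.0))
```

In this sketch the threshold is set from nominal data only, so the false positive rate on the held-out nominal points is at most alpha by construction, while the unlabeled sample determines where the score is high. No density model of the novelties is needed, which is the property the abstract emphasizes.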
Files
Original bundle
Name: 664526039.pdf
Size: 290.12 KB
Format: Adobe Portable Document Format