What company does my news article refer to? Tackling multiclass problems with topic modeling

dc.bibliographicCitation.seriesTitle: WIAS Preprints [eng]
dc.bibliographicCitation.volume: 2621
dc.contributor.author: Lübbering, Max
dc.contributor.author: Kunkel, Julian
dc.contributor.author: Farrell, Patricio
dc.date.accessioned: 2022-06-23T14:30:45Z
dc.date.available: 2022-06-23T14:30:45Z
dc.date.issued: 2019
dc.description.abstract: While it is technically trivial to search for the company name to predict the company a news article refers to, doing so often leads to incorrect results. In this article, we compare two approaches, bag-of-words with k-nearest neighbors and Latent Dirichlet Allocation with k-nearest neighbors, by assessing their applicability for predicting the S&P 500 company that is mentioned in a business news article or press release. Both approaches are evaluated on a corpus of 13k documents containing 84% news articles and 16% press releases. While the bag-of-words approach yields accurate predictions, it is highly inefficient due to its very large feature space. The Latent Dirichlet Allocation approach, on the other hand, achieves roughly the same prediction accuracy (0.58 instead of 0.62) but reduces the feature space by a factor of seven. [eng]
dc.description.version: publishedVersion [eng]
dc.identifier.uri: https://oa.tib.eu/renate/handle/123456789/9205
dc.identifier.uri: https://doi.org/10.34657/8243
dc.language.iso: eng
dc.publisher: Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik
dc.relation.doi: https://doi.org/10.20347/WIAS.PREPRINT.2621
dc.relation.issn: 2198-5855
dc.rights.license: This document may be downloaded, read, stored and printed for your own use within the limits of § 53 UrhG but it may not be distributed via the internet or passed on to external parties. [eng]
dc.rights.license: Dieses Dokument darf im Rahmen von § 53 UrhG zum eigenen Gebrauch kostenfrei heruntergeladen, gelesen, gespeichert und ausgedruckt, aber nicht im Internet bereitgestellt oder an Außenstehende weitergegeben werden. [ger]
dc.subject.ddc: 510
dc.subject.other: Text classification [eng]
dc.subject.other: latent dirichlet allocation [eng]
dc.subject.other: Kullback–Leibler divergence [eng]
dc.subject.other: company prediction [eng]
dc.subject.other: news articles [eng]
dc.title: What company does my news article refer to? Tackling multiclass problems with topic modeling [eng]
dc.type: Report [eng]
dc.type: Text [eng]
dcterms.extent: 11 S.
tib.accessRights: openAccess
wgl.contributor: WIAS
wgl.subject: Mathematik
wgl.type: Report / Forschungsbericht / Arbeitspapier
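
The abstract describes classifying documents with k-nearest neighbors over Latent Dirichlet Allocation topic features, and the subject keywords mention the Kullback-Leibler divergence. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each document has already been reduced to a topic distribution, and it assumes a symmetrised KL divergence is used as the neighbor distance. The function names, the three-topic vectors, and the labels `company_a`, `company_b`, and `company_c` are all hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0) when q has zero entries.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Symmetrised KL divergence (assumed here as the k-NN distance)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def knn_predict(query, train, k=3):
    """Majority vote among the k training documents closest to `query`.

    train is a list of (topic_distribution, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: symmetric_kl(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical topic distributions over three topics (illustrative only).
train = [
    ([0.80, 0.10, 0.10], "company_a"),
    ([0.70, 0.20, 0.10], "company_a"),
    ([0.10, 0.80, 0.10], "company_b"),
    ([0.20, 0.70, 0.10], "company_b"),
    ([0.10, 0.10, 0.80], "company_c"),
]
query = [0.75, 0.15, 0.10]

prediction = knn_predict(query, train, k=3)
print(prediction)
```

Because each document is represented by a short topic vector rather than a vocabulary-sized count vector, the distance computation touches far fewer features, which is the efficiency argument the abstract makes for the topic-model approach.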
Files
Original bundle
Name: wias_preprints_2621.pdf
Size: 364.21 KB
Format: Adobe Portable Document Format