Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives

Vu, Thi Huong; Litzel, Ida; Koch, Thorsten

doi:https://doi.org/10.34657/21412

Files

RO9118_2025-9.pdf (1.3 MB)

Date

2025-06

Authors

Vu, Thi Huong

Litzel, Ida

Koch, Thorsten

Series Title

ZIB Report ; 2025,09

Publisher

Hannover : Technische Informationsbibliothek

Abstract

Fuzzy clustering, which allows an article to belong to multiple clusters with soft membership degrees, plays a vital role in analyzing publication data. The problem can be formulated as a constrained optimization model whose goal is to minimize the discrepancy between the similarity observed in the data and the similarity derived from a predicted distribution. While this approach can leverage state-of-the-art optimization algorithms, tailoring them to real, massive databases such as OpenAlex or Web of Science, which contain about 70 million articles and a billion citations, poses significant challenges. We analyze the potential and challenges of the approach from both mathematical and computational perspectives. Among other things, we establish second-order optimality conditions, which provide new theoretical insights, and propose practical solution methods that exploit the problem's structure. Specifically, we accelerate the gradient projection method with GPU-based parallel computing to handle large-scale data efficiently.
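
To make the idea in the abstract concrete, the following is a minimal CPU-only sketch of a gradient projection iteration for fuzzy memberships. It is not the report's implementation. It assumes, purely for illustration, that each article i has a membership vector x_i on the probability simplex, that the derived similarity of articles i and j is the inner product of x_i and x_j, and that the discrepancy is the squared difference to an observed similarity matrix S. The function names, the step-halving rule, and the toy matrix are all hypothetical; the actual model and GPU acceleration are described in the report itself.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:  # holds for a prefix of k; the last hit gives theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(S, X):
    """Assumed discrepancy: sum over i, j of (S[i][j] - <x_i, x_j>)^2."""
    n = len(X)
    return sum((S[i][j] - dot(X[i], X[j])) ** 2
               for i in range(n) for j in range(n))


def gradient(S, X):
    """Gradient of the assumed objective for symmetric S: 4 (X X^T - S) X."""
    n, k = len(X), len(X[0])
    G = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = dot(X[i], X[j]) - S[i][j]
            for c in range(k):
                G[i][c] += 4.0 * r * X[j][c]
    return G


def projected_gradient(S, k, steps=200, step_size=0.1, seed=0):
    """Gradient step, then row-wise simplex projection, with step halving.

    Returns the memberships X and the history of objective values.
    """
    rng = random.Random(seed)
    n = len(S)
    X = [project_to_simplex([rng.random() for _ in range(k)])
         for _ in range(n)]
    history = [objective(S, X)]
    for _ in range(steps):
        G = gradient(S, X)
        t = step_size
        while t > 1e-12:
            candidate = [
                project_to_simplex([x - t * g for x, g in zip(X[i], G[i])])
                for i in range(n)
            ]
            f_new = objective(S, candidate)
            if f_new <= history[-1]:  # accept only non-increasing steps
                X = candidate
                history.append(f_new)
                break
            t *= 0.5
    return X, history


if __name__ == "__main__":
    # Toy similarity matrix: articles 0-1 and 2-3 form two groups.
    S = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    X, history = projected_gradient(S, k=2)
    for row in X:
        print([round(x, 3) for x in row])
    print("objective:", history[0], "->", history[-1])
```

Each row of the result is a soft membership vector that sums to one. In this sketch the projection is applied to every row independently, which is the kind of structure that lends itself to parallel execution.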

Publication Type

Report

Version

publishedVersion

URI

https://oa.tib.eu/renate/handle/123456789/22395
https://doi.org/10.34657/21412

Collections

Forschungsberichte ohne Pflichtabgabe (DFG, IGF…)

License

This document may be downloaded, read, stored and printed for your own use within the limits of § 53 UrhG but it may not be distributed via the internet or passed on to external parties.
German copyright law applies.
