Clustering scientific publications: lessons learned through experiments with a real citation network

dc.bibliographicCitation.seriesTitle: ZIB Report ; 2025,05
dc.contributor.author: Vu, Thi Huong
dc.contributor.author: Koch, Thorsten
dc.date.accessioned: 2025-09-08T12:38:09Z
dc.date.available: 2025-09-08T12:38:09Z
dc.date.issued: 2025-06
dc.description.abstract: Clustering scientific publications can reveal underlying research structures within bibliographic databases. Graph-based clustering methods, such as the spectral, Louvain, and Leiden algorithms, are frequently used because they model citation networks effectively. However, their performance may degrade when applied to real-world data. This study evaluates the performance of these clustering algorithms on a citation graph comprising approximately 700,000 papers and 4.6 million citations extracted from Web of Science. The results show that while scalable methods like Louvain and Leiden run efficiently, their default settings often yield poor partitions. Meaningful outcomes require careful parameter tuning, especially for large networks with uneven structures, including a dense core and loosely connected papers. These findings highlight practical lessons about the challenges of large-scale data and about selecting and tuning methods according to the specific structure of a bibliometric clustering task.
dc.description.version: publishedVersion
dc.identifier.uri: https://oa.tib.eu/renate/handle/123456789/22399
dc.identifier.uri: https://doi.org/10.34657/21416
dc.language.iso: eng
dc.publisher: Hannover : Technische Informationsbibliothek
dc.relation.affiliation: Zuse Institute Berlin
dc.relation.doi: http://nbn-resolving.de/urn:nbn:de:0297-zib-100418
dc.rights.license: German copyright law applies. The work or content may be downloaded, consumed, stored or printed for your own use but it may not be distributed via the internet or passed on to external parties.
dc.subject.ddc: 100
dc.title: Clustering scientific publications: lessons learned through experiments with a real citation network
dc.type: Report
dc.type: Text
dcterms.extent: 9 pages
tib.accessRights: openAccess

Files

Original bundle
Name: RO9118_2025_5.pdf
Size: 706.76 KB
Format: Adobe Portable Document Format