Latent Class Cluster Analysis: Selecting the number of clusters

Lezhnina, Olga; Kismihók, Gábor

doi:http://dx.doi.org/10.34657/9151

Date

2022

Authors

Lezhnina, Olga

Kismihók, Gábor

Volume

9

Journal

MethodsX

Publisher

Amsterdam [u.a.] : Elsevier

Link to publishers version

https://doi.org/10.1016/j.mex.2022.101747

Abstract

Latent Class Cluster Analysis (LCCA) is an advanced model-based clustering method that is increasingly used in social, psychological, and educational research. Selecting the number of clusters in LCCA is a challenging task that inevitably involves subjective analytical choices. Researchers often rely excessively on fit indices, because model fit is the main selection criterion in model-based clustering; it has been shown, however, that a wider spectrum of criteria needs to be taken into account. In this paper, we suggest an extended analytical strategy for selecting the number of clusters in LCCA based on model fit, cluster separation, and stability of partitions. The suggested procedure is illustrated on simulated data and on a real-world dataset from the International Computer and Information Literacy Study (ICILS) 2018. For the latter, we provide an example of end-to-end LCCA, including data preprocessing. Researchers can use our R script to conduct LCCA in a few easily reproducible steps, or implement the strategy with any other software suitable for clustering. We show that the extended strategy, compared with a strategy based on fit indices alone, facilitates the selection of more stable and well-separated clusters in the data.

• The suggested strategy helps researchers select the number of clusters in LCCA
• It is based on model fit, cluster separation, and stability of partitions
• The strategy is useful for finding separable, generalizable clusters in the data
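To make the three criteria concrete, the following is a minimal sketch of how candidate numbers of clusters might be compared on fit, separation, and stability. It is not the authors' R script. Everything specific in it is an assumption made for illustration: the binary-indicator latent class model fitted by a hand-written EM loop, BIC as the fit index, relative entropy as the separation measure, the adjusted Rand index over bootstrap refits as the stability measure, and all function names and simulated data.

```python
import math
import random
from collections import Counter

EPS = 1e-6  # keeps probabilities away from 0 and 1


def e_step(data, weights, probs):
    """Posterior class memberships for each row, plus the log-likelihood."""
    resp, loglik = [], 0.0
    for row in data:
        logs = [math.log(w) + sum(math.log(p if x else 1.0 - p) for x, p in zip(row, ps))
                for w, ps in zip(weights, probs)]
        top = max(logs)
        total = top + math.log(sum(math.exp(v - top) for v in logs))
        loglik += total
        resp.append([math.exp(v - total) for v in logs])
    return resp, loglik


def fit_lca(data, k, n_iter=50, seed=0):
    """EM for a latent class model with binary indicators (illustrative only)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    probs = [[rng.uniform(0.2, 0.8) for _ in range(d)] for _ in range(k)]
    for _ in range(n_iter):
        resp, _ = e_step(data, weights, probs)
        for c in range(k):
            nc = max(sum(r[c] for r in resp), EPS)
            weights[c] = max(nc / n, EPS)
            probs[c] = [min(max(sum(r[c] * row[j] for r, row in zip(resp, data)) / nc, EPS), 1 - EPS)
                        for j in range(d)]
    return weights, probs


def hard_labels(resp):
    """Assign each row to its most probable class."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


def relative_entropy(resp, k):
    """Separation: 1 means crisp class assignments, 0 means none."""
    if k == 1:
        return 1.0
    ent = -sum(r * math.log(r) for row in resp for r in row if r > 0)
    return 1.0 - ent / (len(resp) * math.log(k))


def adjusted_rand(a, b):
    """Adjusted Rand index between two partitions of the same rows."""
    pairs = lambda m: m * (m - 1) / 2
    s_ij = sum(pairs(v) for v in Counter(zip(a, b)).values())
    s_a = sum(pairs(v) for v in Counter(a).values())
    s_b = sum(pairs(v) for v in Counter(b).values())
    expected = s_a * s_b / pairs(len(a))
    max_index = (s_a + s_b) / 2
    return 1.0 if max_index == expected else (s_ij - expected) / (max_index - expected)


def bootstrap_stability(data, k, labels, n_boot=5, seed=1):
    """Mean ARI between the original partition and partitions from bootstrap refits."""
    rng = random.Random(seed)
    scores = []
    for b in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        w, p = fit_lca(sample, k, seed=b)
        scores.append(adjusted_rand(labels, hard_labels(e_step(data, w, p)[0])))
    return sum(scores) / n_boot


def evaluate(data, candidates=(1, 2, 3)):
    """Collect fit (BIC), separation, and stability for each candidate k."""
    n, d = len(data), len(data[0])
    results = {}
    for k in candidates:
        w, p = fit_lca(data, k)
        resp, loglik = e_step(data, w, p)
        n_params = (k - 1) + k * d
        results[k] = {
            "bic": -2.0 * loglik + n_params * math.log(n),
            "entropy": relative_entropy(resp, k),
            "stability": bootstrap_stability(data, k, hard_labels(resp)),
        }
    return results


# Simulated data (hypothetical): two classes, six binary items, 200 rows.
rng = random.Random(42)
data = [[int(rng.random() < (0.9 if i % 2 else 0.1)) for _ in range(6)] for i in range(200)]
results = evaluate(data)
for k, r in results.items():
    print(k, {name: round(v, 3) for name, v in r.items()})
```

Reading the three columns side by side, rather than picking the lowest BIC alone, is the point of the sketch: a solution with marginally better fit but poor separation or unstable partitions would be a weaker candidate under the extended strategy.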