
A new look at the Dirichlet distribution: robustness, clustering, and both together

dc.contributor.author: Tomarchio, Salvatore D.
dc.contributor.author: Punzo, Antonio
dc.contributor.author: Ferreira, Johannes Theodorus
dc.contributor.author: Bekker, Andriette, 1958-
dc.date.accessioned: 2025-09-18T09:51:06Z
dc.date.available: 2025-09-18T09:51:06Z
dc.date.issued: 2025-03
dc.description: DATA AVAILABILITY: The real datasets used in this manuscript are publicly available as described in the manuscript. CHANGE HISTORY: 24 November 2024. Missing Open Access funding information has been added in the Funding Note.
dc.description.abstract: Compositional data have peculiar characteristics that pose significant challenges to traditional statistical methods and models. Within this framework, we use a convenient mode-parametrized Dirichlet distribution across multiple fields of statistics. First, we propose finite mixtures of unimodal Dirichlet (UD) distributions for model-based clustering and classification. Second, we introduce the contaminated UD (CUD) distribution, a heavy-tailed generalization of the UD distribution that allows for more flexible tail behavior in the presence of atypical observations. Third, we propose finite mixtures of CUD distributions to jointly account for the presence of clusters and atypical points in the data. Parameter estimation is carried out by directly maximizing the likelihood or by using an expectation-maximization (EM) algorithm. Two analyses are conducted on simulated data to illustrate the effects of atypical observations on parameter estimation and data classification, and how our proposals address both aspects. Furthermore, two real datasets are investigated and the results obtained via our models are discussed.
dc.description.department: Statistics
dc.description.department: Geography, Geoinformatics and Meteorology
dc.description.librarian: hj2025
dc.description.sdg: SDG-04: Quality Education
dc.description.sponsorship: Support of MUR; The SMILE project: Statistical Modelling and Inference to Live the Environment, funded by the European Union - Next Generation EU; the National Research Foundation (NRF) of South Africa (SA); the DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), South Africa and the Department of Research and Innovation at the University of Pretoria (SA). Open access funding provided by Università degli Studi di Catania within the CRUI-CARE Agreement.
dc.description.uri: https://link.springer.com/journal/357
dc.identifier.citation: Tomarchio, S.D., Punzo, A., Ferreira, J.T. et al. A New Look at the Dirichlet Distribution: Robustness, Clustering, and Both Together. Journal of Classification 42, 31–53 (2025). https://doi.org/10.1007/s00357-024-09480-4.
dc.identifier.issn: 0176-4268 (print)
dc.identifier.issn: 1432-1343 (online)
dc.identifier.other: 10.1007/s00357-024-09480-4
dc.identifier.uri: http://hdl.handle.net/2263/104389
dc.language.iso: en
dc.publisher: Springer
dc.rights: © The Author(s) 2024, corrected publication 2024. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License.
dc.subject: Compositional data
dc.subject: Dirichlet distribution
dc.subject: Mode
dc.subject: Model-based clustering
dc.subject: Robustness
dc.title: A new look at the Dirichlet distribution: robustness, clustering, and both together
dc.type: Article

Files

Original bundle

Name: Tomarchio_New_2025.pdf
Size: 1.73 MB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission