Parametric quantile regression models for continuous proportions data : an evaluation of mean versus median modeling beyond beta
dc.contributor.advisor | Burger, Divan A. | |
dc.contributor.email | u16001223@tuks.co.za | en_US |
dc.contributor.postgraduate | Weideman, Maricelle | |
dc.date.accessioned | 2025-02-12T12:50:40Z | |
dc.date.available | 2025-02-12T12:50:40Z | |
dc.date.created | 2025-04 | |
dc.date.issued | 2025-02 | |
dc.description | Mini Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2025. | en_US |
dc.description.abstract | In the modeling of bounded data, outliers or influential values present unique challenges, particularly when dealing with continuous proportions data. Traditional models, such as the beta regression model, although widely adopted, lack robustness against outliers, thus motivating the need for alternative models capable of addressing these limitations. This dissertation provides a comprehensive evaluation of various parametric models, including the beta, beta rectangular, Kumaraswamy, and Johnson-t models, emphasizing their robustness in handling outliers. A simulation study was conducted to examine the performance of each model under scenarios with and without outliers, measuring bias and coverage for key parameters. Results confirm the beta regression model’s sensitivity to outliers, as evidenced by increased bias and reduced coverage when influential values were introduced. In contrast, the Johnson-t regression model maintained stability in both bias and coverage, demonstrating greater resilience in datasets that include outliers. Application to the Australian Institute of Sport dataset further validated these findings, highlighting the Johnson-t model’s effectiveness in achieving robust median regression compared to mean-based approaches, which were less reliable with outliers. This study concludes that while beta regression remains popular for bounded data, the Johnson-t regression model offers a preferable alternative due to its robustness in median modeling, a critical factor in data analysis where influential values cannot be ignored. | en_US |
dc.description.availability | Unrestricted | en_US |
dc.description.degree | MSc (Advanced Data Analytics) | en_US |
dc.description.department | Statistics | en_US |
dc.description.faculty | Faculty of Natural and Agricultural Sciences | en_US |
dc.description.sdg | SDG-09: Industry, innovation and infrastructure | en_US |
dc.identifier.citation | * | en_US |
dc.identifier.doi | https://cran.r-project.org/web/packages/DAAG/DAAG.pdf | en_US |
dc.identifier.other | A2025 | en_US |
dc.identifier.uri | http://hdl.handle.net/2263/100786 | |
dc.language.iso | en_US | en_US |
dc.publisher | University of Pretoria | |
dc.rights | © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. | |
dc.subject | UCTD | en_US |
dc.subject | Sustainable Development Goals (SDGs) | en_US |
dc.subject | Bounded data | en_US |
dc.subject | Outliers | en_US |
dc.subject | Robustness | en_US |
dc.subject | Median regression | en_US |
dc.subject | Mean-based models | en_US |
dc.subject | Parametric models | en_US |
dc.title | Parametric quantile regression models for continuous proportions data : an evaluation of mean versus median modeling beyond beta | en_US |
dc.type | Mini Dissertation | en_US |