Application of gradient boosted trees to gender prediction based on motivations of master athletes
Walsh, J, Heazlewood, T & Climstein, M 2018, 'Application of gradient boosted trees to gender prediction based on motivations of master athletes', Model Assisted Statistics & Applications, vol. 13, no. 3, pp. 235-252.
Published version available from:
Gradient boosted decision trees are statistical learning ensemble methods that iteratively refit decision tree sub-models to residuals. The aim of this research was to apply gradient boosted decision trees and investigate their ability to predict gender from psychological constructs measuring motivations to participate in masters sport. A selection of unboosted and boosted decision tree-based models was compared with previously published research that used logistic regression, discriminant function analysis, radial basis functions and multilayer perceptrons. The tree models selected were J48, C5.0, gradient boosted machine (GBM), XGBoost and LightGBM. The sample consisted of 3928 masters athletes (2010 males) from the World Masters Games, the largest sporting event in the world (by participant numbers). The efficacy of tree-based models for prediction in this environment was established: even older baseline implementations gave higher prediction accuracy than any method used in prior research. The highest predictive accuracy was achieved using GBM (0.7134), exceeding the accuracies of models using XGBoost (0.7012) or LightGBM (0.6904). These two more recent boosting implementations may have given lower predictive accuracy than GBM because of the high dimensionality of the data relative to the number of cases.
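
The idea of iteratively refitting tree sub-models to residuals can be illustrated with a minimal sketch. This is not the GBM, XGBoost or LightGBM implementations used in the study, and it does not use the study's data: it assumes squared-error loss, one-split trees (stumps), a hypothetical learning rate of 0.3, and a synthetic data set of three made-up item scores on a 1 to 7 scale with an arbitrary rule generating a 0/1 label.

```python
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) for the single
    split that minimises squared error against the current residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between consecutive distinct values,
        # so neither side of a split is ever empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=30, learning_rate=0.3):
    """Boosting loop: each new stump is fitted to the residuals left by
    the ensemble built so far, then added with a shrinkage factor."""
    base = sum(y) / len(y)              # initial constant prediction
    preds = [base] * len(y)
    stumps = []
    losses = [sum((yi - p) ** 2 for yi, p in zip(y, preds)) / len(y)]
    for _ in range(n_rounds):
        residuals = [yi - p for yi, p in zip(y, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]
        losses.append(sum((yi - p) ** 2 for yi, p in zip(y, preds)) / len(y))
    return (base, learning_rate, stumps), losses


def predict_score(model, row):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, learning_rate, stumps = model
    score = base
    for j, t, lm, rm in stumps:
        score += learning_rate * (lm if row[j] <= t else rm)
    return score


# Synthetic, hypothetical data: three item scores per case (1 to 7) and a
# made-up rule producing a binary label. Not the World Masters Games sample.
random.seed(0)
X = [[random.randint(1, 7) for _ in range(3)] for _ in range(200)]
y = [1 if (row[0] >= 5 or (row[1] <= 2 and row[2] >= 4)) else 0 for row in X]

model, losses = fit_boosted_stumps(X, y)
predicted = [1 if predict_score(model, row) >= 0.5 else 0 for row in X]
accuracy = sum(int(p == yi) for p, yi in zip(predicted, y)) / len(y)
print("training MSE before and after boosting:", losses[0], losses[-1])
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the training cases only, so it says nothing about out-of-sample performance of the kind reported above; it only shows that the training error falls as stumps are added.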