We describe an application of machine learning to the problem of predicting preterm birth. We conduct a secondary analysis of a clinical trial dataset collected by the National Institute of Child Health and Human Development (NICHD), focusing on predicting different classes of preterm birth. We compare three approaches for deriving predictive models: a support vector machine (SVM) with linear and non-linear kernels, logistic regression with different model selection strategies, and a model based on decision rules prescribed by physician experts for the prediction of preterm birth. We highlight the pre-processing methods applied to handle the inherent dynamics, noise, and gaps in the data, and describe the techniques used to handle skewed class distributions. Empirical experiments demonstrate significant improvement in predicting preterm birth compared to past work.
Presented at the 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA. In this revision, we updated page 4 by adding the reference Vovsha et al. (2013) (incorrectly referenced as XXX in the previous version due to double-blind reviewing). The BibTeX entry has now been added to the references.