Choice of techniques in data mining: Three experiments in applications
This thesis analyzes the relative strengths and weaknesses of different data mining techniques. It applies three classification techniques (decision trees, logistic regression, and neural networks) to a single data set using commonly available software packages, and compares their performance and results. The three experiments show that both decision trees and logistic regression identify which variables are most important in predicting the outcome, whereas the neural network gives no such information. Both logistic regression and the neural network can classify data based on combinations of variables and give the user an overall measure of the model's predictive power, whereas decision trees cannot. The thesis also finds that neural networks tend to take more time than the other two techniques to arrive at the best possible classification performance.
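
As a rough illustration of the variable-importance finding, the sketch below fits a logistic regression and a one-level decision tree (a stump) to the same data and checks which variable each one singles out. It is not the thesis's experiment. The data are synthetic, the three variables and their effect sizes are assumptions made for the example, and every function name is hypothetical. Plain Python stands in for the software packages used in the thesis, and the neural network is left out.

```python
import math
import random


def make_data(n=200, seed=0):
    """Synthetic data (an assumption): the outcome depends strongly on x0,
    weakly on x1, and not at all on x2."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3)]
        p = 1 / (1 + math.exp(-(3.0 * row[0] + 0.5 * row[1])))
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1 / (1 + math.exp(-z)) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y):
    """Root split of a decision tree: the (impurity, variable, threshold)
    with the lowest weighted Gini impurity over all candidate splits."""
    best = (float("inf"), None, None)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, j, t)
    return best


X, y = make_data()

# Logistic regression: coefficient magnitudes rank the variables. This is a
# fair comparison here only because all three variables share the same scale.
weights, bias = fit_logistic(X, y)
top_logistic = max(range(len(weights)), key=lambda j: abs(weights[j]))

# Decision tree: the variable chosen for the first split is the most informative.
_, stump_feature, stump_threshold = best_split(X, y)

# Overall measure of fit for the logistic model (training accuracy).
predictions = [
    1 if bias + sum(w * x for w, x in zip(weights, row)) > 0 else 0 for row in X
]
accuracy = sum(p == label for p, label in zip(predictions, y)) / len(y)

print("logistic coefficients:", [round(w, 2) for w in weights])
print("most important variable (logistic):", top_logistic)
print("root split variable (tree):", stump_feature)
print("training accuracy (logistic):", round(accuracy, 3))
```

In this sketch both models are expected to single out the first variable, which drives the synthetic outcome. The accuracy reported is measured on the training data, so it is only a crude stand-in for the overall measure of predictive power discussed above.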