Early warnings of critical diagnoses
A disease that is left untreated for a longer period is more likely to have negative consequences for the patient. Even though the general practitioner is able to discover the disease quickly in most cases, there are patients whose disease should have been discovered earlier. Electronic patient records store time-stamped health information about patients, recorded by the health personnel treating them. This makes it possible to do a retrospective analysis to determine whether there was sufficient information to give the diagnosis earlier than the general practitioner actually did. Classification algorithms from the machine learning domain can utilise large collections of electronic patient records to build models which can predict whether a patient will get the disease or not. These models could be used to gain more knowledge about these diseases, and in a long-term perspective they could become a support for the general practitioner in daily practice. The purpose of this thesis is to design and implement a software system which can predict whether or not a patient will get a disease in the near future. The system should attempt to predict the disease before the general practitioner even suspects that the patient might have it. A further objective is to use this system to identify the warning signs on which the predictions are based, and to analyse the usefulness of the predictions and the warning signs. Asthma, type 2 diabetes and hypothyroidism have been selected as test cases for our methodology. A set of suspicion-indicators, which indicate that the general practitioner has suspected the disease, is identified in an iterative process. These suspicion-indicators are subsequently used to limit the information available to the classification algorithms. This limited information is then used to build prediction models, using different classification algorithms.
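As an illustration of how suspicion-indicators can limit the information available to a classifier, the sketch below keeps only the events recorded before the first suspicion-indicator in a patient record. It is a minimal sketch under stated assumptions, not the thesis implementation: the event codes, the `(date, code)` record layout and the function name `events_before_suspicion` are all hypothetical.

```python
from datetime import date

# Hypothetical suspicion-indicator codes, for illustration only.
# The actual indicators were identified together with domain experts.
SUSPICION_INDICATORS = {"HBA1C_TEST", "REFERRAL_ENDOCRINOLOGY"}


def events_before_suspicion(events, indicators=SUSPICION_INDICATORS):
    """Return the events dated strictly before the first suspicion-indicator.

    `events` is a list of (date, code) pairs (an assumed record layout).
    If no suspicion-indicator occurs, all events are returned in date order.
    """
    ordered = sorted(events, key=lambda event: event[0])
    # Date of the earliest event that signals suspicion, if any.
    cutoff = next((day for day, code in ordered if code in indicators), None)
    if cutoff is None:
        return ordered
    return [(day, code) for day, code in ordered if day < cutoff]


record = [
    (date(2009, 3, 1), "FATIGUE"),
    (date(2009, 6, 12), "HBA1C_TEST"),
    (date(2009, 1, 15), "THIRST"),
    (date(2009, 7, 2), "DIABETES_DIAGNOSIS"),
]
visible = events_before_suspicion(record)
```

Here only the two events dated before the hypothetical `HBA1C_TEST` entry remain visible, so a model trained on `visible` cannot rely on information recorded after the general practitioner began to suspect the disease.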
The prediction models are evaluated in terms of various performance measures, and the models themselves are analysed manually. Experiments are conducted in order to find favourable parameter values for the information extraction process. Because relatively few patients have the test-case diseases, the oversampling technique SMOTE is used to generate additional synthetic patients with these diseases. A set of suspicion-indicators has been identified in cooperation with domain experts. The availability of warning signs decreases as the information available to the classifier diminishes, while the performance of the classifiers is not affected to the same degree. Applying the SMOTE oversampling technique improves the results for the prediction models. There is not much difference between the performance of the various classification algorithms. The improved problem formulation results in models which are more valid than before. A number of events which are used to predict the test cases have been identified, but their real-world importance remains to be evaluated by domain experts. The performance of the prediction models can be misleading in terms of practical usefulness. SMOTE is a promising technique for generating additional data, but the evaluation techniques used here are not good enough to draw any conclusions.
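The core idea of SMOTE can be sketched as follows: each synthetic minority example is placed at a random point on the line segment between a real minority example and one of its nearest minority neighbours. This is a simplified illustration that assumes purely numeric feature vectors. The function name `smote_sketch`, the parameter defaults and the toy data are hypothetical, and the feature representation used in the thesis is not shown here.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic points from numeric minority-class vectors.

    Simplified SMOTE: pick a minority point, pick one of its k nearest
    minority neighbours, and interpolate between the two at a random gap.
    Requires at least two minority points.
    """
    rng = random.Random(seed)

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the chosen point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric features (illustrative values only).
minority_patients = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_patients = smote_sketch(minority_patients, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two real minority examples, it stays inside the region spanned by the observed minority class. When evaluating models, synthetic examples would normally be generated from the training data only, so that test results are not inflated by near-copies of test patients.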