Clustering as applied to a general practitioner's record
The electronic patient record is primarily used as a way for clinicians to remember what has happened during the care of a patient. The electronic record also introduces an additional possibility, namely the use of computer-based methods for searching, extracting and interpreting data patterns from the patient data. Potentially, such methods can help to reveal undiscovered medical knowledge from the patient record. This project aims to evaluate the usefulness of applying clustering methods to the patient record. Two clustering tasks are designed and carried out, one that considers clustering of ICPC codes and one that considers medical certificates. The clusterings are performed using hierarchical clustering and k-means clustering. The distance measures used in the experiments are Lift correlation, the Jaccard coefficient and the Euclidean distance. Three indices for clustering validation are implemented and tested, namely the Dunn index, the modified Hubert $\Gamma$ index and the Davies-Bouldin index. The work also points to the importance of dimensionality reduction for high-dimensional data, for which PCA is utilised. The strategies are evaluated by the degree to which they retrieve well-known medical knowledge, because a strategy that retrieves a high degree of well-known knowledge is more likely to identify unknown medical information than a strategy that retrieves a lower degree of known information. The experiments show that, for some of the methods, clusters are formed that represent interesting medical knowledge, which indicates that clustering of a general practitioner's record can potentially contribute to further medical research.
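To make the combination of a distance measure, a clustering method and a validation index concrete, the following is a minimal sketch in Python and not the implementation used in the project. It assumes that each ICPC code is represented by the set of patients in whose records it occurs. The code labels, the patient identifiers, the choice of average linkage and the function names are illustrative assumptions. It computes the Jaccard distance between codes, merges clusters hierarchically, and scores the result with the Dunn index.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(items, k, dist):
    """Agglomerative clustering: merge the closest pair of clusters until k remain.

    Clusters are lists of indices into `items`. Average linkage is an
    assumption made for this sketch.
    """
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        total = sum(dist(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def dunn_index(items, clusters, dist):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    min_separation = min(
        dist(items[i], items[j])
        for c1, c2 in combinations(clusters, 2)
        for i in c1
        for j in c2
    )
    max_diameter = max(
        (dist(items[i], items[j]) for c in clusters for i, j in combinations(c, 2)),
        default=0.0,
    )
    return float("inf") if max_diameter == 0 else min_separation / max_diameter


# Toy data (invented): code label -> set of patient ids whose record contains it.
code_patients = {
    "T90": {1, 2, 3, 4},
    "K86": {1, 2, 3, 5},
    "R05": {6, 7, 8},
    "R74": {6, 7, 9},
}
labels = list(code_patients)
items = [code_patients[name] for name in labels]

clusters = average_linkage(items, k=2, dist=jaccard_distance)
dunn = dunn_index(items, clusters, jaccard_distance)

for cluster in clusters:
    print([labels[i] for i in cluster])
print("Dunn index:", dunn)
```

A higher Dunn index indicates compact, well-separated clusters, so the same score can be used to compare different choices of distance measure or number of clusters on the same data.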