Discernibility and Rough Sets in Medicine: Tools and Applications
MetadataShow full item record
This thesis examines how discernibility-based methods can be equipped to posses several qualities that are needed for analyzing tabular medical data, and how these models can be evaluated according to current standard measures used in the health sciences. To this end, tools have been developed that make this possible, and some novel medical applications have been devised in which the tools are put to use. Rough set theory provides a framework in which discernibility-based methods can be formulated and interpreted, and also forms an appealing foundation for data mining and knowledge discovery. When the medical domain is targeted, several factors become important. This thesis examines some of these factors, and holds them up to the current state-of-the-art in discernibility-based empirical modelling. Bringing together pertinent techniques, suitable adaptations of relevant theory for model construction and assessment are presented. Rough set classifiers are brought together with ROC analysis, and it is outlined how attribute costs and semantics can enter the modelling process. ROSETTA, a comprehensive software system for conducting data analyses within the framework of rough set theory, has been developed. Under the hypothesis that the accessibility of such tools lowers the threshold for abstract ideas to migrate into concrete realization, this aids in reducing a gap between theoreticians and practitioners, and enables existing problems to be more easily attacked. The ROSETTA system boasts a set of flexible and powerful algorithms, and sets these in a user-friendly environment designed to support all phases of the discernibility-based modelling methodology. Researchers world-wide have already put the system to use in a wide variety of domains. By and large, discernibility-based data analysis can be varied along two main axes: Which objects in the universe of discourse that we deem it necessary to discern between, and how we define that discernibility among these objects is allowed to take place. Using ROSETTA, this thesis has explored various facets of this also in three novel and distinctly different medical applications: *A method is proposed for identifying population subgroups for which expensive tests may be avoided, and experiments with a real-world database on a cardiological prognostic problem suggest that significant savings are possible. * A method is proposed for anonymizing medical databases with sensitive contents via cell suppression, thus aiding to preserve patient confidentiality. * Very simple rule-based classifiers are employed to diagnose acute appendicitis, and their relative performance is compared to a team of experienced surgeons. The added value of certain biochemical tests is also demonstrated.