Human-Computer Collaboration for Faster Document Comparison
The amount of data in the world is increasing rapidly. One common operation on large datasets is one-to-all comparison, where the goal is to compare one object to every other object in the set. This thesis investigates whether the problem can be approached through human-computer collaboration. To this end, the human-guided filtering model (HGFM) is proposed. The model provides a general framework for one-to-all comparison in which cluster analysis is used to group similar objects together. With the help of a human domain expert, the contents of irrelevant clusters can then be removed from the process. An implementation of the model is demonstrated and tested in a series of experiments. The experiments show that the model can reduce the size of the dataset by up to 80% before comparison takes place, creating ample opportunity to save both time and computational resources. In light of the model's apparent potential, several directions for future research are proposed at the end of the thesis.
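The cluster-then-filter idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the greedy clustering, the Jaccard similarity measure, the toy documents, and all function names (`jaccard`, `cluster`, `filter_and_compare`) are assumptions chosen for brevity, and the `is_relevant` callback merely stands in for the human domain expert.

```python
from typing import Callable, Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Greedy single-pass clustering (an assumed stand-in for cluster analysis).

    A document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, tokens in docs.items():
        for members in clusters:
            if jaccard(tokens, docs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def filter_and_compare(
    query: Set[str],
    docs: Dict[str, Set[str]],
    is_relevant: Callable[[List[str]], bool],
    threshold: float = 0.2,
) -> Dict[str, float]:
    """Cluster the documents, drop clusters the expert rejects, then compare
    the query only against the documents that remain."""
    kept = [
        name
        for members in cluster(docs, threshold)
        if is_relevant(members)  # hypothetical expert decision per cluster
        for name in members
    ]
    return {name: jaccard(query, docs[name]) for name in kept}


# Toy data (invented for illustration only).
docs = {
    "contract_a": {"lease", "tenant", "rent", "deposit"},
    "contract_b": {"lease", "tenant", "rent", "notice"},
    "recipe_a": {"flour", "sugar", "butter", "oven"},
    "recipe_b": {"flour", "sugar", "eggs", "oven"},
    "recipe_c": {"flour", "butter", "eggs", "oven"},
}
query = {"lease", "rent", "deposit", "landlord"}

# Simulated expert: keep only clusters that contain a contract.
scores = filter_and_compare(
    query, docs, lambda members: any(n.startswith("contract") for n in members)
)
removed_fraction = 1 - len(scores) / len(docs)
```

In this toy run the simulated expert discards the recipe cluster, so three of the five documents are excluded before any comparison with the query is made. The figure carries no weight beyond the example and is unrelated to the results reported in the thesis.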