Towards Detecting Textual Plagiarism Using Machine Learning Methods
Textual plagiarism is passing off someone else’s text as your own. Current state-of-the-art plagiarism detection performs well, but it often relies on a series of manually determined metric thresholds to decide whether an author has committed plagiarism. These thresholds are optimized for a single data set and are not optimal for all situations or forms of plagiarism. Because of their complexity, the detection methodologies also require a professional familiar with the algorithms in order to be adjusted properly. Given a pre-classified data set, machine learning methods allow teachers and examiners without knowledge of the methodology to use a plagiarism detection tool tailored to their needs. This thesis demonstrates that a methodology using machine learning, without the need to set thresholds, can match, and in some cases surpass, the top methodologies in the current state of the art. With more work, future methodologies may outperform both the best commercial and the best freely available methodologies.
Master's thesis in information and communication technology - Universitetet i Agder, 2015