Problem/issue detection and classification
This thesis investigates whether pattern recognition and machine learning can be used to detect online forum posts that describe problems and issues. In addition, we seek to classify these posts further as either informative or non-informative, depending on the quality of their content. Our motivation is that more and more consumers turn to the Internet to express opinions, seek help, or search for advice about their purchases. For a producer, awareness of this online “word-of-mouth” has become a key factor in brand and reputation control and in product recall management. Over the last few years, however, the volume of such online buzz has passed the point where it can be handled manually: it is simply no longer possible to survey thousands of online forums by hand. We therefore need systems that can automatically monitor these web sites and detect posts of interest, such as those from people having problems with a product. In the thesis, we implement and examine three algorithms, Naive Bayes, k-Nearest Neighbor, and Self-Organizing Maps, and determine which parameters and features produce the highest accuracy. The prototype is trained and validated on forum posts, but its application is not limited to this type of document. The thesis contributes to the areas of natural language text processing and classification.
Master's thesis in Information and Communication Technology, 2009 – University of Agder, Grimstad