LATENT SEMANTIC INDEXING FOR SIMILARITY JUDGEMENT IN TEXT RETRIEVAL
The growing number of electronic text documents has created a need for effective information retrieval systems, which attempt to satisfy the user's information need by interpreting the content of the items in a collection and ranking them by their relevance to the user's query. Latent Semantic Indexing (LSI) is an information retrieval method well known for its ability to identify the meaning of words from their context, and it deals successfully with synonymy. Retrieval experiments indicate substantial performance gains over direct term-matching methods, but the underlying reasons why the method works are not well understood. In this thesis I investigate the capability of LSI to discover hidden similarities by capturing co-occurrence relationships between terms. The aim is to gain a good understanding of how LSI works. The thesis also considers the idea of using NLP techniques to augment LSI with syntactic information. Experiments on specially crafted documents are used to illustrate clearly how the LSI method works.
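The co-occurrence effect described above can be sketched on a toy collection. This is only an illustrative sketch: the four documents, the five terms, and the rank k = 2 are assumptions made up here and are not the crafted documents or settings used in the thesis. It builds a term-document count matrix, takes a truncated SVD with NumPy, and compares term similarity before and after the reduction.

```python
import numpy as np

# Hypothetical toy collection: "car" and "automobile" never share a document,
# but both co-occur with "engine".
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = [
    "car engine",
    "automobile engine",
    "flower petal",
    "petal flower",
]

# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Full SVD, then keep the k largest singular values (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # term coordinates in the k-dimensional latent space

i, j, f = terms.index("car"), terms.index("automobile"), terms.index("flower")
raw_sim = cosine(A[i], A[j])                      # direct term matching
lsi_sim = cosine(term_vecs[i], term_vecs[j])      # synonyms in latent space
lsi_unrelated = cosine(term_vecs[i], term_vecs[f])  # unrelated terms in latent space

print(f"raw  car~automobile: {raw_sim:.3f}")
print(f"LSI  car~automobile: {lsi_sim:.3f}")
print(f"LSI  car~flower:     {lsi_unrelated:.3f}")
```

In this toy case the raw similarity between "car" and "automobile" is zero because they share no document, while in the reduced space their vectors point in the same direction because both co-occur with "engine"; "car" and "flower" remain unrelated.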