Identifying Geographic Terms within Natural Language Text
The huge amount of textual data now available in digital form increases the need for methods that make it easier to access and navigate. Automatic extraction of keywords from bodies of text is one promising approach. However, the relevance of a keyword is context-dependent, and extracting relevant keywords often requires semantic analysis, because words may have different meanings in different contexts. It is well known that resolving such word sense ambiguity automatically can be very challenging. When the topic of interest is geographic information, the important keywords are geographic terms such as countries, cities, counties and states. This thesis presents a probabilistic method for automatic identification of geographic terms within natural language text. The method uses a database of geographic terms to identify possible geographic entities. In contrast to the state of the art, we resolve semantic ambiguity by using a Bayesian classifier that takes the context of ambiguous words into account. In our empirical results, we report a geographic term identification accuracy of 90%. We thus believe that the approach can be useful to those working in text analysis and data mining when accurate geographic term identification matters.
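The two steps named in the abstract, a lookup against a database of geographic terms followed by a Bayesian classifier over the surrounding context, can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not specify the classifier, so the sketch assumes a naive Bayes model with add-one smoothing, and the gazetteer entries, the training examples, the window size and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for the database of geographic terms.
GAZETTEER = {"georgia", "washington", "paris", "norway"}

# Hypothetical labelled examples: context words around an ambiguous
# candidate, and whether the candidate was used as a geographic term.
TRAINING = [
    (["flew", "to", "in", "spring"], True),
    (["capital", "of", "the", "state"], True),
    (["lives", "in", "near", "the", "coast"], True),
    (["president", "said", "that"], False),
    (["mrs", "said", "she", "met"], False),
    (["actress", "starred", "in", "film"], False),
]


def train(examples):
    """Count context words per class (assumed naive Bayes model)."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for words, label in examples:
        class_counts[label] += 1
        word_counts[label].update(words)
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def log_posterior(context, label, model):
    """Unnormalised log P(label | context) with add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(word_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for word in context:
        if word in vocab:  # words never seen in training are skipped
            count = word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
    return score


def geographic_terms(text, model, window=3):
    """Return gazetteer matches whose context looks geographic."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    found = []
    for i, token in enumerate(tokens):
        if token not in GAZETTEER:
            continue  # step 1: only database matches are candidates
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        # step 2: keep the candidate if the geographic class scores higher
        if log_posterior(context, True, model) > log_posterior(context, False, model):
            found.append(token)
    return found


MODEL = train(TRAINING)
```

With this toy training data, `geographic_terms("She flew to Georgia in spring.", MODEL)` keeps `georgia`, while `geographic_terms("Mrs Washington said she met the president.", MODEL)` returns an empty list because the context favours the non-geographic class. A realistic system would need a far larger gazetteer and a labelled corpus; the 90% accuracy reported in the abstract refers to the thesis method, not to this sketch.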
Master's thesis in information and communication technology, 2008 – Universitetet i Agder, Grimstad