Ontology guided financial knowledge extraction from semistructured information sources
Intermedium has an agent that searches the Web for financial articles matching certain criteria, for instance an industrial domain of interest. A portal service for reading and searching these articles is available to customers. The sources searched are secondary sources, such as online newspapers. Secondary sources publish more frequently than annual reports and similar documents, and carry information not found there, such as predictions. Finding financial figures in the articles is often time-consuming, and the figures are hard to compare with each other. Presenting the financial figures, together with what they apply to, in an application where the information can be easily reviewed and compared would supply valuable information to decision makers in larger companies. Web documents are usually semi-structured, and therefore almost impossible to query for information. Because computers lack an understanding of the content, only keyword searches are supported. More advanced extraction of the information is therefore needed. This thesis evaluates an ontology-guided approach for extracting financial information from semi-structured information sources. A financial ontology has been constructed based on an investigation of 50 articles gathered by Intermedium's agent. Instances with synonyms (the words to extract from the text) and relations between the instances have been defined. RDF has been chosen as the ontology language and is used throughout the thesis. A prototype application has been developed to perform the extraction process. Articles are loaded from XML files; the words to extract from the text are found by querying the ontology with the query language RDQL; NLP and NLTK are used to perform the extraction based on the words found in the ontology; Velocity templating is used to give the output files, RDF and an XBRL instance document, the proper structure. The ontology provides the application with knowledge during the extraction process.
When a synonym of one instance is found in the text, the ontology is queried for references to other instances, and synonyms of those instances are then searched for in the text. If a text does not contain any interesting information, the application does not waste time trying to match every word in the ontology against the text. The result is presented with semantic tagging in RDF syntax. Part of the extracted information is also rendered as an example of how it can be expressed in the financial reporting standard XBRL. The advantage of XBRL is that it can be used directly by supporting tools, whereas RDF has to be processed by a more intelligent application. In both formats, the financial information has been enriched with computer-processable semantic tagging.
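The matching strategy described in the abstract can be illustrated with a minimal sketch. It is not the thesis prototype: the ontology is an in-memory dictionary standing in for the RDF model and its RDQL queries, and the instance names, synonyms, and the names ONTOLOGY, ENTRY_POINTS, find_synonym and extract are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy ontology (an assumption, not the thesis ontology):
# each instance has synonyms to look for and references to other instances.
ONTOLOGY = {
    "Revenue": {"synonyms": ["revenue", "turnover", "sales"],
                "related": ["Period", "Currency"]},
    "Profit": {"synonyms": ["profit", "net income"],
               "related": ["Period", "Currency"]},
    "Period": {"synonyms": ["first quarter", "second quarter", "full year"],
               "related": []},
    "Currency": {"synonyms": ["NOK", "USD", "EUR"], "related": []},
}

# Instances that are always checked first (assumed entry points).
ENTRY_POINTS = ["Revenue", "Profit"]


def find_synonym(instance, text):
    """Return the first synonym of `instance` that occurs in `text`, else None."""
    for synonym in ONTOLOGY[instance]["synonyms"]:
        pattern = r"\b" + re.escape(synonym) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            return synonym
    return None


def extract(text):
    """Follow references to related instances only after an entry point matches."""
    results = {}
    for entry in ENTRY_POINTS:
        hit = find_synonym(entry, text)
        if hit is None:
            continue  # no match, so the related instances are never searched
        found = {"matched": hit}
        for related in ONTOLOGY[entry]["related"]:
            related_hit = find_synonym(related, text)
            if related_hit is not None:
                found[related] = related_hit
        results[entry] = found
    return results


if __name__ == "__main__":
    print(extract("Turnover rose to 120 million NOK in the second quarter."))
```

Because related instances are searched only after an entry point has matched, a text without any financial terms costs just one pass over the entry-point synonyms, which is the saving the abstract refers to.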
Master's thesis in information and communication technology 2003 - Høgskolen i Agder, Grimstad
Publisher: Høgskolen i Agder
Agder University College