Recognizing Text Signatures Using Neural Machine Translation
Optical character recognition (OCR) converts scanned text into searchable data. OCR systems achieve recognition rates of up to 99% on clean, well-formatted documents under optimal conditions. Results are less promising under suboptimal conditions, for example when the text is damaged or obfuscated. We propose a new method for recognizing words that are obfuscated in a particular way. Recognition relies on each word's signature, a small portion of the original text. We treat the task as a translation problem and attempt to solve it with state-of-the-art methods from machine translation. Three models were developed in this thesis, two of them based on the encoder-decoder framework for sequence-to-sequence prediction. The best-performing model achieved an accuracy of over 98% when recognizing text written in a single font and close to 90% when recognizing text written in five different fonts under 10% noise.