Language Detection and Tracking in Multilingual Documents Using Weak Estimators

Stensby, Aleksander; Oommen, B. John; Granmo, Ole-Christoffer
Chapter, Peer reviewed
Oommen_2010_Language.pdf (206.4Kb)
Permanent link
http://hdl.handle.net/11250/137799
Publication date
2010
Collections
  • Scientific Publications in Information and Communication Technology [268]
Original version
Stensby, A., Oommen, B.J., & Granmo, O.-C. (2010). Language Detection and Tracking in Multilingual Documents Using Weak Estimators. Lecture Notes in Computer Science, Volume 6218/2010, 600-609. DOI: 10.1007/978-3-642-14980-1_59
Abstract
This paper deals with the extremely complicated problem of language detection and tracking in real-life electronic applications (for example, Word-of-Mouth (WoM) applications), where various segments of the text are written in different languages. The difficulties in solving the problem are manifold. First, the analyst has no knowledge of when one language stops and the next starts. Further, the features used for any one language (for example, the n-grams) will not be valid for recognizing another. Finally, and most importantly, in most real-life applications, such as WoM, the fragments of text available before a switch are so small that classification using traditional estimation methods becomes almost meaningless. Earlier, the authors of [10] had recommended that, for a variety of problems, the use of strong estimators (i.e., estimators that converge with probability 1) is sub-optimal. In this vein, we propose to solve the current problem using novel estimators that are pertinent for non-stationary environments. The classification results, which involve as many as 8 languages, demonstrate that our proposed methodology is both powerful and efficient.
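The abstract does not state the estimator's update rule, so the following is only a minimal sketch of one common multiplicative weak estimator for a multinomial distribution: every probability is shrunk by a factor and the freed mass goes to the symbol just observed, so old evidence fades and the estimate can follow a language switch. The two-symbol alphabet, the toy profiles in `PROFILES`, the function names `slwe_update` and `track`, and the value of `lam` are all assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical toy "languages" over a two-symbol alphabet (illustration only).
PROFILES = {
    "A": {"a": 0.8, "b": 0.2},
    "B": {"a": 0.2, "b": 0.8},
}


def slwe_update(probs, symbol, lam):
    """Shrink every probability by lam, then give the freed mass to `symbol`."""
    updated = {s: lam * p for s, p in probs.items()}
    updated[symbol] += 1.0 - lam
    return updated


def l1_distance(p, q):
    """Sum of absolute differences between two distributions on the same keys."""
    return sum(abs(p[s] - q[s]) for s in p)


def track(stream, profiles, lam=0.8):
    """Label each position with the profile closest to the running estimate."""
    alphabet = sorted(next(iter(profiles.values())))
    probs = {s: 1.0 / len(alphabet) for s in alphabet}  # uniform start
    labels = []
    for symbol in stream:
        probs = slwe_update(probs, symbol, lam)
        best = min(profiles, key=lambda name: l1_distance(probs, profiles[name]))
        labels.append(best)
    return labels, probs


# First ten symbols resemble profile "A", the last ten resemble profile "B".
stream = "aaaabaaaab" + "bbbbabbbba"
labels, final_probs = track(stream, PROFILES)
print("".join(labels))
```

A smaller `lam` forgets faster and reacts sooner to a switch, at the cost of a noisier estimate; an estimator that converges with probability 1 would instead keep averaging over both segments.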
Description
Published version of an article from the book: Structural, Syntactic, and Statistical Pattern Recognition. The original publication is available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-14980-1_59
Publisher
Springer
