Improved Source-Channel Models for Chinese Word Segmentation
Overview: This paper presents a Chinese word segmentation system that uses improved source-channel models of Chinese sentence generation. Chinese words are defined as one of the following four types: lexicon words, morphologically derived words, factoids, and named entities. The system provides a unified approach to the four fundamental features of word-level Chinese language processing: word segmentation, morphological analysis, factoid detection, and named entity recognition. The performance of the system is evaluated on a manually annotated test set and is also compared with several state-of-the-art systems, taking into account the fact that the definition of Chinese words often varies from system to system.
| Field | Value |
|---|---|
| Publisher | Microsoft |
| Date Published | May 2003 |
| Format | White Papers |



