A non-deterministic tokeniser for finite-state parsing

Jean-Pierre Chanod (Rank Xerox) & Pasi Tapanainen (University of Helsinki)

This paper describes a non-deterministic tokeniser implemented and used to develop a French finite-state grammar. The tokeniser includes a finite-state automaton for simple tokens and a dedicated lexical transducer that encodes a wide variety of multiword expressions, each associated with multiple lexical descriptions where required.
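
As a rough illustration of what non-deterministic tokenisation means here, the sketch below returns every segmentation of an input string in which a known multiword expression is read either as one token or as a sequence of simple tokens. It is a minimal sketch in plain Python, not the authors' implementation, which is built from a finite-state automaton and a lexical transducer. The regular expression for simple tokens, the function names, and the sample entry "bien que" are all assumptions made for the example, and lexical descriptions are left out entirely.

```python
import re

# Assumed definition of a "simple token": a run of word characters,
# or a single punctuation character. The real automaton is not shown
# in the abstract; this regex is only a stand-in.
SIMPLE_TOKEN = re.compile(r"\w+|[^\w\s]")


def simple_tokens(text):
    """Split text into simple tokens (hypothetical helper)."""
    return SIMPLE_TOKEN.findall(text)


def tokenisations(text, multiwords):
    """Return every tokenisation of `text`.

    `multiwords` is an iterable of space-separated multiword expressions
    (a hypothetical stand-in for the lexical transducer). At each position
    the tokeniser may emit one simple token, or any multiword expression
    that starts there, so the output is a list of alternatives.
    Matching is case-sensitive.
    """
    words = simple_tokens(text)
    mwes = {tuple(m.split()) for m in multiwords}
    max_len = max((len(m) for m in mwes), default=1)

    def expand(i):
        if i == len(words):
            yield []
            return
        # Alternative 1: read the next simple token on its own.
        for rest in expand(i + 1):
            yield [words[i]] + rest
        # Alternative 2: read a multiword expression starting at position i.
        for n in range(2, max_len + 1):
            if i + n <= len(words) and tuple(words[i:i + n]) in mwes:
                for rest in expand(i + n):
                    yield [" ".join(words[i:i + n])] + rest

    return list(expand(0))


if __name__ == "__main__":
    for reading in tokenisations("bien que tard", ["bien que"]):
        print(reading)
```

With the single assumed entry "bien que", the input "bien que tard" yields two alternatives: one with three simple tokens and one in which "bien que" is a single token. A later component, such as the grammar, would be responsible for choosing between them.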

PDF version (3 pages, 94k)