MoA's features

Date : 95.8.13 (Sun), bgjang@csone


Readable code :

I think my code is readable, but will others think so too? I am not sure, but I have made an effort to write code that other people can read. If you look through my code and recognize my programming habits and style, it should not be difficult to fix my programs.

Generalized concepts :

There exist several difficult and troublesome parts in MA. I have no background in this area, so I put much thought into generalizing and simplifying the problem, even though this requires more system resources (time/memory).

Irregular processing with tables and dictionary :

Anybody can easily add an irregular rule. See irrtbl.c for more information. There is also a dictionary, called the I-Type dictionary, which contains the words to which irregular rules must be applied.

Exceptional words, contraction recovery :

In MoA, exceptional-word handling and contraction recovery to the base form occur together. Anybody can update the E-Type dictionary (exceptional word dictionary), which contains exceptional words and contracted forms of words together with their base forms.
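
A minimal sketch of how a lookup from an exceptional or contracted form to its base form could work, assuming a table sorted by surface form and searched with the standard bsearch(). The struct, the function name etype_base and the two romanized entries are placeholders for illustration; the real E-Type dictionary format is not described here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry layout: surface form and its recovered base form. */
struct etype_entry {
    const char *surface;
    const char *base;
};

/* Illustrative romanized entries only; must stay sorted by surface. */
static const struct etype_entry etype_table[] = {
    { "mwol", "mueos+eul" },
    { "nan",  "na+neun"   },
};

/* bsearch() comparator: key is a C string, elem is a table entry. */
static int etype_cmp(const void *key, const void *elem)
{
    const struct etype_entry *e = elem;
    return strcmp((const char *)key, e->surface);
}

/* Return the base form for an exceptional/contracted word, or NULL. */
const char *etype_base(const char *surface)
{
    assert(surface != NULL);
    const struct etype_entry *e =
        bsearch(surface, etype_table,
                sizeof etype_table / sizeof etype_table[0],
                sizeof etype_table[0], etype_cmp);
    return e != NULL ? e->base : NULL;
}
```

A word that is not in the table returns NULL, so the caller can fall through to normal analysis.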

Connectivity Information :

In MoA, connectivity information between POS is used. This table can be changed easily with the bundled utility named chkcon. This utility and the information about connections between POS are taken from KTS.
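
A minimal sketch of a POS connectivity table as a boolean matrix. The tag names and the allowed pairs below are invented for illustration; they are not the KTS tag set or the table that chkcon edits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical POS tags for illustration only. */
enum pos { POS_NOUN, POS_PARTICLE, POS_VERB_STEM, POS_ENDING, POS_COUNT };

/* pos_connect[left][right] is true when `right` may directly follow
 * `left`. The entries are made up for this sketch. */
static const bool pos_connect[POS_COUNT][POS_COUNT] = {
    [POS_NOUN]      = { [POS_PARTICLE] = true },
    [POS_VERB_STEM] = { [POS_ENDING]   = true },
};

/* Return true if two adjacent morphemes with these tags may be joined. */
bool can_connect(enum pos left, enum pos right)
{
    assert((int)left >= 0 && (int)left < POS_COUNT);
    assert((int)right >= 0 && (int)right < POS_COUNT);
    return pos_connect[left][right];
}
```

With a table like this, an analysis path is rejected as soon as two neighbouring morphemes have a pair of tags whose entry is false.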

Using POS defined by Jae-Hoon, Kim :

Like KTS, MoA uses the POS (Part Of Speech) tags defined by Jae-Hoon, Kim. Refer to the PostScript file morph-model.ps for a detailed description of the meaning of each POS.

Hashing in Chart data structure :

An MA that uses a chart spends most of its time searching the chart data structure, so I use hashing (strictly speaking, a modification of the hashing technique) to search for an edge in the chart.
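
A minimal sketch of looking up chart edges through a hash table. It uses plain chained hashing keyed on the edge span, not the modified scheme mentioned above, whose details are not given here. All names (struct edge, chart_add, chart_find, chart_free) and the bucket count are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHART_BUCKETS 64    /* arbitrary size chosen for this sketch */

/* One chart edge: a candidate covering [start, end) with a POS id. */
struct edge {
    int start, end, pos;
    struct edge *next;      /* next edge in the same bucket */
};

struct chart {
    struct edge *bucket[CHART_BUCKETS];
};

/* Hash on the span only, so all edges over one span share a bucket. */
static unsigned span_hash(int start, int end)
{
    return ((unsigned)start * 31u + (unsigned)end) % CHART_BUCKETS;
}

/* Add an edge; returns 0 on success, -1 if memory runs out. */
int chart_add(struct chart *c, int start, int end, int pos)
{
    struct edge *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    unsigned h = span_hash(start, end);
    e->start = start;
    e->end = end;
    e->pos = pos;
    e->next = c->bucket[h];
    c->bucket[h] = e;
    return 0;
}

/* Find an edge with exactly this span and POS, or return NULL. */
struct edge *chart_find(const struct chart *c, int start, int end, int pos)
{
    assert(c != NULL);
    for (struct edge *e = c->bucket[span_hash(start, end)]; e != NULL; e = e->next)
        if (e->start == start && e->end == end && e->pos == pos)
            return e;
    return NULL;
}

/* Release every edge and leave the chart empty. */
void chart_free(struct chart *c)
{
    for (int i = 0; i < CHART_BUCKETS; i++) {
        struct edge *e = c->bucket[i];
        while (e != NULL) {
            struct edge *next = e->next;
            free(e);
            e = next;
        }
        c->bucket[i] = NULL;
    }
}
```

A chart starts out zero-initialized (struct chart c = {0};), and a lookup only walks the edges whose span hashes to the same bucket instead of scanning the whole chart.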

Using Lex for tokenization :

Tokenization is done with Lex, so anybody can easily define the token set and modify the program accordingly.

Big dictionaries, sufficient information tables :

For the N-Type dictionary, we use the entire Korean dictionary made by WHO(?). For the I-Type dictionary, a corpus. Also: Jae-Hoon, Kim; irregular tables; unknown-word connectivity information; etc.

Dictionary management tool :

A dictionary management tool is included, although it has no help.

Memory Cache for N-Type dictionary :

This algorithm accesses the N-Type dictionary (normal dictionary) more often than the previous MA algorithm did, so some memory caching scheme for the N-Type dictionary is needed.

I devised a caching scheme that is modified from a hashing data structure and suited to a Korean-language dictionary. Naturally, I built it into MoA.
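
A minimal sketch of a memory cache in front of a dictionary lookup. It is a generic direct-mapped cache that remembers both hits and misses, not the Korean-specific scheme described above. The names (struct dict_cache, cached_lookup), the slot count, and demo_backend (a stand-in for the real N-Type dictionary access, with romanized placeholder entries) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 256     /* arbitrary sizes chosen for this sketch */
#define MAX_WORD    32

typedef bool (*dict_lookup_fn)(const char *word);

struct cache_slot {
    bool used;              /* slot holds a remembered answer */
    bool found;             /* the remembered answer itself */
    char word[MAX_WORD];
};

struct dict_cache {
    struct cache_slot slot[CACHE_SLOTS];
    dict_lookup_fn backend;         /* the slow dictionary lookup */
    unsigned long backend_calls;    /* how often the backend was used */
};

/* Simple multiplicative string hash reduced to a slot index. */
static unsigned word_hash(const char *s)
{
    unsigned h = 5381u;
    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h % CACHE_SLOTS;
}

/* Stand-in for the real dictionary: a tiny in-memory word list. */
bool demo_backend(const char *word)
{
    static const char *const entries[] = { "hakkyo", "saram" };
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++)
        if (strcmp(entries[i], word) == 0)
            return true;
    return false;
}

/* Answer from the cache when possible, otherwise ask the backend
 * and remember the result (including "not found"). */
bool cached_lookup(struct dict_cache *c, const char *word)
{
    assert(c != NULL && c->backend != NULL);
    if (strlen(word) >= MAX_WORD) {     /* too long to cache */
        c->backend_calls++;
        return c->backend(word);
    }
    struct cache_slot *s = &c->slot[word_hash(word)];
    if (s->used && strcmp(s->word, word) == 0)
        return s->found;
    c->backend_calls++;
    s->found = c->backend(word);
    s->used = true;
    strcpy(s->word, word);              /* fits: length checked above */
    return s->found;
}
```

Caching negative answers matters in this setting, because many of the candidate strings generated during analysis are not dictionary words at all.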

Using syllable information :

Using syllable information when checking for normal words in the chart reduces the number of dictionary lookups, so system efficiency increases.
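
One possible reading of this idea, as a minimal sketch: record which syllables can begin or end a dictionary entry, and skip the dictionary lookup when a candidate fails that test. This is an assumption about how syllable information might be used, not a description of MoA's code. One byte stands in for one syllable, and the names (struct syllable_filter, filter_add, filter_may_contain) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* first_seen[b] / last_seen[b]: some dictionary entry starts / ends
 * with byte b. One byte stands in for one syllable in this sketch. */
struct syllable_filter {
    bool first_seen[256];
    bool last_seen[256];
};

/* Register one dictionary entry while the filter is being built. */
void filter_add(struct syllable_filter *f, const char *entry)
{
    size_t n = strlen(entry);
    assert(n > 0);
    f->first_seen[(unsigned char)entry[0]] = true;
    f->last_seen[(unsigned char)entry[n - 1]] = true;
}

/* false: the candidate cannot be in the dictionary, skip the lookup.
 * true:  it might be, so the real lookup is still required. */
bool filter_may_contain(const struct syllable_filter *f, const char *cand)
{
    size_t n = strlen(cand);
    if (n == 0)
        return false;
    return f->first_seen[(unsigned char)cand[0]]
        && f->last_seen[(unsigned char)cand[n - 1]];
}
```

The filter can give false positives but never false negatives, so it only removes lookups that were certain to fail.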

Checking the end of word :

This applies when looking for the part of a word affected by an irregular change: irregular changes occur only at the end of a word. This information reduces the number of chart entries generated by MA.

Unknown word handling :

This step is a very important part of MA.
Byoung-Gyu, Chang / bgjang@csone.kaist.ac.kr