KLM-corpus

Korean L2 Learner Morpheme Corpus

Annotation guidelines

This page documents the major cases that annotators discussed while building the KLM corpus. The goal is consistent annotation and more reliable evaluation of morpheme tokenizers and taggers.

Causative and passive markers

Voice markers such as -i/hi/li/ki/wu/kwu/chwu- (morphological causative) and -i/hi/li/ki- (suffixal passive) indicate causative and passive voice. They attach to a root to form causative or passive verbs, altering the number of arguments a predicate controls in a clausal construction. We parsed these morphemes and assigned them an XSV tag.
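As a rough illustration of the rule above, the sketch below represents a word as a list of (morpheme, tag) pairs and picks out the voice marker tagged XSV. The two example words (mek-i-ta "to feed" and cap-hi-ta "to be caught") and the helper names are our own illustrations, not entries or tools from the KLM corpus, and the EF tag on the final ending is an assumption made only to complete the analysis.

```python
# Hangul forms of the voice markers -i/hi/li/ki/wu/kwu/chwu-.
VOICE_MARKERS = {"이", "히", "리", "기", "우", "구", "추"}

# Illustrative analyses (not taken from the corpus).
feed = [("먹", "VV"), ("이", "XSV"), ("다", "EF")]       # mek-i-ta, causative
be_caught = [("잡", "VV"), ("히", "XSV"), ("다", "EF")]  # cap-hi-ta, passive


def to_string(analysis):
    """Render an analysis in the common morpheme/TAG+morpheme/TAG notation."""
    return "+".join(f"{morph}/{tag}" for morph, tag in analysis)


def voice_markers(analysis):
    """Return the morphemes that are tagged XSV and are known voice markers."""
    return [m for m, t in analysis if t == "XSV" and m in VOICE_MARKERS]


print(to_string(feed))           # 먹/VV+이/XSV+다/EF
print(voice_markers(be_caught))  # ['히']
```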

Auxiliary verbs

Verbs like iss- “to be/exist/have”, ha- “to do”, and toy- “to become” can act as both main and auxiliary verbs. As main verbs, they represent concepts of existence, activity, or possession. In these cases, we assigned a VV tag.

As auxiliary verbs, they work with a main verb to convey grammatical meanings, such as continuous or progressive actions. In these instances, we assigned a VX tag.
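The contrast between the two uses can be sketched with iss- as an example. The sentences below (chayk-i iss-ta "there is a book" and mek-ko iss-ta "is eating") are illustrative examples we constructed, not corpus entries; the tags on the surrounding morphemes (NNG, JKS, EC, EF) and the helper name `tags_for` are assumptions for the sake of a complete analysis.

```python
# iss- as a main verb (existence): tagged VV.
main_use = [("책", "NNG"), ("이", "JKS"), ("있", "VV"), ("다", "EF")]

# iss- as an auxiliary verb after a main verb (progressive): tagged VX.
aux_use = [("먹", "VV"), ("고", "EC"), ("있", "VX"), ("다", "EF")]


def tags_for(analysis, form):
    """Return every tag assigned to the given morpheme form in an analysis."""
    return [tag for morph, tag in analysis if morph == form]


print(tags_for(main_use, "있"))  # ['VV']
print(tags_for(aux_use, "있"))   # ['VX']
```

The same surface form receives a different tag depending on whether another main verb precedes it in the predicate, so a tagger has to use context to choose between VV and VX.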

Copula (Positive)

The copula -i links the subject of a sentence with a predicate and conveys a positive (affirmative) meaning; it is tagged VCP. Parsing complexities arise when the copula combines with the ending -lanun. This combination links the subject of a sentence to a noun or descriptive phrase, adding a nuance of specification, identification, or definition.

advmod vs. obl

Spelling errors

Instead of subjectively interpreting misspelled words, we assigned each one of three tags from the Sejong tag set: NA (Undefined), NF (Undefined, but considered a noun), or NV (Undefined, but considered a verb).
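The three-way choice can be sketched as a small lookup. The tag names and glosses come from the guideline above; the function `misspelling_tag` and its `category` argument are hypothetical and only model the annotator's judgment about whether a misspelled word looks like a noun, a verb, or neither.

```python
# Tags for misspelled words, with the glosses given in the guideline.
MISSPELLING_TAGS = {
    "NA": "Undefined",
    "NF": "Undefined, but considered a noun",
    "NV": "Undefined, but considered a verb",
}


def misspelling_tag(category=None):
    """Map an annotator's judgment ('noun', 'verb', or None) to a tag."""
    if category == "noun":
        return "NF"
    if category == "verb":
        return "NV"
    return "NA"  # no category could be determined


for judgment in ("noun", "verb", None):
    tag = misspelling_tag(judgment)
    print(tag, "-", MISSPELLING_TAGS[tag])
```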

* We will add more detailed guidelines and examples for other morpheme categories shortly!