Spelling check, an automatic mechanism for detecting and correcting human errors, is a common task in every written language. However, spelling check in Chinese is very different from that in English or other alphabetic languages. There are no delimiters between words, and each word is very short: usually one to three characters. Error detection is therefore a hard problem; it must be done within a context, such as a sentence or a long phrase with a certain meaning, and cannot be done within a single word. Once an error is identified, correcting it is feasible, since most errors involve characters that are phonologically or visually similar to the intended ones. Several previous works have addressed the spelling check problem, but until now there has been no commonly available data set for spelling check in Chinese. The goal of this task is to provide a common evaluation data set so that application developers can compare their error detection and correction rates.
In this bake-off, the evaluation includes two sub-tasks: error detection and error correction. The errors are collected from students’ written essays. Since there are fewer than two errors per essay, the distribution of incorrect characters in sub-task one will match the real-world error distribution. The first sub-task focuses on the evaluation of error detection. Some input sentences may contain no errors, so that the false-alarm rate of a system can be evaluated. The second sub-task focuses on the evaluation of error correction. Each sentence includes at least one error. Together, these two sub-tasks cover the complete function of a spelling checker. Participants may submit results for either sub-task or for both.
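To make the two sub-tasks concrete, the following is a minimal sketch of how a participant might score a system locally. It is not the official scoring procedure: the sentence-level definitions of false-alarm rate, detection recall, and correction accuracy used here are assumptions, as are the function names and the representation of each sentence as a set of (character position, corrected character) pairs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one set per sentence, holding
# (character position, corrected character) pairs. An empty set means "no error".
Annotation = Set[Tuple[int, str]]


def detection_scores(gold: List[Annotation], pred: List[Annotation]) -> Dict[str, float]:
    """Sentence-level false-alarm rate and detection recall (assumed definitions)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    clean = with_error = false_alarms = detected = 0
    for g, p in zip(gold, pred):
        if g:  # the sentence really contains at least one error
            with_error += 1
            if p:
                detected += 1
        else:  # the sentence is error-free
            clean += 1
            if p:
                false_alarms += 1
    return {
        "false_alarm_rate": false_alarms / clean if clean else 0.0,
        "detection_recall": detected / with_error if with_error else 0.0,
    }


def correction_accuracy(gold: List[Annotation], pred: List[Annotation]) -> float:
    """Fraction of sentences whose proposed corrections exactly match the gold ones."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Sub-task one: some sentences are error-free, so false alarms can be measured.
gold_detect = [set(), {(3, "的")}, set(), {(1, "在")}]
pred_detect = [set(), {(3, "的")}, {(2, "他")}, set()]
detect_result = detection_scores(gold_detect, pred_detect)

# Sub-task two: every sentence contains at least one error.
gold_correct = [{(3, "的")}, {(1, "在")}]
pred_correct = [{(3, "的")}, {(1, "再")}]
correct_result = correction_accuracy(gold_correct, pred_correct)
```

In this toy data, one of the two error-free sentences is wrongly flagged and one of the two erroneous sentences is missed, so both assumed detection scores come out to 0.5; one of the two corrections matches exactly, giving a correction accuracy of 0.5.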
Please visit the bake-off website for details.