Phonetic Normalization for Scoring Misspellings
Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015
Figure 1: Number of unique responses and unique tokens per question (y-axis: count; x-axis: items LC 1.1 through LC 3.5)

Figure 2: Response lengths, in tokens, per question
Preprocessing  Certain minor form errors, such as wrong capitalization or irregular punctuation, are irrelevant when assessing comprehension. We exploit this in a scoring platform to reduce the set of responses to score by normalizing spurious differences in writing mechanics which are not considered score-affecting in assessing comprehension. This includes lower-casing and removing clause- and sentence-final punctuation. In order to avoid differences in edit distance due to the use of diacritics, we also transcribe umlaut characters, using the standard convention, with their underlying vowel followed by 'e' ('ö' as 'oe', 'ü' as 'ue', etc.). Preprocessing reduces the set of responses which teachers need to score by more than 50% for some items. For this study we use responses scored in the preprocessed form. For the analysis presented in this paper we use a subset of the scored preprocessed responses, selected as summarized below.
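A minimal sketch of this normalization step, assuming Python (this is not the authors' implementation; the ß → ss mapping is an added assumption, since the paper only names the umlaut convention explicitly):

```python
import re

# Standard transcription convention for German umlauts; the eszett
# mapping (ß -> ss) is an assumption, not stated in the paper.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def preprocess(response: str) -> str:
    """Normalize writing mechanics that do not affect the score."""
    # Transcribe umlauts as the underlying vowel followed by 'e'.
    for umlaut, replacement in UMLAUT_MAP.items():
        response = response.replace(umlaut, replacement)
    # Lower-case and drop clause- and sentence-final punctuation.
    response = response.lower()
    response = re.sub(r"[.,;:!?]+(\s|$)", r"\1", response)
    return response.strip()

print(preprocess("Sie wohnt in Österreich."))  # -> "sie wohnt in oesterreich"
```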
The corpus  Since the number of responses differs from question to question (at least partially due to the different language proficiencies of the test-takers; low-proficiency test-takers are not capable of responding to questions on the more difficult audio prompts) and for some questions it is low (only 29 responses to one of the questions after preprocessing), for the analyses presented in this paper we selected only those questions to which we have at least 100 unique preprocessed responses. We moreover excluded questions which elicited unordered multi-part responses, that is, questions of the type "Name 3 . . . " or "What are . . . ? (2 items)". Our complete data set consists of responses to 17 questions which elicited single-part responses; each response has been scored at 0, 0.5, or 1 points.

Variable              N
Verbatim responses    7208
Verbatim unique       3794
Preprocessed unique   3146
Tokens                16298
Token types           2429

Table 1: Descriptive corpus information

Table 1 shows basic descriptive information about the corpus. The number of verbatim responses is the total number of responses to the 17 questions before preprocessing. "Verbatim unique" is the number of token-identical verbatim responses collapsed to one observation. "Preprocessed unique" is the number of token-identical (unique) responses after preprocessing as described above. "Tokens" and "Token types" are, respectively, the numbers of all tokens and of unique tokens (types) in the preprocessed responses.

In the remainder of this paper, we refer to the set of preprocessed unique responses. Figure 1 shows the distribution of responses and unique tokens per question for the three items (LC 1, LC 2, LC 3). Figure 2 shows the distribution of response lengths per question. There are more unique responses to the more difficult items, LC 2 and LC 3, and the responses to those items are longer and more diverse (the number of unique tokens is larger relative to the number of unique responses, that is, there are fewer recurring words than in the easiest item, LC 1). The average response length was 5 tokens.
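The counts in Table 1 follow from simple set operations; a sketch in Python, with `str.lower` as a stand-in for the normalizer described above:

```python
from typing import Callable

def corpus_stats(responses: list[str],
                 preprocess: Callable[[str], str] = str.lower) -> dict[str, int]:
    """Descriptive counts as in Table 1 (str.lower as a stand-in normalizer)."""
    preprocessed = [preprocess(r) for r in responses]
    tokens = [tok for r in preprocessed for tok in r.split()]
    return {
        "verbatim_responses": len(responses),
        "verbatim_unique": len(set(responses)),
        "preprocessed_unique": len(set(preprocessed)),
        "tokens": len(tokens),
        "token_types": len(set(tokens)),
    }
```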
LC 1.1        LC 1.6        LC 3.1
frankreisch   austereich    giespallampe
frankrich     austerreich   energiespaerlaempe
frankriech    oestereicht   energysparen
frankrreisch  oeustreich    energiesparenlampen
frankrreit    ostreich      energiesparlampel
franzoezisch  oesterreisch  energiesparer
franzuezisch  oesttereich   energiespannlampe
freinkreich   oeustreich    energisparelampen
frienkriesch  oeschterich   sparrlampen
frienricht    oessterrisch  energiespaerlaempe

Figure 3: Examples of misspelled responses

Figure 4: Corpus processing

Examples  In order to illustrate the spelling errors problem, in Figure 3 we show examples of misspellings in responses to three questions which elicited simple one-word key concepts. We will use responses to these questions in one of the analyses (RAs below are reference answers provided by the teachers; a vertical bar separates alternatives):

LC 1.1  Wo wohnt Alexandra?
        'Where does Alexandra live?'
        RA: frankreich

LC 1.6  Woher kommt Elisabeth?
        'Where does Elisabeth come from?'
        RA: oesterreich|wien|wien oesterreich

LC 3.1  Wie beleuchtet die Bundeskanzlerin Angela Merkel ihre Wohnung?
        'How does Chancellor Angela Merkel light her apartment?'
        RA: energiesparlampen ('energy saving lamps')

Two of the questions (LC 1.1 and LC 1.6) appeared with the first, easiest, listening prompt. Even though identifying the answers within the audio prompts was easy for most test-takers, including low-proficiency ones, spelling the answers correctly turned out to be challenging, even though the elicited key concepts denote two well-known European countries. The third question (LC 3.1) appeared with the last, most difficult, audio prompt and was answered by medium- to high-proficiency learners. Here, likewise, spelling the word is challenging. This may be partially due to the fact that "Energiesparlampe" is a compound noun.

Even this small sample illustrates the large variety of spelling errors, the high complexity of the spell-checking task, and the high demands on automated processing. Some misspellings, such as lampel for "lampe" or "lampen", are probably typos, while others are likely to have a phonological source, like frankreisch or oesterreisch, and among those some might be explained by interference from another foreign language or from the native language of the student, for instance "au" in austereich or "y" in energysparen. Some errors might be interpreted as wrong morphological forms rather than misspellings, e.g. energisparelampen. In many cases multiple errors are combined.

4 Spell-checking and Normalizations

As shown in Figure 4, the data for analysis was prepared as follows: We created a spelling gold standard semi-automatically by spell-checking preprocessed responses using an off-the-shelf spell-checker (described in more detail in Section 4.1) and then manually annotating (verifying and correcting) the checker's outputs (Section 4.2). Each learner response and reference answer was automatically transcribed into three different phonetically-based encodings which, in the context of the automated scoring task, we treat as spelling normalizations (Section 4.3). In the analysis section we compare the spell-checked and the phonetically transcribed responses with, respectively, the strings or the transcriptions of target hypotheses and reference answers. The methods and tools used for annotation and normalization are outlined below.

4.1 Spell-checking

For automated spell-checking and spelling correction we use Aspell (Atkinson, 2006), an open-source spell-checker provided by GNU. Aspell supports multiple languages and is frequently used as a reference system in research on spell-checking and writing normalization.
Crucially for this work, a large dictionary for the German language compatible with Aspell is freely available, as are implementations of the system itself. Aspell is thus a good candidate for integration into a scoring system, and so a well-motivated choice for an evaluation.

Aspell performs checking and suggests corrections based on a combination of orthographic and phonetic coding, fast dictionary lookup, and an edit distance calculation. Alternative spellings are identified by an algorithm which represents words by their orthographic forms and their "soundslike" equivalents, that is, approximate pronunciations constructed based on phonetic information. Suggestions are ordered by a weighted average of the edit distances between the candidate and the misspelled word and between the "soundslike" encodings of the two words. Aspell language versions differ in their dictionaries and phonetic data, but the underlying edit distance algorithm is the same.
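A minimal sketch of this ranking scheme in Python; the `soundslike` function below is a crude stand-in for Aspell's phonetic coding (Aspell's actual tables are language-specific), and the equal 0.5/0.5 weighting is an assumption for illustration:

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def soundslike(word: str) -> str:
    """Very rough phonetic code collapsing a few German-ish equivalences.
    A stand-in only; Aspell uses language-specific phonetic tables."""
    for src, dst in [("sch", "s"), ("ch", "s"), ("ck", "k"), ("ie", "i"),
                     ("ei", "ai"), ("ae", "e"), ("oe", "o"), ("ue", "u")]:
        word = word.replace(src, dst)
    return word

def rank_suggestions(misspelled: str, candidates: list[str],
                     w_orth: float = 0.5, w_sound: float = 0.5) -> list[str]:
    """Order candidates by a weighted average of orthographic and
    'soundslike' edit distances, as in Aspell's suggestion ordering."""
    def score(cand: str) -> float:
        return (w_orth * edit_distance(misspelled, cand)
                + w_sound * edit_distance(soundslike(misspelled), soundslike(cand)))
    return sorted(candidates, key=score)

print(rank_suggestions("frankrich", ["frankreich", "fraenkisch", "frankfurt"]))
```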
Note that Aspell performs context-insensitive spell-checking, that is, individual words are processed in isolation. Thus, only non-word errors are detected, while real-word errors are not. In this study we do not address real-word errors; however, we are planning to annotate the complete data set manually in the future.
ized orthography (ASJPcode), a simplified version
of the International Phonetic Alphabet (Interna-
4.2 Annotation
tional Phonetic Association, 1999). ASJP encod-
We annotated the learner responses with target ing consists of 41 symbols, 7 vowels and 34 con-
hypotheses (hypothesized intended forms) semi- sonants, which represent the commonly occurring
automatically using the Aspell checker. For each sounds of the world’s languages (for details, see
non-word Aspell searches its dictionary and pro- Appendix C of (Brown et al., 2008)). The tran-
vides a list of suggested replacements. To obtain scription employed in this study was specifically
a spell-checked corpus we processed our data set designed to capture the sound representations of
with Aspell and for each word which Aspell re- German.
ported as misspelled, we stored Aspell’s first sug-
gestion. Then, we manually checked the first sug- Dolgopolsky’s sound classes The sound class
gestions and corrected them were necessary. coding system of Dolgopolsky (1986) was devel-
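This first-suggestion step can be reproduced with Aspell's standard pipe mode (`aspell -a`, the Ispell-compatible interface); a sketch assuming a local Aspell installation with the German dictionary, not the authors' exact tooling:

```python
import subprocess

def first_suggestions(words: list[str], lang: str = "de") -> dict[str, str | None]:
    """Feed words to `aspell -a` and keep the first suggestion for each
    word reported as misspelled (None when Aspell offers no suggestion)."""
    proc = subprocess.run(
        ["aspell", "-a", f"--lang={lang}"],
        input="\n".join(words), capture_output=True, text=True, check=True,
    )
    result: dict[str, str | None] = {}
    # Skip the version banner on the first line; in pipe mode Aspell answers
    # '*' for correct words, '& word n offset: s1, s2, ...' for misspellings
    # with suggestions, and '# word offset' when it has no suggestions.
    for line in proc.stdout.splitlines()[1:]:
        if line.startswith("&"):
            head, _, suggestions = line.partition(":")
            result[head.split()[1]] = suggestions.split(",")[0].strip()
        elif line.startswith("#"):
            result[line.split()[1]] = None
    return result

print(first_suggestions(["frankrich", "oestereicht"]))
```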
As Figure 3 illustrates, the range of spelling variants includes cases of questionable interpretation and acceptability; consider, for instance, frienricht or giespallampe as misspellings of "frankreich" and "energiesparlampe", respectively. When building the spelling gold standard we did not use the teachers' scores as guides, but rather attempted to accept generously those words which could in good faith be interpreted as misspellings of the expected concepts. Where good-faith interpretation was impossible or borderline possible, we marked those words as uninterpretable (for instance, frankaise, freikeit, franch in response to LC 1.1 and oestech, busterish, uscraisch, or susthei in response to LC 1.6). We also marked foreign words explicitly (france, francais, austria), as some students answered in English or in their native language.

The annotation was carried out by the authors of this paper. The corpus was divided into parts and single annotation was performed for each misspelled word by one author. The manually corrected spell-checker outputs are used as a spelling gold standard. The spell-checked, annotated corpus contains 2945 responses and 15260 tokens (2898 unique responses, 2173 unique tokens).

4.3 String Normalizations

For this study we used three phonetically-based encodings: the ASJP and Dolgopolsky systems, and Soundex as a baseline.

ASJPcode  The Automated Similarity Judgment Program (ASJP) is a procedure originating from comparative and historical linguistics, developed with the view to comparing the world's languages by lexical similarity (Wichmann et al., 2013). Comparisons are based on word lists encoded in a standardized orthography (ASJPcode), a simplified version of the International Phonetic Alphabet (International Phonetic Association, 1999). The ASJP encoding consists of 41 symbols, 7 vowels and 34 consonants, which represent the commonly occurring sounds of the world's languages (for details, see Appendix C of Brown et al. (2008)). The transcription employed in this study was specifically designed to capture the sound representations of German.

Dolgopolsky's sound classes  The sound class coding system of Dolgopolsky (1986) was developed in the context of research analogous to the ASJP project, that of identifying related language families. Dolgopolsky's system groups similar consonants into 10 "sound classes" in such a way that phonetic regularities within a class are more systematic than those between classes. Each class is represented with a single character. Vowels are simply marked as such (V). The transcription used in this study was also designed to capture the sound system of German.
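A toy sound-class normalizer along these lines, assuming Python; the grapheme-to-class table below is a rough illustrative approximation of a Dolgopolsky-style mapping for German, not the transcription actually used in the study (which, as Figure 5 shows, handles, e.g., word-initial glottal stops and keeps vowel sequences apart):

```python
# Multi-letter graphemes first so they are consumed before single letters.
GRAPHEME_CLASSES = [
    ("sch", "S"), ("ch", "S"), ("ck", "K"), ("ph", "P"), ("th", "T"),
    ("b", "P"), ("p", "P"), ("f", "P"), ("v", "P"), ("w", "W"),
    ("d", "T"), ("t", "T"),
    ("s", "S"), ("z", "S"), ("c", "S"),
    ("g", "K"), ("k", "K"), ("q", "K"), ("x", "K"),
    ("m", "M"), ("n", "N"), ("r", "R"), ("l", "R"),
    ("j", "J"), ("h", "H"),
]

def sound_class_encode(word: str) -> str:
    """Map a (preprocessed, lower-cased) word onto consonant sound classes;
    vowels collapse to 'V', and runs of the same class are merged."""
    out, i = [], 0
    while i < len(word):
        for grapheme, cls in GRAPHEME_CLASSES:
            if word.startswith(grapheme, i):
                out.append(cls)
                i += len(grapheme)
                break
        else:
            out.append("V" if word[i] in "aeiouy" else word[i].upper())
            i += 1
    # Merge runs of the same class (e.g., 'rr' -> R).
    merged = [c for k, c in enumerate(out) if k == 0 or c != out[k - 1]]
    return "".join(merged)

# Two misspelled variants of 'oesterreich' receive identical codes:
print(sound_class_encode("oestarreich"), sound_class_encode("oestereisch"))
```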
String        ASJP         Dolgopolsky   Soundex
frankeriech   fGaNkeGiS    PRVNKVRVS     F652
frankfurt     fGaNkfuGt    PRVNKPVRT     F652
fraenkerisch  fGaENkeGiS   PRVVNKVRVS    F652
fracraich     fGakGaiS     PRVKRVVS      F626
oestarreich   7oEstaGaiS   HVVSTVRVVS    O236
oestereisch   7oEsteGaiS   HVVSTVRVVS    O236
austerreich   7austEGaiS   HVVSTVRVVS    A236
austerreicht  7austEGaiSt  HVVSTVRVVST   A236

Figure 5: Examples of normalizations

                    Valid words   Misspelled words   Row totals
Reported                 42            1040             1082
Suggestions found        21             904              925
First Correct             -             583              583
First Wrong              21             321              342
No Suggestions           21             136              157

Table 2: Performance of the Aspell spell-checker
Aspell reported 21 (4%) correctly spelled words as misspelled and suggested a correction (false positives). Overall, Aspell's precision in identifying misspellings in our data is thus at 96%.¹

Now, as far as automated correction is concerned, correct first suggestions were found for not even 60% of the misspelled tokens. Out of the 904 misspelled tokens for which suggestions were found, 321 first suggestions were wrong; the first suggestion was thus correct in only about 64% of these cases. With 321 wrong suggestions and 136 cases for which suggestions were not available, about 45% of the non-word misspellings are not accounted for correctly by Aspell. These results are similar to those reported by Rimrott and Heift (2008).

¹ We cannot provide recall results at this point since our gold standard includes only non-words identified by Aspell. We are planning to annotate real-word errors in the future.

A major issue for Aspell, and, as can be expected, for any off-the-shelf German spell-checker, are compound nouns. Two of the listening prompts contained compounds as key concepts: "Marxhaus" in the answer to Where are Peter and Birgit? (RA: 'In front of Marx's birthplace in Trier') and "Energiesparlampen" in the answer to the previously mentioned LC 3.1. "Marxhaus" is not in Aspell's dictionary; the closest suggestions it finds as replacements include Matthäus (Matthew, as in Matthew the Apostle), Parkhaus (car park), or even Hausbar (house bar). Compounds account for all of the 21 valid words which Aspell identified as misspellings. Most of the remaining errors are due to context insensitivity; for instance, to "What did Karl Marx do in Cologne?" (RA: "Leitung der Neuen Rheinischen Zeitung", 'Led the "New Rhenish Newspaper"') a student wrote radikal demochratisch behzatung ('radical democratic UNINTERPRETABLE'), for which Aspell suggested radikal demokratische Beratung ('radical democratic consultation').

5.2 Diversity of Misspellings

Figure 6 shows, per item, the distributions of unigram, bigram, and trigram cosines and of normalized Damerau-Levenshtein distances (nDL) to target hypotheses, with linear trend lines. On the x-axis, items within distance measure groups are ordered as in Figures 1 and 2.

Figure 6: Per item distribution of distances between misspelled words and target hypotheses (panels: Cosine 1-gram, Cosine 2-gram, Cosine 3-gram, nDL)

As can be seen in the plots, the range of unigram cosine values is large for some items; thus many misspellings involve more than just letter transpositions. The large ranges in bigram cosines and the many values at 0 for trigrams show that misspellings tend to diverge from the target hypotheses to a large extent. For the easier questions (left end of the x sub-axes) the ranges of unigram cosine and Levenshtein distance tend to be smaller, while the ranges of bigram and trigram cosines are larger and also closer to the low end of the scale. This means that in the easy questions, misspellings tend to contain the right letters, but the letters are misplaced. The same can be seen for the difficult questions (except for the last one). The intermediate-difficulty items tend to have the least letter overlap and many trigram similarities at the low end of the scale. These are likely to be the most difficult to correct automatically, but possibly easier to identify as qualifying to be scored at 0.
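Both measures are easy to state; a sketch, assuming Python, of character-n-gram cosine similarity and a normalized Damerau-Levenshtein distance (normalization by the longer string's length is an assumption; the paper does not spell out its normalizer):

```python
from collections import Counter
from math import sqrt

def ngram_cosine(a: str, b: str, n: int = 2) -> float:
    """Cosine similarity between character n-gram count vectors."""
    va = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    vb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    dot = sum(va[g] * vb[g] for g in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def ndl(a: str, b: str) -> float:
    """Damerau-Levenshtein distance (with adjacent transpositions),
    normalized by the length of the longer string."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1] / max(len(a), len(b), 1)

print(ngram_cosine("frankrich", "frankreich", n=3), ndl("frankrich", "frankreich"))
```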
5.3 Relation to Scores: Misspelled Key Concepts

As mentioned in Section 3, we used responses to two questions which elicited one key concept, LC 1.1 and LC 1.6, to investigate the relation between misspellings and scores. From the LC 1.1-LC 1.6 corpus subset, we extracted responses which contained tokens with gold standard annotation corresponding to the expected concept: "frankreich" for LC 1.1 and "oesterreich" for LC 1.6. There were 236 and 260 such responses, respectively.
Figure 8: Per score distribution of distances between normalized responses and reference responses (panels: ASJP, Dolgopolsky, Soundex, String; measures: cosine and nDL)
Tokens with lower similarity to the target concept are accepted with partial and full scores in LC 1.1. Also, a larger range of similarity accounts for partial and full points in LC 1.1. This suggests that what counts as acceptable in terms of misspellings is item-specific.
A trend of Levenshtein distance (linearly) decreasing with score can be seen in the distributions of the ASJP and Dolgopolsky normalizations, but less so in the distribution of Soundex distances across items. Soundex transcriptions do not distinguish well between the scores based on Levenshtein distance, and only somewhat better based on cosine; for most items there is little difference between the mean distances for scores 0.5 and 1 on the nDL measure. ASJP and Dolgopolsky normalizations are more stable in terms of variance, with ASJP, moreover, displaying fewer outliers. This confirms our hypothesis that the more linguistically-informed encodings yield clusters which better correspond to the assigned scores. It also suggests that these encodings might result in better performance on the automated scoring task. We are planning to investigate this in the course of further work. The ASJP and Dolgopolsky distributions moreover better reflect the pattern of string-based distances than the Soundex distributions do. Finally, ASJP and Dolgopolsky normalizations appear more stable across items on both distance measures, and the shapes of their distributions are similar. It is possibly a combination of both that would work best as features for scoring.
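As a sketch of how such distances could be combined into scoring features (assuming Python; `difflib.SequenceMatcher` stands in for the cosine and nDL measures above, and the normalizers passed in are placeholders, not the study's ASJP or Dolgopolsky transcriptions):

```python
from difflib import SequenceMatcher
from typing import Callable

def sim(a: str, b: str) -> float:
    """Stand-in similarity; the study uses n-gram cosine and nDL instead."""
    return SequenceMatcher(None, a, b).ratio()

def score_features(response: str, reference: str,
                   normalizers: dict[str, Callable[[str], str]]) -> dict[str, float]:
    """One feature per normalization: similarity between the normalized
    response and the normalized reference answer."""
    return {name: sim(norm(response), norm(reference))
            for name, norm in normalizers.items()}

# Hypothetical normalizers; real ones would be the ASJP and Dolgopolsky
# transcriptions described in Section 4.3.
print(score_features("oestereisch", "oesterreich", {
    "string": str.lower,
    "vowel_collapsed": lambda w: "".join("V" if c in "aeiou" else c for c in w.lower()),
}))
```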
6 Conclusions and Further Work

We presented a study on misspellings in a corpus of constructed responses to listening comprehension items used for placement testing for German. Not surprisingly, our data contains a large number of misspellings (around 50% of the unique words that learners used). The first-ranked suggestions of an off-the-shelf spell-checker were correct in not even 60% of the cases. This is likely to be partially due to the fact that the range of divergence from target forms is substantial; it also varies between questions. The majority of false positives were due to compounds specific to the listening prompts. An obvious solution we are pursuing to improve precision and reduce the rate of wrong suggestions is constructing two dictionaries: one prompt-specific and the other learner-language-specific; the purpose of the latter is to provide prompt-specific frequent invalid forms produced by the learners.

We have also shown that, while in general the expected trend in scoring misspelled responses can be observed, the acceptability of divergence from target forms appears to be item-specific. Finally, we proposed sound class-based normalizations as a method of grouping noisy responses in terms of their pronunciation similarity, as well as relating distances between normalized responses and reference answers to response scores. This served to evaluate the prospects for a normalization-based approach to response clustering. Soundex, the most frequently employed normalization, does not distinguish between responses at different score-points, so it can be considered the worst choice for a normalization-based approach. Both of the more elaborate phonetic transcriptions, based on ASJP's and Dolgopolsky's codes, perform better than Soundex and are promising directions to pursue. We will experiment with including distances to reference answers based on both representations as features for (semi-)automated scoring.

Acknowledgments

We thank Dr. Kristin Stezano Cotelo from the Saarland University International Office for the collaboration on placement testing thanks to which this research is possible. We would like to thank Johannes Dellert for letting us use his code for sound class-based normalizations. We also thank the three anonymous reviewers for their helpful comments.

This work was funded by the Ministry of Science, Research and the Arts of Baden-Württemberg within the FRESCO project. Magdalena Wolska is supported by the Institutional Strategy of the University of Tübingen (Deutsche Forschungsgemeinschaft, ZUK 63).

References

Kevin Atkinson. 2006. GNU Aspell 0.60.7. http://[Link].

Adriane Boyd. 2010. EAGLE: an Error-Annotated Corpus of Beginning Learner German. In Proceedings of the 7th LREC. [Link] lrec2010/summaries/812.

Cecil H. Brown, Eric W. Holman, Søren Wichmann, and Viveka Velupillai. 2008. Automated classification of the world's languages: a description of the method and preliminary results. STUF - Language Typology and Universals / Sprachtypologie und Universalienforschung, 61(4):285-308. doi:10.1524/stuf.2008.0026.

Fred J. Damerau. 1964. A technique for computer detection and correction of spelling errors. Communications of the ACM. doi:10.1145/363958.363994.
Aharon B. Dolgopolsky. 1986. A probabilistic hypothesis concerning the oldest relationships among the language families of northern Eurasia. In Typology, Relationship and Time: A Collection of Papers on Language Change and Relationship by Soviet Linguists, pages 27-50. (Original: 1964, in Voprosy Jazykoznanija 2).

Michael Flor and Yoko Futagi. 2012. On using context for automatic correction of non-word misspellings in student essays. In Proceedings of the 7th Workshop on Building Educational Applications Using NLP. [Link] org/[Link]?id=2390397.

Michael Hahn, Niels Ott, Ramon Ziai, and Detmar Meurers. 2013. CoMeT: Integrating different levels of linguistic modeling for meaning assessment. [Link] org/anthology/S/S13/[Link].

Trude Heift and Anne Rimrott. 2008. Learner responses to corrective feedback for spelling errors in CALL. System. doi:10.1016/[Link].2007.09.007.

Søren Wichmann, André Müller, Annkathrin Wett, Viveka Velupillai, Julia Bischoffberger, Cecil H. Brown, Eric W. Holman, Sebastian Sauppe, Zarina Molochieva, Pamela Brown, Harald Hammarström, Oleg Belyaev, Johann-Mattis List, Dik Bakker, Dmitry Egorov, Matthias Urban, Robert Mailhammer, Agustina Carrizo, Matthew S. Dryer, Evgenia Korovina, David Beck, Helen Geyer, Pattie Epps, Anthony Grant, and Pilar Valenzuela. 2013. The ASJP-Database (version 16). [Link] (Retrieved 04/15).

Magdalena Wolska, Andrea Horbach, and Alexis Palmer. 2014. Computer-assisted scoring of short responses: the efficiency of a clustering-based approach in a real-life task. In Advances in Natural Language Processing (Proceedings of the 9th International Conference on Natural Language Processing, PolTAL-14). doi:10.1007/978-3-319-10888-9_31.