Modelling written language production in English: a Bayesian model of spelling

Mason, Helen (2019) Modelling written language production in English: a Bayesian model of spelling. PhD thesis, James Cook University.



In English, word production (i.e., spelling) is more complex than word recognition (i.e., reading) because the relationship between graphemes and phonemes is not equally distributed. Despite this known complexity, models of spelling are largely descriptive and less sophisticated than models of reading. Furthermore, current models of spelling do not adequately explain how spelling information is learned and managed. Intriguingly, Bayesian models of reading have facilitated research and development in recent years. Given the complex relationship between reading and spelling, I developed a Bayesian computational model of spelling to address these limitations.

Operational validity was assessed both theoretically (i.e., conceptual and predictive validity) and empirically (i.e., data, event, and predictive validity). The model was designed to behave like a human speller, based on what is currently known about spelling. The Bayesian spelling model simulates a dictation task and makes spelling decisions based on 10 parameters that are analogous to human spelling decisions. The model was trained with Queensland spelling lists based on national curriculum guidelines for grades 1 to 7, providing data validity, and was tested with words from a computerised NAPLAN dictation task completed by students in grades 3, 5, 7, and 9, providing event validity.
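As a rough illustration of what a Bayesian spelling decision in a dictation task can look like, the sketch below chooses a grapheme for each dictated phoneme by taking the most probable grapheme under a posterior learned from training pairs. It is not the thesis model: the 10 parameters are not specified in this abstract, and the training pairs, the function names `posterior` and `spell`, and the symmetric Dirichlet prior `alpha` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical (phoneme, grapheme) training pairs; NOT the Queensland lists.
TRAINING = [
    ("f", "f"), ("f", "f"), ("f", "f"), ("f", "ph"), ("f", "ff"),
    ("k", "c"), ("k", "c"), ("k", "k"), ("k", "ck"),
]


def posterior(phoneme, pairs, alpha=1.0):
    """Posterior mean of P(grapheme | phoneme).

    Assumes a symmetric Dirichlet(alpha) prior over every grapheme seen in
    training, updated with the counts observed for this phoneme.
    """
    graphemes = sorted({g for _, g in pairs})
    counts = Counter(g for p, g in pairs if p == phoneme)
    total = sum(counts.values()) + alpha * len(graphemes)
    return {g: (counts[g] + alpha) / total for g in graphemes}


def spell(phonemes, pairs, alpha=1.0):
    """Simulate dictation: pick the most probable grapheme for each phoneme."""
    chosen = []
    for p in phonemes:
        post = posterior(p, pairs, alpha)
        chosen.append(max(post, key=post.get))
    return "".join(chosen)
```

With more training pairs the counts dominate the prior, so the decisions shift from near-uniform guesses toward the spellings seen most often. That is one simple way to picture learning across grades, though the thesis may model it quite differently.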

Predictive validity was examined by comparing the responses of the Bayesian spelling model with the NAPLAN student data. Accuracy and error data for students and for all parameters were calculated and transformed into density distributions to overcome sample size limitations and skew in the data related to letter frequencies. Independent-samples Bayesian t-tests compared the distributions of the model with the distributions of students for each testing grade.

Results for grades 3 and 5 supported my hypotheses, showing positive evidence that there was no difference between the distributions of data from students and the distributions from expected model parameters. Although results for grades 7 and 9 supported my hypotheses for error only, the accuracy data were still consistent with the predictions of current spelling models and with Australian curriculum guidelines.

My findings validate the model because spelling behaviour is effectively reproduced (i.e., empirical validation) and the data are congruent with existing literature (i.e., theoretical validation). Furthermore, the progression of learning through parameter decisions aligns with known learning processes. These findings provide robust evidence that Bayesian decision making can be used to model spelling behaviour and that my model can reproduce the learning process of spelling.

My model provides ample research opportunities, including investigation of early phonological learning and later morphological strategies. I suggest that further model development consider the ability to examine contractions and to differentiate between homophones. Future research with the Bayesian spelling model could provide a means of experimentally examining educational strategies and spelling disorders, and could have implications for natural language processing. Most importantly, it is hoped that future research with the Bayesian model of spelling will highlight the important role of spelling and spelling education in everyday life.

Item ID: 63743
Item Type: Thesis (PhD)
Keywords: language, words, word production, word recognition, spelling, reading, graphemes, phonemes, Bayesian model
Copyright Information: Copyright © 2019 Helen Mason.
Date Deposited: 10 Jul 2020 05:28
FoR Codes: 17 PSYCHOLOGY AND COGNITIVE SCIENCES > 1702 Cognitive Science > 170204 Linguistic Processes (incl Speech Production and Comprehension) @ 50%
17 PSYCHOLOGY AND COGNITIVE SCIENCES > 1702 Cognitive Science > 170205 Neurocognitive Patterns and Neural Networks @ 50%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970117 Expanding Knowledge in Psychology and Cognitive Sciences @ 100%
