Prof. R. L. Górski: Historical linguistics and stylometry. Can the corpus tell us how to periodize the history of a language?

How do we know when, say, the Early Modern period of a given language ends and the Late Modern period begins? Typically, coarse-grained periodizations are based on changes in the grammatical system, whereas fine-grained ones rely on sociolinguistic or philological evidence. Instead, we propose a corpus-driven approach. Using text categorisation methods, we divide a diachronic corpus, in a stepwise fashion, into two subcorpora that are as different from each other as possible (Eder & Górski 2016). This allows us to identify quantitatively different stages in the development of a language. The underlying assumption is that effective categorisation is possible only if two requirements are satisfied: there is a true difference (be it lexical or grammatical) between older and newer texts, and each of the two subcorpora is homogeneous.
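The general idea of searching for a split point can be illustrated with a minimal sketch. It is not the procedure of Eder & Górski (2016): the features (relative word frequencies), the classifier (nearest centroid with leave-one-out evaluation), the function names, and the toy texts are all assumptions made for illustration only. For every candidate cut in a chronologically sorted corpus, the sketch measures how well "older" and "newer" texts can be told apart and keeps the cut with the highest accuracy.

```python
from collections import Counter
import math

def profile(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loo_accuracy(older, newer):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for group, other in ((older, newer), (newer, older)):
        other_centroid = centroid(other)
        for i, vec in enumerate(group):
            # The held-out text does not contribute to its own centroid.
            own_centroid = centroid(group[:i] + group[i + 1:])
            if distance(vec, own_centroid) < distance(vec, other_centroid):
                correct += 1
    return correct / (len(older) + len(newer))

def best_split(docs, min_size=2):
    """Return (first year of the newer subcorpus, accuracy) for the best cut.

    docs is a list of (year, text) pairs; min_size (hypothetical parameter,
    at least 2) is the smallest subcorpus allowed on either side of the cut.
    """
    docs = sorted(docs, key=lambda d: d[0])
    vocab = sorted({w for _, text in docs for w in text.lower().split()})
    vectors = [profile(text, vocab) for _, text in docs]
    best = None
    for cut in range(min_size, len(docs) - min_size + 1):
        acc = loo_accuracy(vectors[:cut], vectors[cut:])
        if best is None or acc > best[1]:
            best = (docs[cut][0], acc)
    return best

# Invented toy corpus: the pronoun and verb forms change around 1800.
docs = [
    (1600, "thou hast the book"), (1650, "thou hast a book"),
    (1700, "thou hast the house"), (1750, "thou hast a house"),
    (1800, "you have the book"), (1850, "you have a book"),
    (1900, "you have the house"), (1950, "you have a house"),
]
print(best_split(docs))  # prints (1800, 1.0)
```

In this toy corpus only the cut at 1800 yields two internally homogeneous and mutually distinct subcorpora, which mirrors the two requirements stated above. A stepwise periodization could, under the same assumptions, repeat the search within each resulting subcorpus.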

Event detail

Event start
12. 10. 2016 17:30 - 18:30
Faculty of Arts, Jan Palach Square 2, Prague 1 (room 104)
Organizing Institution
Department of Czech National Corpus
Event type