Stylometric analysis and machine learning methods

This lecture presents key developments in Authorship Attribution (AA) methodology. More specifically, we will examine a highly successful and widely used stylometric feature in AA studies, the n-gram. We will investigate n-grams at the character and word levels and explore their quantitative properties. We will also discuss the tokenization methods available for these units, with reference to previous studies. In addition, we will give an overview of the most effective machine learning algorithms used in AA (Support Vector Machines and Random Forests). The presentation will be non-technical, focusing on the concepts underlying these algorithms.



16.5.2019 15:50
Venue
nám. Jana Palacha 2, Praha 1 (room 18)
Institute of the Czech National Corpus, Faculty of Arts, Charles University (Ústav Českého národního korpusu FF UK)
Event type
Conferences and lectures