Course Content
The course deals with the comparison of texts and corpora. The focus is on lexical items (typically words or lexemes), whose occurrence and distribution can be used to investigate similarities and differences between texts or groups of texts. To this end, the course introduces three widely used methods of digital text analysis, each of which works with the frequencies of lexical items in a different way: term frequency-inverse document frequency (tf-idf), stylometric analysis, and topic modeling.
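
As a rough illustration of how tf-idf weights lexical items, here is a minimal sketch in plain Python. It is not taken from the course materials: the function name `tf_idf`, the toy corpus, and the particular weighting variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency) are assumptions chosen for brevity. Libraries commonly use smoothed or normalized variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf scores for a small corpus.

    docs: dict mapping a document name to its list of tokens.
    Returns a dict mapping each document name to {term: score}.
    Hypothetical helper for illustration only.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # Relative term frequency times log of inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


# Toy corpus (invented for this example), tokenized by whitespace.
corpus = {
    "text_a": "the whale swims in the sea".split(),
    "text_b": "the ship sails on the sea".split(),
    "text_c": "the whale follows the ship".split(),
}

scores = tf_idf(corpus)

# Print the terms of one document, highest score first.
for term, score in sorted(scores["text_a"].items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{score:.3f}")
```

In this sketch, a word that occurs in every document (here "the") receives a score of zero, while words that occur in only one document score highest. That is why tf-idf can be used to find the items that set one text apart from the others.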

Following a theoretical introduction to the course, participants will be given the opportunity to learn the basics of simple computational analysis. Small groups will then create corpora that will be analyzed later in the semester.

Prerequisites
Programming skills are not required. Participants are, however, expected to bring a [b]willingness to learn Python[/b] using the materials provided and to [b]apply[/b] the analyses provided to their own texts.

Participants will need to bring a computer to the course sessions.

Further Grading Information
The final course assignment is to create an [b]executable Jupyter notebook[/b] that applies one of the methods to a self-compiled text corpus, and to [b]reflect[/b] on that work.

Semester: ST 2022