This course introduces students to key concepts and techniques fundamental to applying digital methods to the study of textual materials and other types of data in humanities subjects. The core topics covered are digital representations of data, ways of structuring and managing data, extracting data from textual materials, and data visualization and analysis. Concepts introduced in lecture sessions will be reinforced and applied concretely in particular contexts during corresponding practical sessions and take-home assignments.
No background in digital methods is assumed; however, students are expected to have basic computing skills and access to a suitable laptop. Examples will be drawn from a variety of subject domains within the humanities, with a primary focus on textual materials.
Schedule
Week 1 (Jan 28, 30) – Introduction and motivation
- Data and digital techniques in the humanities
- Examples of data-driven approaches in humanities scholarship
Week 2 (Feb 4, 6) – Representation I
- Fundamentals of digital representation of information
- Basic types of data and their digital representations
Week 3 (Feb 11, 13) – Data and ontologies I
- Ontologies and metadata
- Reading: Paul Vierthaler. Analyzing Printing Trends in Late Imperial China Using Large Bibliometric Datasets, Harvard Journal of Asiatic Studies.
- Reading: Matthew L. Jockers. Macroanalysis: Digital Methods and Literary History, University of Illinois Press 2013. Pages 35-62.
Week 4 (Feb 20) – Representation II
- Research data management
Week 5 (Feb 25, 27) – Data and ontologies II
- Databases and structured data
Week 6 (Mar 4, 6) – Data and ontologies III
- Linked Open Data in the humanities
Week 7 (Mar 11, 13) – From text to data I
- Simple models of textual materials
- Reading: RegexOne, interactive tutorial, lessons 1-14
- Reading: Jean-Baptiste Michel et al. Quantitative Analysis of Culture Using Millions of Digitized Books, Science 331(6014).
Week 8 (Mar 18, 20) – Visualization I
- Charts and diagrams
Week 9 (Mar 25, 27) – Visualization II
- Graphs, maps, and trees
Week 10 (Apr 1, 3) – From text to data II
- Topic modeling
- Reading: Teddy Roland. Topic modeling: what humanists actually do with it.
Week 11 (Apr 8, 10) – From text to data III
- Part-of-speech tagging and parsing of natural languages
Week 12 (Apr 15, 17) – From text to data IV
- Markup and annotation systems
Week 13 (Apr 22, 24) – Review
- Review and discussion of project work
Week 14 (Apr 29, May 1) – Project presentations
- Student projects presented in class