CHNSHIS 202: Digital Methods for Chinese Studies

I currently (Spring 2016 and 2017; Fall 2017) teach the course CHNSHIS 202: Digital Methods for Chinese Studies at Harvard’s EALC. Below is the syllabus from the 2016 course.

Course Description

This course introduces graduate students in Chinese studies to programming skills and digital humanities techniques of direct practical relevance to research in their discipline. It will consist of weekly lectures, each introducing a specific type of technique, followed by an interactive lab session during which students practice applying the technique to data appropriate to their own research. No background in digital methods or programming is assumed, but students are expected to have basic computing skills and are required to bring a suitable laptop to use during the lab sessions. The techniques covered in this course all have broad applicability to topics in Chinese studies, and students will be expected to apply them to their own research topics and relevant texts as arranged during the first few sessions. The course will end with student presentations in which students apply an appropriate selection of the techniques studied to their own research questions. While examples and coursework will draw upon Chinese language source materials, students primarily working with other East Asian languages are also encouraged to take this course.


Week 1 – Introduction

  • Background and basic concepts
  • Representing text on a computer
  • Setting up the Python environment

Week 2 – Introduction to programming

  • Variables, functions, loops, and files

Week 3 – Regular expressions

  • String manipulation and data extraction

Week 4 – Working with structured data

  • Associative arrays, tables, CSV files

Week 5 – Practical data manipulation

  • Automated extraction of data from the web

Week 6 – Textual similarity

  • Introduction to information retrieval

Week 7 – Topic modeling

  • Generating and interpreting data using Mallet

Week 8 – Network visualization with Gephi

  • Representing data as a network graph

Week 9 – Principal component analysis

  • Exploratory data analysis in Python

Week 10 – Machine learning

  • Features, classification, regression

Week 11 – Review and discussion

  • What worked, what didn’t, and why
  • Debugging of issues arising during project work

Week 12 – Student presentations and discussion

Coursework and Assessment

  • Class participation (30%)
    Students are expected to attend and actively participate in the practical sessions, completing short assigned problem sets and applying the techniques introduced to their own data.
  • Homework assignments (30%)
    Four short homework assignments will be set, based on applying the digital techniques covered.
  • Final presentations (40%)
    Each student will give one presentation in which techniques introduced during the course are applied to a research topic in Chinese studies.

Learning Outcomes

Having completed this course, students will:

  • Have an understanding of how to apply digital techniques to their own projects.
  • Be able to apply basic programming techniques to extract data from Chinese texts for analysis, and perform various kinds of digital analysis on the resultant data in the context of their research.
  • Possess the basic skills needed to make use of the growing number of open-source Python libraries relevant to textual analysis.  