Textual Relationships in the Pre-Qin and Han Corpus: A Digital Approach

Seminar at Harvard University, Fairbank Center for Chinese Studies, 26 October 2015
Room S153, CGIS South Building, 12.00

Textual parallels among early Chinese transmitted texts are extensive and widespread, often reflecting complex textual histories of repeated transcription, compilation, and editing that span many centuries and draw on contributions from multiple authors and editors. Partly as a consequence of this complexity, establishing even approximate dates of authorship for texts and parts of texts with any certainty is a challenging task. In this presentation, I demonstrate how digital methods grounded in textual and statistical evidence can help us better understand and visualize some of these complex relationships, and how they may offer additional clues as to the likely provenance of disputed texts.

Posted in Chinese, Digital Humanities, Talks and conference papers | Comments Off

Exploring Text Reuse in the Pre-Qin and Han Corpus

Presentation at Harvard University, Computational Methods for Chinese History: A “Digging into Data Challenge” Training Workshop, 17 October 2015.
Science Center, Room B09, 3.15pm.

Posted in Chinese, Digital Humanities, Talks and conference papers, Video | Comments Off

Fairbank Center

From September 2015 to July 2016 I will be serving as a Postdoctoral Fellow at Harvard University’s Fairbank Center for Chinese Studies, working on (among other things) a project I’ve titled “Big Data and Early China: Corpus-Assisted Interpretation of Classical Chinese”. It’s really exciting to be here in Cambridge, and I look forward to being able to concentrate a little more on the digital humanities side of my research over the coming year.

Posted in Chinese, Digital Humanities, Philosophy | Comments Off

Digitizing Early China

Seminar at Leiden University, 5 August 2015

Since its origins as a database of Warring States philosophical texts, the Chinese Text Project (http://ctext.org) has gradually grown to become one of the largest digital libraries of pre-modern Chinese texts in existence, as well as a platform for applying new digital methods to the study of these texts. This seminar will introduce several unique aspects of the site from both sinological and technical perspectives, and will discuss ongoing research, development, and future goals.

Posted in Chinese, Digital Humanities, Philosophy, Talks and conference papers | Comments Off

Chinese Text Project – Support for Unicode 8.0

A new version of the Unicode standard has been released, defining thousands of additional rarely used and variant Chinese characters. Support for these has been added to the dictionary section of the site; to view these characters, please install the latest version of the Hanazono font. Many new characters belong to “CJK Extension E” – you can confirm system support for these from the Font Test Page.
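As a rough illustration of what “belonging to CJK Extension E” means, the following minimal Python sketch checks whether characters fall inside that Unicode block (U+2B820 to U+2CEAF). The function names are hypothetical and not part of the site. The check only tests code point ranges; it says nothing about whether an installed font can actually display the character, which is what the Font Test Page is for.

```python
# Code point range of the "CJK Unified Ideographs Extension E" block,
# introduced in Unicode 8.0.
EXT_E_START = 0x2B820
EXT_E_END = 0x2CEAF


def in_cjk_extension_e(char):
    """Return True if a single character lies in the Extension E block."""
    return EXT_E_START <= ord(char) <= EXT_E_END


def extension_e_chars(text):
    """Return the characters of text that lie in the Extension E block."""
    return [c for c in text if in_cjk_extension_e(c)]


if __name__ == "__main__":
    # Hypothetical sample: one common character plus the first
    # code point of the Extension E block.
    sample = "中" + chr(0x2B820)
    for c in extension_e_chars(sample):
        print("U+%X" % ord(c))
```

Any text containing such characters would need a font with Extension E coverage, such as a recent version of Hanazono, to display them correctly.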

Posted in Chinese, Digital Humanities | Comments Off

Zhuangzi, perspectives, and greater knowledge

This paper, accepted April 2012, has now appeared in Philosophy East and West 65:3 (July 2015).

Posted in Chinese, Philosophy | Comments Off

Chinese Text Project: over ten million pages of pre-modern Chinese texts now searchable online

Update to the CTP:

A major update to the site has been made by applying OCR to over ten million pages of transmitted texts stored in the Library, linking scanned texts where possible to digital editions that follow them. Over 3000 existing texts have been successfully linked, allowing side-by-side display and textual searching of scanned texts.

Additionally, around ten thousand new texts and editions have been transcribed for the first time using OCR. While these transcriptions inevitably contain many errors, they make it possible to search the scanned texts and immediately locate information within them. All newly transcribed texts have been added to the Wiki – please help by correcting errors when using these resources.

For further details, please see the OCR instructions.

Posted in Chinese, Digital Humanities | Comments Off

Text Tools embedding test

This is a test of embedding a scripted Text Tools instance inside an IFrame on another site. The same script can be run in a new window via this link.


This figure displays a heat-map visualization of similarity based on n-gram shingling in the Mozi with n=5. Clicking on a cell displays the list of all similarities corresponding to the pair of textual units being compared in that cell; clicking on a highlighted segment of text displays all passages in the corpus containing the selected n-gram. Click the “Similarity matrix” link to return to the heat-map view.
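For readers unfamiliar with n-gram shingling, the idea behind such a heat-map can be sketched in a few lines of Python. This is only a minimal illustration and not the Text Tools implementation: the function names are hypothetical, and using the raw count of shared n-grams as the similarity score is an assumption, since the scoring used for the figure is not described here.

```python
from itertools import combinations


def shingles(text, n=5):
    """Return the set of overlapping character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def shared_ngrams(a, b, n=5):
    """Return the n-grams that occur in both a and b."""
    return shingles(a, n) & shingles(b, n)


def similarity_matrix(units, n=5):
    """Build a symmetric matrix of shared n-gram counts.

    Cell [i][j] holds the number of distinct n-grams that units i and j
    have in common (an assumed scoring); the diagonal is left at 0.
    """
    sets = [shingles(u, n) for u in units]
    size = len(units)
    matrix = [[0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        count = len(sets[i] & sets[j])
        matrix[i][j] = matrix[j][i] = count
    return matrix


if __name__ == "__main__":
    # Hypothetical placeholder strings standing in for textual units.
    units = ["abcdefgh", "xxabcdefyy", "zzzzzzzz"]
    for row in similarity_matrix(units, n=5):
        print(row)
```

Each cell of the resulting matrix corresponds to one cell of a heat-map, and `shared_ngrams` gives the list of overlapping segments for a particular pair of units.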

Code used to create the above as a link:

As an IFrame:

Posted in Uncategorized | Comments Off

Lecture at Sungkyunkwan University

While visiting the Compilation Center of Korean Confucian Classics at Sungkyunkwan University in Seoul, I had the chance to give a lecture to some graduate students about the Chinese Text Project. Although most of the presentation consisted of a practical introduction using the site itself, there were also a few slides (Chinese) giving a brief overview.

Posted in Chinese, Digital Humanities, Talks and conference papers | Comments Off

International forum on the present and future of archiving projects on Confucian texts

I was very happy to be given the opportunity to introduce some aspects of the Chinese Text Project at this workshop hosted by the Compilation Center of Korean Confucian Classics at Sungkyunkwan University in Seoul. I’ve uploaded a handout (Chinese) for the slides.

Posted in Chinese, Digital Humanities | Comments Off