6th International Conference of Digital Archives and Digital Humanities,
30 November 2015, National Taiwan University
New Perspectives on Digital Sinology Resources panel
The digital medium presents unique opportunities and challenges for the development of new kinds of resources for the study of Chinese literature. Using examples from the Chinese Text Project, I suggest ways in which digital libraries can leverage the advantages of the digital realm to offer new functionality and services at relatively low cost. This involves the exploitation of two primary avenues for scalable development: firstly the use of automation to achieve goals realistically attainable by computational methods, and secondly the encouragement of open user engagement to recruit human volunteers to assist with tasks less suited to automation.
Presentation at Harvard University, “Advancing Digital Scholarship in Japanese Studies: Innovations and Challenges” Workshop, 7 November 2015
Belfer Case Study Room, CGIS, 9.00 am
In the ten years since first going online, the Chinese Text Project has gradually expanded from a simple tool for searching and navigating a handful of early Chinese texts to become the largest publicly available full-text database of pre-modern Chinese, containing over 20,000 texts and more than 3 billion characters. In this presentation, I discuss technical and structural changes that have made this expansion possible with only limited resources. These changes involve the exploitation of two primary avenues for scalable development: firstly the use of automation to achieve goals realistically attainable by computational methods, and secondly the encouragement of open user engagement to recruit human volunteers to assist with tasks less suited to automation. Specific examples include the application of optical character recognition both to enable full-text search of scanned early editions and to create draft transcriptions of the same texts that can be proofread through crowdsourcing, and the application of natural language processing techniques to the identification of text reuse and the automated compilation of dictionary data. I also introduce ongoing work including the development of Application Programming Interfaces (APIs) and related mechanisms that will allow other projects to integrate with and build upon the resources of this digital library in a decentralized way while at the same time avoiding duplication of effort.
Seminar at Harvard University, Fairbank Center for Chinese Studies, 26 October 2015
Room S153, CGIS South Building, 12.00
Textual parallels among early Chinese transmitted texts are extensive and widespread, often reflecting complex textual histories involving repeated transcription, compilation, and editing spanning many centuries and involving contributions from multiple authors and editors. Partly as a consequence of this complexity, establishing with certainty even approximate dates of authorship for texts and parts of texts is a challenging task. In this presentation, I demonstrate how digital methods grounded in textual and statistical evidence can help us better understand and visualize some of these complex relationships, and how they may offer additional clues as to the likely provenance of disputed texts.
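As a rough illustration of what detecting textual parallels can involve, the following is a minimal sketch that finds character n-grams shared by two texts. It is not the method used in the presentation or by the Chinese Text Project; the function names, the choice of n = 4, and the second sample string (invented for the example) are all assumptions.

```python
def char_ngrams(text, n):
    """Map each character n-gram in text to the offsets where it starts."""
    index = {}
    for i in range(len(text) - n + 1):
        index.setdefault(text[i:i + n], []).append(i)
    return index


def shared_ngrams(a, b, n=4):
    """Return (offset_in_a, offset_in_b, ngram) for every n-gram found in both texts."""
    index_b = char_ngrams(b, n)
    matches = []
    for i in range(len(a) - n + 1):
        gram = a[i:i + n]
        for j in index_b.get(gram, []):
            matches.append((i, j, gram))
    return matches


# Toy inputs: text_b is an invented string, used only to show an overlap.
text_a = "子曰學而時習之不亦說乎"
text_b = "故曰學而時習之其樂也"

for offset_a, offset_b, gram in shared_ngrams(text_a, text_b):
    print(offset_a, offset_b, gram)
```

Overlapping matches at consecutive offsets, as here, indicate a longer shared passage; a fuller approach would merge them into runs and tolerate small variations between the two versions.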
Presentation at Harvard University, Computational Methods for Chinese History: A “Digging into Data Challenge” Training Workshop, 17 October 2015.
Science Center, Room B09, 3.15 pm
From September 2015 to July 2016 I will be serving as a Postdoctoral Fellow at Harvard University’s Fairbank Center for Chinese Studies, working on (among other things) a project I’ve titled “Big Data and Early China: Corpus-Assisted Interpretation of Classical Chinese”. It’s really exciting to be here in Cambridge, and I look forward to being able to concentrate a little more on the digital humanities side of my research over the coming year.
Seminar at Leiden University, 5 August 2015
Since its origins as a database of Warring States philosophical texts, the Chinese Text Project (http://ctext.org) has gradually grown to become one of the largest digital libraries of pre-modern Chinese texts in existence, as well as a platform for applying new digital methods to the study of these texts. This seminar will introduce several unique aspects of the site from both sinological and technical perspectives, and discuss ongoing research, development and future goals.
A new version of the Unicode standard has been released, defining thousands of additional rarely used and variant Chinese characters. Support for these has been added to the dictionary section of the site; to view these characters, please install the latest version of the Hanazono font. Many new characters belong to “CJK Extension E” – you can confirm system support for these from the Font Test Page.
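For readers who want to check programmatically whether a given character falls in this block, here is a minimal sketch. It assumes the block range U+2B820 to U+2CEAF that the Unicode standard assigns to CJK Unified Ideographs Extension E; the function name is made up for the example. It tests only the code point, not whether an installed font can actually display the character.

```python
# Block range for CJK Unified Ideographs Extension E (U+2B820..U+2CEAF).
CJK_EXT_E = range(0x2B820, 0x2CEAF + 1)


def in_cjk_extension_e(ch):
    """Return True if the single character ch lies in the Extension E block."""
    return ord(ch) in CJK_EXT_E


print(in_cjk_extension_e(chr(0x2B820)))  # first code point of the block
print(in_cjk_extension_e("中"))          # U+4E2D, in the basic CJK block
```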
This paper, accepted April 2012, has now appeared in Philosophy East and West 65:3 (July 2015).
While visiting the Compilation Center of Korean Confucian Classics at Sungkyunkwan University in Seoul, I had the chance to give a lecture to some graduate students about the Chinese Text Project. Although most of the presentation consisted of a practical introduction using the site itself, there were also a few slides (Chinese) giving a brief overview.