Chinese Text Project: A Digital Library of Pre-Modern Chinese Literature

Paper presented at Digital Humanities Congress 2016, University of Sheffield

Since its creation in 2005 as an online search tool for a handful of classical Chinese texts, the Chinese Text Project has gradually grown to become the largest and most widely used digital library of pre-modern Chinese texts, as well as a platform for exploring the application of new digital methods to the study of pre-modern Chinese literature. This paper discusses how several unique aspects of the project have contributed to its success. First, it demonstrates how simplifying assumptions that hold for domain-specific OCR (Optical Character Recognition) of historical works reduce the complexity of the task and thereby increase recognition accuracy. Second, it shows how crowd-sourced proofreading and editing, carried out through a publicly accessible version-controlled wiki system, have made it possible to draw on a large and distributed user base, including many volunteers outside traditional academia, to improve the quality of digital content and to create accurate transcriptions of previously untranscribed texts and editions. Finally, it explores how the implementation of open APIs (Application Programming Interfaces) has greatly expanded the utility of the library as a whole, facilitating open and decentralized integration with other projects and leading to entirely new applications in digital humanities teaching and research.

