Digitizing Early China

Seminar at Leiden University, 5 August 2015

Since its origins as a database of Warring States philosophical texts, the Chinese Text Project (http://ctext.org) has gradually grown to become one of the largest digital libraries of pre-modern Chinese texts in existence, as well as a platform for applying new digital methods to the study of these texts. This seminar will introduce several unique aspects of the site from both sinological and technical perspectives, and will discuss ongoing research, development, and future goals.

Posted in Uncategorized

Chinese Text Project – Support for Unicode 8.0

A new version of the Unicode standard has been released, defining thousands of additional rarely used and variant Chinese characters. Support for these has been added to the dictionary section of the site; to view these characters, please install the latest version of the Hanazono font. Many new characters belong to “CJK Extension E” – you can confirm system support for these from the Font Test Page.
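As a rough illustration of what "CJK Extension E" means in practice, the following minimal sketch checks whether characters in a string fall inside that Unicode block (U+2B820 to U+2CEAF). The function names are hypothetical and are not part of the site. The check only inspects code points, so it cannot tell whether an installed font can actually render the characters.

```python
# Code point range of the "CJK Unified Ideographs Extension E" block,
# which was added in Unicode 8.0.
EXT_E_START = 0x2B820
EXT_E_END = 0x2CEAF


def in_cjk_extension_e(char):
    """Return True if a single character lies in the Extension E block."""
    return EXT_E_START <= ord(char) <= EXT_E_END


def extension_e_chars(text):
    """Return the characters of `text` that belong to Extension E, in order."""
    return [c for c in text if in_cjk_extension_e(c)]


if __name__ == "__main__":
    # chr(0x2B820) is the first code point of the block; "中" (U+4E2D) is not in it.
    sample = "中" + chr(0x2B820)
    for c in extension_e_chars(sample):
        print("U+%04X is in CJK Extension E" % ord(c))
```

Text containing such characters will still display as empty boxes unless a font covering the block, such as the Hanazono font mentioned above, is installed.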

Posted in Digital Humanities

Chinese Text Project: over ten million pages of pre-modern Chinese texts now searchable online

Update to the CTP:

A major update to the site has been made by applying OCR to over ten million pages of transmitted texts stored in the Library, linking scanned texts where possible to digital editions that follow them. Over 3000 existing texts have been successfully linked, allowing side-by-side display and textual searching of scanned texts.

Additionally, around ten thousand new texts and editions have been transcribed for the first time using OCR. While these transcriptions inevitably contain many errors, they make it possible for the first time to search the scanned texts and immediately locate information within them. All newly transcribed texts have been added to the Wiki – please help by correcting errors when using these resources.
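To give a feel for what linking error-prone OCR output to an existing digital edition can involve, here is a minimal sketch using Python's standard `difflib` module. It is not the method used by the site, which this post does not describe. The function name `locate_in_edition` and the sample strings are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def locate_in_edition(ocr_text, edition_text):
    """Guess where a noisy OCR fragment sits inside a clean edition.

    Returns (start, window, ratio): the estimated start offset in the
    edition, the edition text at that position, and a 0..1 similarity score.
    """
    matcher = SequenceMatcher(None, ocr_text, edition_text, autojunk=False)
    # The longest run of characters shared exactly by both strings.
    match = matcher.find_longest_match(0, len(ocr_text), 0, len(edition_text))
    # Shift back by the run's offset within the OCR text to estimate the start.
    start = max(match.b - match.a, 0)
    window = edition_text[start:start + len(ocr_text)]
    ratio = SequenceMatcher(None, ocr_text, window, autojunk=False).ratio()
    return start, window, ratio


if __name__ == "__main__":
    # Unpunctuated opening of the Analects, standing in for a digital edition.
    edition = "子曰學而時習之不亦說乎有朋自遠方來不亦樂乎人不知而不慍不亦君子乎"
    # Simulated OCR output with two misread characters (自 as 白, 乎 as 平).
    ocr = "有朋白遠方來不亦樂平"
    start, window, ratio = locate_in_edition(ocr, edition)
    print(start, window, round(ratio, 2))
```

A similarity threshold on `ratio` could then decide whether a scanned page is linked to the edition or left as a standalone OCR transcription; any such threshold would need to be tuned on real data.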

For further details, please see the OCR instructions.

Posted in Digital Humanities

Lecture at Sungkyunkwan University

While visiting the Compilation Center of Korean Confucian Classics at Sungkyunkwan University in Seoul, I had the chance to give a lecture to some graduate students about the Chinese Text Project. Although most of the presentation consisted of a practical introduction using the site itself, there were also a few slides (Chinese) giving a brief overview.

Posted in Digital Humanities

International forum on the present and future of archiving projects on Confucian texts

I was very happy to be given the opportunity to introduce some aspects of the Chinese Text Project at this workshop hosted by the Compilation Center of Korean Confucian Classics at Sungkyunkwan University in Seoul. I’ve uploaded a handout (Chinese) for the slides.

Posted in Digital Humanities

Phrase-based alignment of classical Chinese and English

Paper presented at Greek and Latin in an age of Open Data:

Phrase-based alignment of classical Chinese and English
Donald Sturgeon and John S. Y. Lee

Abstract

Aligned parallel corpora are useful for a variety of purposes including machine translation and statistical studies, as well as making possible new and innovative digital tools for use in pedagogy and research. Alignments can be made at various levels of granularity, a common type being alignment of sentences. In the case of classical Chinese in particular, databases containing such alignments are also of direct utility to scholars and linguists due to the complex semantics of individual terms of the language, the limited size of the extant body of writing, and a lack of sufficiently comprehensive bilingual dictionaries. Aligned corpora make possible automated extraction of relevant linguistic data for arbitrary terms, while avoiding the prohibitively high cost involved in manual construction of an adequate bilingual dictionary.

While in many modern languages sentences are delimited in the written form by the presence of certain punctuation marks, classical Chinese was for many centuries written without any punctuation marks whatsoever, and later with punctuation that delimited only boundaries between phrases. Modern editions of classical Chinese texts include punctuation marks corresponding closely to (and greatly influenced by) modern English punctuation, but often disagree on the precise details of such punctuation, highlighting the degree of freedom present in adding such marks. Due to the grammar of classical Chinese, this freedom often extends to choices determining apparent sentence boundaries. Similarly, and partly as a result of this, English translations of these texts often differ in the precise delimiting of sentences in the source text.

As a result of these linguistic and historical factors, sentence-based alignment of classical Chinese texts and their modern translations is problematic, as sentences of the source and target languages often fail to correspond exactly due to different choices made in punctuating the text, even where these do not correspond to significant differences in interpretation. By contrast, owing to the much lower degree of freedom involved, different modern editions of early texts exhibit much less disagreement regarding the delimiting of phrases.

Motivated by these factors, this study investigates automated phrase-wise alignment of a corpus of classical Chinese texts and their English translations, comparing unsupervised machine-generated phrase-wise alignments with sentence-wise alignments by means of human-annotated results.
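The contrast between sentence and phrase boundaries can be made concrete with a small sketch. It is illustrative only and is not the alignment method of the paper. The punctuation sets, the function names, and the second "edition" (a hypothetical alternative punctuation of the same Analects line) are all assumptions.

```python
import re

# Marks treated as ending a sentence, and the larger set ending a phrase.
SENTENCE_MARKS = "。？！"
PHRASE_MARKS = SENTENCE_MARKS + "，；：、"


def split_on(text, marks):
    """Split text into pieces, each ending with one of `marks` if present."""
    cls = re.escape(marks)
    return re.findall("[^" + cls + "]+[" + cls + "]?", text)


def strip_marks(pieces):
    """Remove trailing punctuation so pieces can be compared across editions."""
    return [p.rstrip(PHRASE_MARKS) for p in pieces]


if __name__ == "__main__":
    edition_a = "學而時習之，不亦說乎？"
    edition_b = "學而時習之。不亦說乎？"  # hypothetical alternative punctuation

    # Sentence-wise, the two editions disagree on the number of units.
    print(len(split_on(edition_a, SENTENCE_MARKS)),
          len(split_on(edition_b, SENTENCE_MARKS)))

    # Phrase-wise, both editions yield the same units once marks are stripped.
    print(strip_marks(split_on(edition_a, PHRASE_MARKS)) ==
          strip_marks(split_on(edition_b, PHRASE_MARKS)))
```

In this toy case the sentence counts differ between the two punctuations while the phrase units coincide, which is the kind of stability that makes phrases a more dependable unit for alignment.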

Download the full paper.

Posted in Digital Humanities

Chinese Text Project – Dictionary update

Update to the CTP:

The dictionary section of the site has been updated to make better use of English translations. Dictionary pages now cite English translations of example sentences together with the corresponding Chinese examples. Additionally, dictionary look-ups for passages of texts that have English translations now display these translations side by side with the Chinese text for easier comparison. If you prefer the old behaviour, please log in to your CTP account and change the “Dictionary display” setting to “No translations”.

Posted in Digital Humanities

Knowledge and Language in Early Chinese Thought

An invited lecture given at the Taiwan Philosophical Association at Taiwan University.

Abstract

Early Chinese thinkers did not typically characterize knowledge in terms of sentential constructs nor consider these to be a fundamental constituent or theoretical foundation of knowledge. At the same time, relationships between language and knowledge were the subject of intense critical debate, in which thinkers recognized the possibility of such relations existing and the significance should they hold, but were in each case challenged by those skeptical of their generality. This paper will discuss early explorations of the relation between language and knowledge and attempt to explain why some Chinese thinkers came to be skeptical of the role that language might play in understanding and obtaining knowledge.

I shall begin by arguing that in classical Chinese there is an important linguistic disanalogy between knowledge, truth, and belief that would weigh strongly against attempting to account for knowledge in terms of sentential constructs. Instead, knowing was more typically thought of as consisting in objectively correct action and the correct use of words. Secondly, while those with a positive view of the role of language in explaining knowledge attempted to show that there are objective standards that govern the correct use of words, they found it difficult to fully account for the claimed objectivity and uniqueness of these standards. Thirdly, though early thinkers looked for strictly formal regularities in language, in doing so they made the discovery that language does not in fact follow such formal patterns. Finally, Daoists in particular suggested that words can be intentionally used in unconventional ways that force them to take on new and seemingly incompatible interpretations in different contexts, suggesting that words may not be used in accordance with any fixed objective standards at all.

Posted in Philosophy

Knowledge in Early Chinese Thought

The final version of my PhD thesis, titled Knowledge in Early Chinese Thought, is now available online.

Abstract

Early Chinese philosophical texts contain discussions of the nature, origins, and possibility of knowledge, in which both positive accounts and skeptical responses to them are couched in importantly different terms to those most familiar from similar discussions in Western philosophy. In place of appeals to truth, belief, and fallibility of the senses, action, discrimination, and difference of perspective play crucial roles. The aim of this dissertation is to explain why this should be so, and what consequences this had for the early Chinese understanding of knowledge.

In an attempt to answer these questions, I argue that, likely influenced by both facts about the classical Chinese language and key philosophical trends and interests of the time, discussions of knowledge by early Chinese thinkers generally referenced a broad notion of knowledge that was seen as being closely related to action. Linguistic factors also contributed to theorizing about knowledge focusing not on beliefs or other sentential structures, but rather on the drawing of action-guiding shi-fei distinctions, and the same shi-fei framework that was applied to perception was also applied to knowledge.

Language, understood most fundamentally in terms of an ability to distinguish shi-fei and apply names to things in the correct way, also played an important role in the pre‐Qin understanding of knowledge. On a linguistic level, knowledge corresponded to reliably correct language use, and rigid fa (法 standards, models) were seen as underwriting this by providing the standard of correctness. Just as these fa could be used to measure the correctness of individual terms, thinkers interested in the correctness of doctrines and speech in general attempted to apply the same idea to larger linguistic structures such as sentences, in the hope of finding fa for correct language use at a higher level. In doing so, they discovered facts about natural language use that could not be accounted for using the types of fa they considered.

Likely in part influenced by similar observations, others called into question the existence and uniqueness of standards in general and the adequacy of language in expressing knowledge. I argue that the prevailing positive view of knowledge ultimately gave rise to an interesting and nuanced form of skepticism grounded in a form of perspectivism. This skepticism does not merely have the negative consequence that we should question some of our knowledge commitments, but can also be used to suggest that – while still doubting – we can make practical use of our skepticism to improve our knowledge by considering a wider range of perspectives.

Posted in Philosophy

Chinese Text Project – New Wiki section

Update to the CTP:

The Wiki section of the site provides online browsing and full-text search for numerous texts not yet included in the textual database. Since some of these texts have not yet been adequately proofread, users are invited to help in the process of correcting these texts using a Wiki interface, and encouraged to upload historical Chinese texts not yet included. For more details, please see the instructions or browse the Wiki.

Posted in Digital Humanities