Author Archives: dsturgeon

Digital Approaches to Text Reuse in the Early Chinese Corpus

Published in Journal of Chinese Literature and Culture 5(2), 2018. [Full paper] Observed textual similarities between different pieces of writing are frequently cited by textual scholars as grounds for interpretative stances about the meaning of a passage and its authorship, … Continue reading

Posted in Chinese, Digital Humanities | Comments Off

Accessible Text Mining with Text Tools and the Chinese Text Project

Setup: Create a free account on ctext.org and log in. Make sure to validate your e-mail address by opening the link the system sent you (if you have not, the link above will display a warning/reminder in red to do so). Enter … Continue reading

Posted in Chinese, Digital Humanities, Talks and conference papers | Comments Off

Text Transformation API

Draft – This is a preliminary draft specification. Please note that some implementation details will change before publication. Last updated: 22 March 2019. Overview: Transformations of textual data are important processes in many natural language processing and text analysis workflows. … Continue reading

Posted in Digital Humanities | Comments Off

SUTD Workshop

Materials from a workshop held as part of Working with different kinds of ‘text’ in the Digital Humanities at the Singapore University of Technology and Design. Setup: Create a free account on ctext.org and log in. Make sure to validate … Continue reading

Posted in Chinese, Digital Humanities | Comments Off

Large-scale Optical Character Recognition of Pre-modern Chinese Texts

This paper appears in International Journal of Buddhist Thought and Culture 28(2) (December 2018). [Full paper] Abstract: Optical character recognition (OCR) – the fully automated transcription of text appearing in a digitized image – offers transformative opportunities for the scholarly … Continue reading

Posted in Chinese, Digital Humanities | Comments Off

EASTD 135: Text and Data in the Humanities

This course introduces students to key concepts and techniques fundamental to applying digital methods to the study of textual materials and other types of data in humanities subjects. The core topics covered are digital representations of data, ways of structuring … Continue reading

Posted in Courses, Digital Humanities | Comments Off

Networks of Text Reuse in Early Chinese Literature

Poster presented at Connected Past 2018. Abstract: The phenomenon of text reuse – syntactically and semantically similar fragments of text repeated apparently independently in multiple pieces of writing, and often in works purporting to be composed by entirely different authors … Continue reading

Posted in Chinese, Digital Humanities, Talks and conference papers | Comments Off

Accessible digital text analysis for classical Chinese

Paper presented at Future Philologies: Digital Directions in Ancient World Text, Institute for the Study of the Ancient World, New York University, April 20, 2018. Abstract: Despite a growing interest in digital humanities as a field of study and focus … Continue reading

Posted in Chinese, Digital Humanities, Talks and conference papers | Comments Off

Cyberinfrastructure for historical China studies

It was a pleasure to host on behalf of the Chinese Text Project (ctext.org), together with Professor Peter Bol on behalf of the China Biographical Database (CBDB), the International conference on cyberinfrastructure for historical China studies, held at the Harvard … Continue reading

Posted in Chinese, Digital Humanities, Talks and conference papers | Comments Off

Introductory Tutorial for ctext.org

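This tutorial gives a brief, user-oriented introduction to the main operations in the Chinese Text Project database and digital library, with concrete worked examples that demonstrate the system's main functions.

Tutorial URLs: English: https://dsturgeon.net/ctext Chinese: https://dsturgeon.net/ctext-zhs Japanese: https://dsturgeon.net/ctext-ja

1 Setup before first use

Create an account: in the left-hand pane, scroll down and click "Log in", then enter your details in the form for users who have not yet created an account on the site, and click "Create account".

Check that the required fonts are installed: click "About the site" at the top left, then click "Font test page".

Finding texts

Use the "Title search" function in the left-hand pane. In the search results, an icon next to a text indicates that its content is linked directly to the corresponding scanned facsimile. The results may also show icons with the following meanings:

A transcription stored in the textual database (users cannot edit it directly).
An editable transcription that was entered manually rather than produced by OCR.
An editable transcription that was produced by OCR.
Scanned facsimile images of an edition.

Exercises: Find the electronic full text of the Zixia ji (《资暇集》). In the textual database, find a classic work from the pre-Qin or Han period (for example, the Zhuangzi (《庄子》) or the Xunzi (《荀子》)).

Full-text search

First find and open the transcription you want to search (a chapter or a juan), then click the "Search" box near the bottom of the left-hand pane.

Exercises: Find the passage in the Analects (《论语》) that contains Confucius's statement “君子不器”. Find all passages in the Zhuangzi (《庄子》) that mention “道”.

When a search of the textual database returns multiple results, you can click the "Show statistics" link at the upper right of the page to open an interactive summary of the results.

Locating text in the primary scanned sources

In ctext, scanned materials can be searched by way of the links between a transcription and its scanned source edition. When a transcription has such links, the title search results show an icon indicating this.

When a text is linked to scans, click the icon to the left of any paragraph of the text to open the corresponding page of the scanned edition.

To search a scanned text for a particular word or phrase, search for that word or phrase in the transcription, then click the icon to the left of a result.

Errors in a transcription (especially one produced by OCR) mean that the longer a phrase is, the less likely it is to match exactly. If this happens, try searching for a shorter phrase, or for a word that appears near the text you are looking for.

Exercises: Find a text with linked scans, run a search, and view the results in the scanned edition. Repeat this with a transcription produced by OCR.

You can also locate scanned texts through the "Library"; this gives exactly the same results as searching through the transcription links. Alternatively, you can use the links to read through the scanned edition page by page.

2 Finding texts similar to a passage of text … Continue reading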

Posted in Uncategorized | Comments Off