Accessible Text Mining with Text Tools and the Chinese Text Project

Create a free account on ctext.org and log in.
Make sure to validate your e-mail address by opening the link the system sent you (if not, the link above will display a warning/reminder in red to do so).
Enter the API key “aas2019” (without quotes) in the box labeled “API key”, and click “Save”.
[Optional] Install the “Text Tools” plugin into your ctext account.

Some parts of the “Practical introduction to ctext.org” and “Text Tools for ctext.org” will be demonstrated – please refer to the tutorials for step-by-step instructions.

In addition to the examples shown in the tutorials:

Try comparing the aggregate vocabulary of two texts (e.g. the 墨子 and 呂氏春秋) using the “Vectors” tab. Click “Toggle values” to display the heatmap, and try inspecting some of the comparisons.
Try the “Run PCA” link with these or other texts.
Try creating vectors that model only a specifically selected subset of vocabulary use. To do this, start by entering multiple search terms in the Regex tool (one per line) – one example would be grammatical particles such as 而, 也, 以, 乎, 之, 矣, 亦. From the “Summary” tab, click “Create vectors”, and then from the output choose “Run PCA”.
Using the regex tool with “Group rows by” set to “None” and “Extract groups” checked, try extracting data about biographies in the 宋史. You may want to start by using a small part of the text, e.g. 列傳第二十一 (ctp:ws281485). Example regex: (\w+)，字(\w+)，(\w+)人。
A few additional examples and instructions for using materials not written in classical Chinese are available on the SUTD workshop page.

Recent Posts