ctext Data Wiki tutorial

Getting started

The Data Wiki is a crowdsourced graph database that contains information about entities (people, places, offices, written works, etc.) that are mentioned in premodern Chinese texts. Each “page” of the Data Wiki lists information about one entity – for example, 王安石 Wang Anshi, 樞密使 Shumishi, or the 明史 History of the Ming.

Every entity in the Data Wiki has its own unique identifier – this always consists of a string beginning with “ctext:”, followed by some sequence of numbers. This is important, because it allows the system to distinguish between things that can be referred to by the same name, and to treat references to the same thing as equivalent even when different names are used for it. For example, the name “大順” could refer to an era of the Tang dynasty, an alternative name for the 天順 era of Vietnam’s Lý dynasty, or the short-lived dynasty of 李自成.
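If you handle identifiers in your own scripts, it can help to check their shape before using them. The following is a minimal Python sketch, assuming only what is stated above (the prefix “ctext:” followed by digits); the function name parse_entity_id is hypothetical and is not part of the Data Wiki.

```python
import re

# Assumed pattern, taken from the description above: "ctext:" followed by digits.
ENTITY_ID = re.compile(r"ctext:(\d+)")


def parse_entity_id(identifier: str) -> int:
    """Return the numeric part of an identifier such as 'ctext:82307'."""
    match = ENTITY_ID.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a Data Wiki entity identifier: {identifier!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(parse_entity_id("ctext:82307"))
```

Because the identifier, not the name, is what is compared, two entities that share a name such as “大順” remain distinct in any data you collect.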

Exercise:

The Data Wiki always displays the identifier for an entity at the top of the page, just below its title. You can search for entities by typing a name by which the entity is referred to in the “Data Search” box at the left-hand side of the screen.

  • What is the identifier for the office of 樞密使 Shumishi?
  • What is the identifier for the Ming dynasty person 楊俊 who died in 1457?

Properties and qualifiers

Apart from the identifier, all other data in the Data Wiki consists of “claims” about entities. Each claim connects:

  1. An entity (the subject of the claim)
  2. A property: an entity, which must first itself be defined (list of current properties)
  3. An object: either an entity, or a literal (usually a string, or a date)

Each row of an entity record (i.e. the table displayed when you look up an entity) represents one “claim” about that entity.

For example, the claim that the father of 諸葛亮 Zhuge Liang was 諸葛珪 Zhuge Gui is represented by a claim using the “father” property, connecting:

  1. Subject: 諸葛亮 Zhuge Liang
  2. Property: father
  3. Object: 諸葛珪 Zhuge Gui

This corresponds to a machine-readable triple connecting three entities: ctext:82307 ctext:539391 ctext:167600 . This is closely related (though not identical) to the RDF representation of the same claim.
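The triple above can be held in a small data structure. The following is a minimal Python sketch, not part of the Data Wiki itself: the names Claim and parse_triple are hypothetical, and it assumes the three identifiers appear in subject, property, object order, as in the numbered list above. It handles only triples whose object is an entity, not literals that may contain spaces.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    """One claim: subject, property and object, all given as identifiers."""
    subject: str
    prop: str
    obj: str


def parse_triple(line: str) -> Claim:
    """Parse a line such as 'ctext:82307 ctext:539391 ctext:167600 .'."""
    tokens = line.split()
    # Drop the terminating "." if it is present.
    if tokens and tokens[-1] == ".":
        tokens = tokens[:-1]
    if len(tokens) != 3:
        raise ValueError(f"expected three identifiers, got: {line!r}")
    return Claim(*tokens)


if __name__ == "__main__":
    claim = parse_triple("ctext:82307 ctext:539391 ctext:167600 .")
    print(claim.subject, claim.prop, claim.obj)
```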

Each claim may also have one or more qualifiers (list of current qualifiers), each of which is again paired with a corresponding object. This allows additional contextual information to be added to a specific claim. For example, we might have a claim that 王珪 held the office of 參知政事:

  1. Subject: 王珪
  2. Property: held-office
  3. Object: 參知政事

If we also know from what date he held the office, we can qualify the claim with the qualifier “from-date” with the date from which he held the office as its object:

  1. Qualifier: from-date
  2. Object: 熙寧三年十二月丁卯

You can see how this claim – together with this qualifier to it – is displayed in the entity record for 王珪.
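One way to picture a qualified claim is as a triple that carries its own list of qualifier/object pairs. The sketch below is only an illustration in Python: the class QualifiedClaim and its field names are hypothetical, and names are used in place of identifiers for readability.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QualifiedClaim:
    """A claim plus any qualifiers attached to that specific claim."""
    subject: str
    prop: str
    obj: str
    # Each qualifier name maps to one or more objects (entities or literals).
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)

    def add_qualifier(self, name: str, value: str) -> None:
        """Attach one qualifier/object pair to this claim."""
        self.qualifiers.setdefault(name, []).append(value)


if __name__ == "__main__":
    claim = QualifiedClaim("王珪", "held-office", "參知政事")
    claim.add_qualifier("from-date", "熙寧三年十二月丁卯")
    print(claim)
```

The qualifier belongs to this one claim rather than to 王珪 as a whole, so a second held-office claim for a different office would carry its own, separate dates.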

Exercise:

  • Write a query to list all people who have held an office with a title “…總督”

Exercise:

  • Write a query to list all written works that are indexed in the 清史稿
  • Modify the query to list instead:
    1. all works that are indexed in both 清史稿 and 四庫全書總目提要
    2. all works that are indexed in the 清史稿 but not in 四庫全書總目提要
    3. all works that are indexed in the 清史稿 and mentioned in the text of the 四庫全書總目提要 – does this give the same result as query 1?

Working with dates

The Data Wiki (and other components of ctext.org) understands Chinese dates and can convert them to and from Gregorian/Julian calendar dates.

Exercise:

  • Try searching the Data Wiki for a historical date, such as “大順元年” or “天成四年五月四日”. Try clicking through the two alternative choices given in the “Resolved date”, “Era/ruler”, and “Associated rulers” columns.
  • Experiment with other date references.

Although dates in this system are not strictly entities, they have analogous identifiers that begin “date:…”. These distinguish explicitly between different eras that happen to share the same name, and record the content of dates in different historically used formats in ways that make them machine processable. For example, in the exercise above, “天成四年五月四日” is ambiguous – even if we exclude the possibility of one of the three candidate eras on the grounds that it had no fourth year. The two possibilities have different identifiers: date:794498/4/5/+4 (if we mean the Later Tang/Min 天成), and date:587624/4/5/+4 (if we mean the Lý dynasty 天成).
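To see how the two identifiers differ, they can be split into their parts. The Python sketch below is an illustration only: DateId and parse_date_id are hypothetical names, and reading the first number as the era and the remaining parts as year, month and day is an assumption based on the example “天成四年五月四日”. The parts after the era are kept as strings because their notation (such as “+4”) is not interpreted here.

```python
from typing import NamedTuple, Tuple


class DateId(NamedTuple):
    """A date identifier split into an era number and its remaining parts."""
    era: int
    components: Tuple[str, ...]


def parse_date_id(identifier: str) -> DateId:
    """Split an identifier such as 'date:794498/4/5/+4' on ':' and '/'."""
    prefix, sep, rest = identifier.partition(":")
    if prefix != "date" or not sep or not rest:
        raise ValueError(f"not a date identifier: {identifier!r}")
    era, *components = rest.split("/")
    # Assumption: the first part is a numeric identifier for the era.
    return DateId(era=int(era), components=tuple(components))


if __name__ == "__main__":
    later_tang = parse_date_id("date:794498/4/5/+4")
    ly_dynasty = parse_date_id("date:587624/4/5/+4")
    # Same parts after the era, but different eras.
    print(later_tang.components == ly_dynasty.components)
    print(later_tang.era == ly_dynasty.era)
```

Under this reading, the ambiguity of the written date lies entirely in the first component, which is why the two candidates share every other part.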

It’s important to note that “the same day” can be referenced in different ways, and these have different identifiers. For example, we could specify these same dates equivalently (but with different semantics) using 干支 for the year, day, or both:

All four representations have different identifiers that directly mirror their semantics, but will be treated as equivalent dates when searching texts because they resolve to the same day in history.

Exercise:

By looking up the data for 安祿山:

  1. According to the record in the 新唐書, in what month did 安祿山 rebel against the Tang dynasty?
  2. Using this information, locate the passages in the 新唐書 and 舊唐書 that explicitly reference a) this month, and b) this year.
Creative Commons License