SPARQL querying for ctext.org data

The Chinese Text Project includes a Data Wiki, which creates and organizes machine-readable data about premodern entities such as people, written works, bureaucratic offices, places, etc.

While this data can be searched from within the Data Wiki user interface itself, a more powerful and flexible method is to query it using SPARQL, the W3C standard query language for RDF data. A public query interface is available at https://sparql.ctext.org/. It is also possible to download the same data and run queries locally on your own computer using any graph database that supports RDF and SPARQL.
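As a minimal sketch of running queries from a script rather than the web form, the Python below uses only the standard library. It assumes (this is not confirmed by the post) that https://sparql.ctext.org/ accepts SPARQL 1.1 Protocol GET requests with a `query` parameter, returns JSON when asked for `application/sparql-results+json`, and predefines the `ctext:`, `claim:` and other prefixes used in the examples here. `build_query_url` and `run_query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint quoted in the post. Treating it as a SPARQL 1.1 Protocol
# endpoint (GET with a "query" parameter) is an assumption.
ENDPOINT = "https://sparql.ctext.org/"


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Percent-encode a SPARQL query into a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query: str, endpoint: str = ENDPOINT) -> list:
    """Send a SELECT query and return each result row as {variable: value}."""
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        results = json.load(response)
    # SPARQL JSON results: each binding maps a variable name to a
    # {"type": ..., "value": ...} object; unbound variables are absent.
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in results["results"]["bindings"]
    ]
```

For example, `run_query("SELECT * WHERE { ?work claim:creator ctext:506404 . }")` would return one dictionary per matching work, provided the endpoint behaves as assumed above.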

Referring to entities in RDF

There are two ways of querying data in the ctext RDF graph. The first, simpler approach uses only properties, entities, and literal values.

Prefix    Equivalent Wikidata prefix    Semantics
ctext:    wd:                           Refers to an entity
claim:    wdt:                          Refers to a property of a subject

For example, the RDF serialization of the data for 司馬光 (ctext:506404) contains the claim that the father of 司馬光 was 司馬池 (ctext:439713). This is expressed in the RDF data as:

ctext:506404 claim:father ctext:439713 .

Note that when properties are used in this way (i.e. to connect a subject directly to an object), they take the “claim:” prefix.
We can use this approach to query statements in SPARQL by putting variables in place of the pieces of data we want returned. For instance, we could ask for a list of all the works known to ctext that were created by 司馬光 with the query:

SELECT * WHERE {
  ?work claim:creator ctext:506404 .
}

For convenience, the ctext RDF graph uses “rdfs:label” to assign exactly one label to each entity (i.e. the “default” name in the Data Wiki). So to show the names of the works in question, we can add an additional line to the SPARQL query:

SELECT * WHERE {
  ?work claim:creator ctext:506404 .
  ?work rdfs:label ?worktitle .
}
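Adding the “rdfs:label” line works because SPARQL joins the lines of a WHERE clause on their shared variables: in each result, ?work must stand for the same entity in both lines. The toy Python evaluator below illustrates that join. It is a teaching sketch, not how a real triple store works. Because the post gives no IDs for 司馬光's works, the hand-built graph uses only the father triple quoted earlier, plus labels that assume 司馬光 and 司馬池 are the default names of ctext:506404 and ctext:439713. Labels are plain strings here; real RDF literals can also carry language tags.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            # A variable bound by an earlier pattern must keep the same value.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining the patterns left to right."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


GRAPH: List[Triple] = [
    ("ctext:506404", "claim:father", "ctext:439713"),
    ("ctext:506404", "rdfs:label", "司馬光"),
    ("ctext:439713", "rdfs:label", "司馬池"),
]

# Same shape as: SELECT * WHERE { ctext:506404 claim:father ?father .
#                                 ?father rdfs:label ?name . }
QUERY: List[Triple] = [
    ("ctext:506404", "claim:father", "?father"),
    ("?father", "rdfs:label", "?name"),
]

print(evaluate(QUERY, GRAPH))
# [{'?father': 'ctext:439713', '?name': '司馬池'}]
```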

The same approach can be used for all other properties; for example, listing the titles that 司馬光 has held can be done in exactly the same way:

SELECT * WHERE {
  ctext:506404 claim:held-office ?office .
  ?office rdfs:label ?officetitle .
}
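Because only the property name changes from one such query to the next, the pattern can be generated for any property. The hypothetical helper below is a sketch that rebuilds the query above for a given entity and property. The input checks assume, based only on the examples in this post, that entity IDs are numeric and property names are lowercase and hyphenated. They reject anything else so that arbitrary text cannot be spliced into the query.

```python
import re

# Assumed shapes, based only on this post's examples: numeric entity IDs
# (ctext:506404), lowercase hyphenated properties (held-office), and
# simple variable names.
_ID = re.compile(r"\d+")
_PROP = re.compile(r"[a-z]+(?:-[a-z]+)*")
_VAR = re.compile(r"[A-Za-z]\w*")


def simple_property_query(entity_id: str, prop: str,
                          value_var: str = "value",
                          label_var: str = "label") -> str:
    """Build a direct-claim query listing each object of `prop` for one
    ctext entity, together with that object's rdfs:label."""
    if not _ID.fullmatch(entity_id):
        raise ValueError(f"not a numeric ctext ID: {entity_id!r}")
    if not _PROP.fullmatch(prop):
        raise ValueError(f"unexpected property name: {prop!r}")
    for var in (value_var, label_var):
        if not _VAR.fullmatch(var):
            raise ValueError(f"bad variable name: {var!r}")
    return (
        "SELECT * WHERE {\n"
        f"  ctext:{entity_id} claim:{prop} ?{value_var} .\n"
        f"  ?{value_var} rdfs:label ?{label_var} .\n"
        "}"
    )


# Rebuilds the held-office query above.
print(simple_property_query("506404", "held-office", "office", "officetitle"))
```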

A limitation of this simple approach to querying is that it gives no access to qualifiers (e.g. the “from-date” qualifier, which states the date from which a person held a particular title). The second, more flexible approach makes this information available by placing an intermediate node between the subject and object, and attaching any qualifiers to that node. Note that this representation of the data uses different prefixes for properties than the first approach.

Prefix    Equivalent Wikidata prefix    Semantics
cstat:    p:                            Refers to a statement of a subject
cprop:    ps:                           Refers to a property of a statement
cqual:    pq:                           Refers to a qualifier of a statement

In this representation, the statement that the father of 司馬光 (ctext:506404) was 司馬池 is expressed using two edges and a blank node:

ctext:506404 cstat:father [
  cprop:father ctext:439713
] .
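The statement form carries the same fact as the direct triple, so the direct form can be recovered from it. The hypothetical `collapse_statements` below sketches that correspondence; it is not ctext's own code. It assumes that each cstat:/cprop: pair matches a claim: triple, as in the father example, and it writes the anonymous blank node as `_:b0`. Any qualifiers on the statement node are simply dropped, which is why the simple approach cannot see them.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def collapse_statements(triples: List[Triple]) -> List[Triple]:
    """Rebuild direct `claim:` triples from statement-form triples,
    discarding any cqual: qualifiers attached to the statement nodes."""
    # Map each statement node to the (subject, property) that points at it.
    owners = {}
    for s, p, o in triples:
        if p.startswith("cstat:"):
            owners[o] = (s, p[len("cstat:"):])
    direct = []
    for s, p, o in triples:
        if p.startswith("cprop:") and s in owners:
            subject, prop = owners[s]
            # Pair cprop:X only with the cstat:X edge of the same property.
            if p[len("cprop:"):] == prop:
                direct.append((subject, f"claim:{prop}", o))
    return direct


STATEMENT_FORM = [
    ("ctext:506404", "cstat:father", "_:b0"),
    ("_:b0", "cprop:father", "ctext:439713"),
]
print(collapse_statements(STATEMENT_FORM))
# [('ctext:506404', 'claim:father', 'ctext:439713')]
```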

The advantage of this representation is that it is possible to record (and therefore query) qualifiers as well as properties. For instance, the claim that 司馬光 held the title of 資政殿學士 (ctext:179992) from the date 元豐七年十二月戊辰 (date:562206.7.12.5) is expressed as follows:

ctext:506404 cstat:held-office [
  cprop:held-office ctext:179992 ;
  cqual:from-date date:562206.7.12.5
] .
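Going the other way, a direct claim plus its qualifiers can be rewritten into this statement form by minting one fresh blank node per statement. The sketch below reproduces the held-office example. The rewriting rule is inferred from the examples above, not taken from ctext's export code, and `to_statement_triples` and the `_:stmtN` labels are hypothetical. Giving every statement its own node keeps qualifiers attached to the right claim, so two offices held from different dates cannot have their dates mixed up.

```python
from itertools import count
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Source of fresh blank-node labels: _:stmt0, _:stmt1, ...
_blank_ids = count()


def to_statement_triples(subject: str, prop: str, value: str,
                         qualifiers: Dict[str, str]) -> List[Triple]:
    """Turn `subject claim:prop value` (plus qualifiers) into statement-form
    triples: subject -cstat:prop-> node, node -cprop:prop-> value, and
    node -cqual:q-> v for each qualifier q."""
    node = f"_:stmt{next(_blank_ids)}"
    triples = [
        (subject, f"cstat:{prop}", node),
        (node, f"cprop:{prop}", value),
    ]
    for qualifier, qualifier_value in qualifiers.items():
        triples.append((node, f"cqual:{qualifier}", qualifier_value))
    return triples


for triple in to_statement_triples("ctext:506404", "held-office", "ctext:179992",
                                   {"from-date": "date:562206.7.12.5"}):
    print(*triple)
# ctext:506404 cstat:held-office _:stmt0
# _:stmt0 cprop:held-office ctext:179992
# _:stmt0 cqual:from-date date:562206.7.12.5
```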

This means that we can query this information as well. To start with, we can reproduce the simple query for listing titles held, this time using the statement-based representation:

SELECT * WHERE {
  ctext:506404 cstat:held-office ?statement .
  ?statement cprop:held-office ?office .
  ?office rdfs:label ?officetitle .
}

Now we can additionally ask for the from-date qualifier value:

SELECT * WHERE {
  ctext:506404 cstat:held-office ?statement .
  ?statement cprop:held-office ?office .
  ?office rdfs:label ?officetitle .
  ?statement cqual:from-date ?fromdate .
  ?fromdate rdfs:label ?fromdatedesc .
}
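Every line of a WHERE clause must match. As a result, the query above silently omits any held-office statement that has no from-date qualifier, or whose date node has no label. In standard SPARQL, wrapping the qualifier lines in OPTIONAL keeps such statements and leaves the qualifier variables unbound. The hypothetical builder below is a sketch that has not been checked against the ctext endpoint. It generates the statement-form query that way for any property and list of qualifiers, using ?value/?valuelabel in place of the post's ?office/?officetitle. It treats its arguments as trusted input.

```python
from typing import List


def qualified_query(entity_id: str, prop: str, qualifiers: List[str]) -> str:
    """Build a statement-form query for one ctext entity and property.
    Each qualifier sits in its own OPTIONAL block (with a nested OPTIONAL
    for its label), so statements lacking it are still returned."""
    lines = [
        "SELECT * WHERE {",
        f"  ctext:{entity_id} cstat:{prop} ?statement .",
        f"  ?statement cprop:{prop} ?value .",
        "  ?value rdfs:label ?valuelabel .",
    ]
    for qualifier in qualifiers:
        # Variable names drop hyphens: "from-date" -> ?fromdate, ?fromdatedesc.
        var = qualifier.replace("-", "")
        lines.append(
            f"  OPTIONAL {{ ?statement cqual:{qualifier} ?{var} . "
            f"OPTIONAL {{ ?{var} rdfs:label ?{var}desc . }} }}"
        )
    lines.append("}")
    return "\n".join(lines)


print(qualified_query("506404", "held-office", ["from-date"]))
```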

Dates in the ctext RDF graph are themselves nodes that carry additional information (please refer to the RDF itself for detailed examples). We can also have our query output the Julian/Gregorian year, month, and day of the dates in question by requesting the “time:hasBeginning” edge (and, if we want to be precise, also the “time:hasDuration” edge):

SELECT * WHERE {
  ctext:506404 cstat:held-office ?statement .
  ?statement cprop:held-office ?office .
  ?office rdfs:label ?officetitle .
  ?statement cqual:from-date ?fromdate .
  ?fromdate rdfs:label ?fromdatedesc .
  ?fromdate time:hasBeginning ?fromdateymd .
}
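To take results like these into a spreadsheet, SPARQL results in the W3C JSON format (application/sparql-results+json) can be flattened to CSV. The sketch below assumes that the results were fetched as JSON. `results_to_csv` is a hypothetical helper, and the sample response is hand-written for illustration using only labels quoted in this post. ?fromdateymd is left unbound in the sample rather than inventing a date, which also shows how unbound variables end up as empty cells.

```python
import csv
import io
import json


def results_to_csv(results_json: str) -> str:
    """Flatten SPARQL JSON results into CSV text: one column per variable
    listed in head.vars, with unbound variables written as empty cells."""
    results = json.loads(results_json)
    columns = results["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(columns)
    for row in results["results"]["bindings"]:
        writer.writerow([row.get(col, {}).get("value", "") for col in columns])
    return out.getvalue()


# Hand-written sample in the SPARQL JSON results shape (not real output).
SAMPLE = json.dumps({
    "head": {"vars": ["officetitle", "fromdatedesc", "fromdateymd"]},
    "results": {"bindings": [{
        "officetitle": {"type": "literal", "value": "資政殿學士"},
        "fromdatedesc": {"type": "literal", "value": "元豐七年十二月戊辰"},
    }]},
}, ensure_ascii=False)

print(results_to_csv(SAMPLE), end="")
```

When saving the CSV to a file for Excel, writing it with `encoding="utf-8-sig"` helps Excel recognize the Chinese text as UTF-8.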

This entry was posted in Chinese, Digital Humanities.
