Paper to be presented at Biographical Data in a Digital World, 6 November 2017, Linz.
In modeling complex humanities data, projects working within the same domain often have overlapping but distinct priorities and goals. As a result, separate systems frequently contain overlapping data: some of the objects modeled are common to more than one system, though each system may represent them very differently.
Within a particular domain, it can be desirable for projects to standardize their data structures and formats so that data can be linked and exchanged more efficiently. For complex datasets, however, such standardization can be an ambitious task in itself. An alternative approach is to identify a core set of data that would be most beneficial to query in aggregate across systems, and to provide mechanisms for sharing and maintaining this data as a means of linking between projects.
For biographical data, the clearest example is information about the same individual appearing in multiple systems. Focusing on this case, this talk presents one approach to creating, and sustaining with minimal maintenance, machine-actionable links between datasets that are maintained and developed by different groups, while also promoting more ambitious data sharing.
This model consists of three components: 1) schema maintainers, who define and publish a format for sharing data; 2) data providers, who make data available according to a published schema; and 3) client systems, which aggregate data from one or more data providers adhering to a common schema. The model can be used to implement a sustainable union catalog, in which the catalog provides a means of directly locating information in any of the connected systems but is not itself responsible for maintaining the data. It is designed to be general-purpose and to extend naturally to similar use cases.
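The three roles can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration rather than part of the model as presented: the schema fields (`person_id`, `name`, `url`), the `Provider` class, and the `build_catalog` function are all hypothetical, and a shared person identifier is assumed to be available across providers. The sketch only shows how a client system might aggregate conforming records into a union catalog that holds links back to the providers instead of the data itself.

```python
from dataclasses import dataclass

# Hypothetical shared schema, as a schema maintainer might publish it:
# the fields every provider record is assumed to carry.
REQUIRED_FIELDS = ("person_id", "name", "url")


@dataclass
class Provider:
    """A data provider publishing records according to the shared schema."""
    name: str
    records: list  # list of dicts keyed by the schema field names


def validate(record: dict) -> bool:
    """Return True if the record has a non-empty value for every schema field."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def build_catalog(providers) -> dict:
    """Client system: aggregate conforming records into a union catalog.

    The catalog is keyed by person_id and stores only links back to each
    provider, which remains responsible for maintaining its own data.
    """
    catalog = {}
    for provider in providers:
        for record in provider.records:
            if not validate(record):
                continue  # skip records that do not conform to the schema
            catalog.setdefault(record["person_id"], []).append(
                {
                    "provider": provider.name,
                    "name": record["name"],
                    "url": record["url"],
                }
            )
    return catalog
```

Given two providers that each publish a record with the same `person_id`, `build_catalog` would return a single catalog entry listing both provider URLs, which is the kind of machine-actionable link between datasets described above. Records missing a required field are skipped here; a real client might instead log or report them.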