Poster presented at Connected Past 2018.
The phenomenon of text reuse – syntactically and semantically similar fragments of text repeated apparently independently in multiple pieces of writing, often in works purporting to be composed by entirely different authors – is extremely widespread in early Chinese literature. Such reuse is typically unattributed, and its existence is often revealed only through painstaking comparison with other potentially related writings. Computational methods have for the first time made it feasible to identify such reuse comprehensively throughout large corpora of material, and have thus made practical the study of patterns of reuse that emerge at much larger scales than could previously be considered.
This work uses network analysis to investigate patterns of text reuse in the early Chinese corpus and the relationship between these patterns and difficult questions of authorship attribution within these texts. The underlying data record individual instances of text reuse, and were created through an exhaustive automated study of the entire transmitted corpus of Chinese, from the earliest transmitted works through to those predating the end of the Han dynasty (220 AD). Using these data, this study demonstrates the utility of network visualization and analysis in identifying and exploring patterns of text reuse which shed light on the authorship of these early materials.
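As a rough illustration of how instance-level reuse data might be turned into a network, the following minimal sketch treats each text as a node and sums the length of shared passages between each pair of texts into a weighted, undirected edge. The text names, the character counts, and the helper functions `build_reuse_network` and `weighted_degree` are all hypothetical placeholders introduced for this example; they are not the data or the method used in this study.

```python
from collections import defaultdict

# Hypothetical reuse instances: (text_a, text_b, shared_characters).
# Placeholder values for illustration only, not results from the study.
reuse_instances = [
    ("Text A", "Text B", 120),
    ("Text A", "Text B", 45),
    ("Text A", "Text C", 30),
    ("Text B", "Text C", 200),
    ("Text C", "Text D", 15),
]


def build_reuse_network(instances):
    """Aggregate reuse instances into weighted, undirected edges.

    Returns a dict mapping frozenset({text_a, text_b}) to the total
    number of shared characters between the two texts.
    """
    edges = defaultdict(int)
    for text_a, text_b, shared in instances:
        if text_a == text_b:
            continue  # ignore reuse within a single text
        edges[frozenset((text_a, text_b))] += shared
    return dict(edges)


def weighted_degree(edges):
    """Sum the edge weights attached to each text (node strength)."""
    strength = defaultdict(int)
    for pair, weight in edges.items():
        for text in pair:
            strength[text] += weight
    return dict(strength)


edges = build_reuse_network(reuse_instances)
strength = weighted_degree(edges)

# List texts from most to least heavily connected by reuse.
for text, total in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(text, total)
```

In a network of this kind, texts that share unusually large amounts of material with one another appear as strongly connected nodes, which is one simple way such patterns could be surfaced for closer reading. Edge weights could equally be instance counts or normalized by text length; the choice here is an assumption made for brevity.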