Text Transformation API

Draft – This is a preliminary draft specification. Please note that some implementation details will change before publication. Last updated: 22 March 2019.

Overview

Transformations of textual data are important steps in many natural language processing and text analysis workflows. Examples include tokenization, lemmatization, and part-of-speech tagging, as well as many other (often language-specific) procedures. In this specification, a text transformation is any operation which takes as input a sequence of Unicode characters and produces as output a sequence of Unicode characters. The Text Transformation API (TTA) defines a simple specification for how to negotiate, request, and deliver text transformations over HTTP.

A TTA server is a system which both: 1) publishes a TTA service manifest, and 2) provides or references at least one TTA transformation service endpoint.

Service manifest

A service manifest is a valid JSON file containing a list of transformation services. Each service is described using the following key-value pairs:

Key         Value
endpoint    The URL of the transformation service endpoint described by this entry.
languages   A list of ISO 639-1 language codes for which the endpoint is relevant or recommended.
title       A human-readable description of the service the endpoint provides.
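
As an illustration of the manifest format described above, the following Python sketch parses a manifest and checks that every entry carries the three keys. The draft does not state whether the list of services sits at the top level of the JSON document or inside a wrapper object; a top-level JSON array is assumed here. The example.org URL, the sample entry, and the name parse_manifest are hypothetical.

```python
import json

REQUIRED_KEYS = ("endpoint", "languages", "title")

# Hypothetical manifest. A top-level JSON array is an assumption; the draft
# only says the file contains "a list of transformation services".
EXAMPLE_MANIFEST = """
[
  {
    "endpoint": "https://example.org/tta/tokenize",
    "languages": ["en", "de"],
    "title": "Whitespace tokenizer"
  }
]
"""


def parse_manifest(text):
    """Parse manifest JSON and check each service entry for the required keys."""
    services = json.loads(text)
    if not isinstance(services, list):
        raise ValueError("manifest must be a JSON list of services")
    for index, service in enumerate(services):
        if not isinstance(service, dict):
            raise ValueError(f"service {index} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in service]
        if missing:
            raise ValueError(f"service {index} is missing keys: {missing}")
    return services


if __name__ == "__main__":
    for service in parse_manifest(EXAMPLE_MANIFEST):
        print(service["title"], service["languages"], service["endpoint"])
```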

Transformation service endpoint

A transformation endpoint is an HTTP or HTTPS URL which accepts a string of text sent to it via the HTTP POST method using the “application/x-www-form-urlencoded” content type. The content of the string must be supplied in the “data” parameter of the request in UTF-8 encoding.

The response to any valid request must be a JSON file containing exactly one of the following key-value pairs:

Key      Value
output   The contents of the “data” parameter transformed according to the service provided by the requested endpoint.
error    A string explaining why the request failed.
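
The request and response rules above could be exercised with a sketch like the following, which uses only the Python standard library. It reads "exactly one of" as meaning that a reply carries either "output" or "error" but never both. The function names are hypothetical, and raising RuntimeError for a reported error is a design choice of this sketch rather than something the draft prescribes.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, text):
    """Build a POST request carrying the text in the form-encoded 'data' parameter."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default.
    body = urllib.parse.urlencode({"data": text}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_response(raw):
    """Return the transformed text, or raise if the reply reports an error."""
    reply = json.loads(raw.decode("utf-8"))
    if not isinstance(reply, dict):
        raise ValueError("response must be a JSON object")
    has_output = "output" in reply
    has_error = "error" in reply
    if has_output == has_error:
        raise ValueError("response must contain exactly one of 'output' or 'error'")
    if has_error:
        raise RuntimeError(reply["error"])
    return reply["output"]


def transform(endpoint, text, timeout=30):
    """Send the text to an endpoint and return the transformed result."""
    request = build_request(endpoint, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read())
```

Keeping request construction and response parsing separate from the network call makes both halves easy to check without a live server.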

Transformation client

A transformation client is any software which 1) requests TTA service manifests, specified by their URL; 2) provides a user with a means of viewing the “title” descriptions of the endpoints from any conformant TTA manifest; and 3) provides a user with a means of transforming texts using any conformant endpoint.
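
The three client responsibilities above might be sketched as a small Python class. Everything named here (Client, http_fetch, the fetch parameter) is hypothetical, the manifest is again assumed to be a top-level JSON array, and looking services up by title is merely one convenient choice for a command-line or scripted client.

```python
import json
import urllib.parse
import urllib.request


def http_fetch(url, form=None, timeout=30):
    """GET the URL, or POST the form fields if given, and return the parsed JSON."""
    body = None
    if form is not None:
        # Passing a body makes urllib send a POST with the
        # application/x-www-form-urlencoded content type.
        body = urllib.parse.urlencode(form).encode("ascii")
    with urllib.request.urlopen(url, data=body, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class Client:
    """Minimal TTA client sketch: load a manifest, list titles, transform text."""

    def __init__(self, manifest_url, fetch=http_fetch):
        self.fetch = fetch
        # 1) Request the manifest by URL (assumed to be a top-level JSON list).
        self.services = fetch(manifest_url)

    def titles(self, language=None):
        """2) Return service titles, optionally filtered by ISO 639-1 code."""
        return [
            service["title"]
            for service in self.services
            if language is None or language in service["languages"]
        ]

    def transform(self, title, text):
        """3) Transform text using the service with the given title."""
        for service in self.services:
            if service["title"] == title:
                reply = self.fetch(service["endpoint"], form={"data": text})
                if "error" in reply:
                    raise RuntimeError(reply["error"])
                return reply["output"]
        raise KeyError(f"no service titled {title!r}")
```

The fetch parameter exists so that the network layer can be replaced, for example by a stub in tests or by a caching wrapper.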

Examples

A non-normative example of a TTA service manifest (containing references to example TTA service endpoints) is: https://txt.ctext.org/services.pl

A non-normative example of a TTA client is accessible here.
