Languages and scripts

TEI/EpiDoc provides mechanisms for encoding languages and scripts (writing systems) as they relate to the contents of an EpiDoc file and to the text(s) described and transcribed therein. In doing so, we make use of Internet standards for the identification of these languages and scripts. This portion of the Guidelines addresses all relevant aspects.

Relevant element documentation (TEI):

Indicating languages and scripts used in an EpiDoc file

TEI and EpiDoc follow the best current practice outlined in the Network Working Group's RFC 5646: Tags for Identifying Languages, which establishes Internet-wide norms for identifying languages. The RFC and supporting documents define a syntax for creating short strings of characters (‘language tags’) that function as unique identifiers for any desired combination of language and script. These tags are composed of ‘subtags’ for the language itself, the writing system (script), and regional and dialectal variation. The RFC also establishes a process for registration and maintenance of these subtags by the Internet Assigned Numbers Authority (IANA).

A valid EpiDoc file must make use of subtags recorded in the IANA Language Subtag Registry. Many EpiDoc creators will already be familiar with some of these codes from other digital projects, for example:

  • Grek = Greek script
  • Latn = Latin script
  • en = English language (assumed to be in its standard script: Latn)
  • fr = French language (assumed to be in its standard script: Latn)
  • el = Modern Greek language (1453-; assumed to be in its standard script: Grek)
  • grc = Ancient Greek language (to 1453; assumed to be in its standard script: Grek)
  • grc-Latn = Ancient Greek language (to 1453), rendered in Latin script

When the IANA registry does not provide appropriate codes, an EpiDoc project may devise "private use subtags", so long as they are defined within the EpiDoc file as outlined in the following paragraph and conform syntactically to the specifications laid out in RFC 5646, sections 2.1: Syntax and 4.6: Considerations for Private Use Subtags. For example, the Campā Inscriptions team determined that the two registered Cham language subtags (cja = Western Cham and cjm = Eastern Cham) and the associated script subtag (Cham) describe something substantively different from the ancient Cham language and script represented in the inscriptions. The team therefore devised the private use tag "x-oldcam-latn-ci" and gave it the project-specific meaning "Old Cam language in Old Cam script transliterated in Latin characters." Whenever possible, EpiDoc projects and practitioners should undertake to register new subtags with the IANA for the benefit of others. The registration procedure is set out in RFC 5646, Section 3.5.

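The languages and scripts used in an EpiDoc file, including any private use tags, are declared in the langUsage element of the TEI header. Each language element carries the language tag in its ident attribute and a prose description as its content, for example: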

<langUsage>
 <language ident="ar">Arabic</language>
 <language ident="cop">Coptic</language>
 <language ident="egy-Egyd">Egyptian in Demotic script</language>
 <language ident="egy-Egyh">Egyptian in Hieratic script</language>
 <language ident="egy-Egyp">Egyptian Hieroglyphic</language>
 <language ident="etr">Etruscan</language>
 <language ident="el">Modern Greek</language>
 <language ident="grc">Ancient Greek</language>
 <language ident="grc-Latn">Ancient Greek written in Latin script</language>
 <language ident="he">Hebrew</language>
 <language ident="la">Latin</language>
 <language ident="la-Grek">Latin written in Greek script</language>
</langUsage>
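
A private use tag is declared in the same way as a registered one. As a minimal sketch, assuming the tag devised by the Campā Inscriptions team discussed above, the declaration might look like the following. The description text repeats the project-specific meaning quoted earlier; the surrounding langUsage wrapper is illustrative and not taken from the project's own files.

```xml
<langUsage>
 <!-- Private use tag: absent from the IANA registry, so its meaning is defined here -->
 <language ident="x-oldcam-latn-ci">Old Cam language in Old Cam script transliterated in Latin characters</language>
</langUsage>
```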

Character Encodings and Fonts

Indicating the modern language and script used throughout the EpiDoc file
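
As a minimal sketch, assuming a file whose header, commentary and other editorial prose are written in English, the default language might be declared with xml:lang on the root element. Descendant elements inherit this value unless they carry their own xml:lang. The choice of "en" is illustrative only.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
 <!-- teiHeader and text go here; both inherit xml:lang="en" unless overridden -->
</TEI>
```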

Signaling the languages and scripts used in the texts and translations being presented

<textLang mainLang="grc" otherLangs="grc-Latn la"> Inscription in ancient Greek with some words transcribed in Latin characters, and
later annotation in Latin.
</textLang>

Marking transitions in language and/or script in the text
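
A minimal sketch, assuming an edition that is mainly in Ancient Greek and contains one short Latin phrase: the edition division carries the main language in xml:lang, and a foreign element with its own xml:lang marks the switch. The transcribed words are invented placeholders and do not come from any real inscription.

```xml
<div type="edition" xml:lang="grc">
 <ab>
  <!-- Line 1 inherits xml:lang="grc" from the enclosing div -->
  <lb n="1"/>θεοῖς καταχθονίοις
  <!-- Line 2 switches to Latin for the span of the foreign element -->
  <lb n="2"/><foreign xml:lang="la">dis manibus</foreign>
 </ab>
</div>
```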

Responsibility for this section

  1. Gabriel Bodard, author
  2. Tom Elliott, author

EpiDoc version: 8.19

Date: 2014-07-31