Specifying available languages and scripts


Release or development version: r-5

Implementation of the guidance in this section is required for Leiden conformance.

The TEI provides a <langUsage> element in the <teiHeader> whereby the languages and scripts used in a given document or project must be declared before they may be used in the document itself. The <langUsage> contains a list of language-and-script codes, drawn from international standards (see: Language tags in HTML and XML; the standards document itself is RFC 4646: Tags for Identifying Languages (September 2006)). Each such code may be glossed with an appropriate description (at the discretion of the project or editor). Examples follow:
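As an illustrative sketch only, a declaration of this kind can be embedded in a short Python script that reads it back with the standard library. The codes and glosses shown (grc, la, grc-Latn) and the helper name declared_languages are assumptions chosen for illustration, not a prescribed list or part of any EpiDoc tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI P4-style header fragment; codes and glosses are
# illustrative assumptions, not a list mandated by the guidelines.
HEADER = """
<teiHeader>
  <profileDesc>
    <langUsage>
      <language id="grc">Ancient Greek</language>
      <language id="la">Latin</language>
      <language id="grc-Latn">Ancient Greek in Latin script</language>
    </langUsage>
  </profileDesc>
</teiHeader>
"""


def declared_languages(header_xml):
    """Return {id: gloss} for every <language> child of <langUsage>."""
    root = ET.fromstring(header_xml)
    return {
        lang.get("id"): (lang.text or "").strip()
        for lang in root.findall(".//langUsage/language")
    }


LANGUAGES = declared_languages(HEADER)
for code, gloss in LANGUAGES.items():
    print(f"{code}: {gloss}")
```

Each id declared this way is a value that a lang attribute elsewhere in the same file may then point to.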

The id of each <language> should contain the 2-letter ISO language code, where one is available, or the 3-letter code where one is not (as in the case of ancient Greek). The four-letter, case-sensitive script code should not be added as a subtag unless it is not the usual (default) script for the language in question. These languages are then available to be referred to by the lang or xml:lang attribute on any element in the file. See <tei:foreign> and languages.xml.
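The rule above can be sketched as a small Python function. The function name language_id and its parameters are hypothetical; the caller is assumed to supply the default script for the language, since this sketch does not consult any registry.

```python
def language_id(two_letter, three_letter, script=None, default_script=None):
    """Build an id following the rule above (hypothetical helper).

    Prefer the 2-letter code, fall back to the 3-letter code, and append
    the script subtag only when it differs from the default script.
    """
    base = two_letter or three_letter
    if not base:
        raise ValueError("a 2-letter or 3-letter language code is required")
    base = base.lower()
    # Assumption: when no default script is given, any explicit script is kept.
    if script and (
        default_script is None or script.lower() != default_script.lower()
    ):
        # Script subtags are conventionally written with an initial capital.
        return f"{base}-{script.title()}"
    return base


LATIN = language_id("la", "lat", script="Latn", default_script="Latn")
GREEK = language_id(None, "grc", script="Grek", default_script="Grek")
GREEK_TRANSLITERATED = language_id(None, "grc", script="Latn", default_script="Grek")
print(LATIN, GREEK, GREEK_TRANSLITERATED)
```

Under these assumptions, Latin and ancient Greek in their usual scripts receive the bare language code, while transliterated Greek receives the additional script subtag.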

Note that the established practice of looking up language codes in various ISO-sponsored lists is now deprecated. These lists are the source for only part of the full set of codes; the Internet Assigned Numbers Authority (IANA) is the final aggregator and arbiter for language codes. IANA's lists of language/script codes and related information are available at:

Note that, by definition, language subtags are case-neutral, and all processing environments are required to treat them in a case-insensitive manner. There is, however, a recommended capitalization scheme, intended to help human readers, which is reflected in the IANA registry. The TEI P4 DTD, on the other hand, defines the standard lang attribute as an IDREF; this means that, for the file to be valid, there must be one, and only one, exact copy of each language code value in an id attribute somewhere in the file (i.e., on a <language> element in the header, as described above). EpiDoc users must therefore ensure that they capitalize language tags consistently across their project, and preferably across the EpiDoc world.
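A project could check for this kind of inconsistency before validation. The following is a minimal sketch, assuming a TEI P4-style file in which lang and id are plain attributes; the sample document and the function name check_lang_references are hypothetical, and the check is no substitute for validating against the DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample file: the header declares "grc", the text uses "GRC".
DOC = """
<TEI.2>
  <teiHeader>
    <profileDesc>
      <langUsage>
        <language id="grc">Ancient Greek</language>
        <language id="la">Latin</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p lang="la">Dis Manibus <foreign lang="GRC">θεοῖς</foreign></p>
    </body>
  </text>
</TEI.2>
"""


def check_lang_references(doc_xml):
    """List lang values lacking an exact, unique match among <language> ids."""
    root = ET.fromstring(doc_xml)
    ids = Counter(
        el.get("id") for el in root.iter("language") if el.get("id")
    )
    by_lower = {code.lower(): code for code in ids}
    problems = [f"duplicate id: {code}" for code, n in ids.items() if n > 1]
    for el in root.iter():
        value = el.get("lang")
        if value is None or value in ids:
            continue  # no lang attribute, or an exact match exists
        match = by_lower.get(value.lower())
        if match:
            problems.append(
                f'<{el.tag}> lang="{value}" differs only in case '
                f'from declared id "{match}"'
            )
        else:
            problems.append(
                f'<{el.tag}> lang="{value}" has no matching <language> id'
            )
    return problems


PROBLEMS = check_lang_references(DOC)
for problem in PROBLEMS:
    print(problem)
```

In this sample the reference "GRC" is reported because, although it is equivalent to "grc" as a language tag, it is not an exact copy of the declared id and so would fail IDREF validation.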

One possible safeguard might be to produce a modification of the EpiDoc DTD to constrain the legal values for the lang attribute so that they match the codes used in the id attributes of the <language> tags in the header. So, for example, the default EpiDoc DTD defines lang as follows:

To match the example language code declarations above, this definition could be altered as follows:
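As a hedged illustration of the kind of constraint described above, the enumerated attribute definition could be generated from the ids declared in the header rather than written by hand. The codes used here and the function name lang_attribute_declaration are assumptions, as is the choice of #IMPLIED as the default; the exact wording of the EpiDoc DTD is not reproduced.

```python
def lang_attribute_declaration(codes):
    """Build an enumerated DTD attribute definition for lang (a sketch).

    Duplicates are dropped while the original order is preserved, so the
    output lists each declared id exactly once, with its exact capitalization.
    """
    unique = list(dict.fromkeys(codes))
    if not unique:
        raise ValueError("at least one language code is required")
    return f"lang ({' | '.join(unique)}) #IMPLIED"


# Hypothetical set of ids taken from the <language> elements in the header.
DECLARATION = lang_attribute_declaration(["grc", "la", "grc-Latn", "la"])
print(DECLARATION)
```

With an enumerated type of this sort, a validating parser rejects any lang value that is not one of the listed tokens, including variants that differ only in capitalization.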

Responsibility for this section

CVS Information

Revision number: $Revision: 1.6 $

Revision name (if any): $Name: r-5 $

Revision date: $Date: 2006/12/06 14:41:14 $

Revision committed by: $Author: gabrielbodard $