Introduction to EpiDoc Document Structure


Example 1

<TEI.2>
   <teiHeader> ... </teiHeader>
   <text> ... </text>
</TEI.2>

Basic document structure

The basic document structure recommended by EpiDoc follows exactly that laid out for TEI.2. There are two main sections of the document, both of which are required: the <teiHeader> and the <text>. The <teiHeader> is essentially the title page and descriptive catalog record for the electronic version of the text, providing information about the source of the text, the way it is digitally represented and any revisions it has undergone. The <text> element, on the other hand, embodies the structure and content of the text itself, including introduction, textual edition, translation, critical apparatus, commentary, notes and the like. The basic, top-level structure of any such TEI document would look like that sketched in Example 1.

This simple representation of the top-level tagging hides the structural complexity and semantic power that can be achieved with TEI. Within each of the two major elements, TEI permits (and sometimes requires) the use of a wide range of other elements. Some of these elements have very precise rules governing their use. Others permit adaptation to meet the needs of particular projects or types of textual materials.
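The two-part structure described above can be checked mechanically. The following is a minimal sketch in Python, using only the standard library; the function names are our own, and the skeleton simply mirrors Example 1 with placeholder content.

```python
import xml.etree.ElementTree as ET

# Skeleton mirroring Example 1; the element content is placeholder text.
SKELETON = """<TEI.2>
  <teiHeader>...</teiHeader>
  <text>...</text>
</TEI.2>"""

def top_level_sections(xml_source):
    """Return the tag names of the root's direct children, in document order."""
    root = ET.fromstring(xml_source)
    return [child.tag for child in root]

def has_required_sections(xml_source):
    """True if the root is TEI.2 and its children are exactly teiHeader, then text."""
    root = ET.fromstring(xml_source)
    return root.tag == "TEI.2" and [c.tag for c in root] == ["teiHeader", "text"]

print(top_level_sections(SKELETON))
print(has_required_sections(SKELETON))
```

A check of this kind is no substitute for validation against the TEI DTD; it only illustrates how little the top level itself constrains.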

Example 2: Elements within the <teiHeader>

<TEI.2>
   <teiHeader>
    <fileDesc> ... </fileDesc>
    <encodingDesc> ... </encodingDesc>
    <profileDesc> ... </profileDesc>
    <revisionDesc> ... </revisionDesc>
   </teiHeader>
   <text> ... </text>
</TEI.2>

The <teiHeader>

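For example, the <teiHeader> (see Example 2) has a very explicit content model, requiring the presence of certain subordinate elements in a particular order. It contains the following elements, in the order shown in Example 2: <fileDesc>, <encodingDesc>, <profileDesc> and <revisionDesc>.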

The use of each of these elements, each with its own specific content model, in encoding epigraphic editions is discussed below. For more details, see TEI, Section 5: The TEI Header.
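The ordering constraint on the header can be illustrated with a short Python sketch. The element order is taken from Example 2. Treating only <fileDesc> as required is our assumption about the TEI content model and should be confirmed against the TEI DTD. The function name and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Order taken from Example 2. Treating only fileDesc as required is an
# assumption about the TEI header content model; confirm against the DTD.
HEADER_ORDER = ["fileDesc", "encodingDesc", "profileDesc", "revisionDesc"]
REQUIRED = {"fileDesc"}

def header_problems(header_xml):
    """Return a list of human-readable problems; empty means none were found."""
    header = ET.fromstring(header_xml)
    tags = [child.tag for child in header]
    problems = []
    for name in sorted(REQUIRED - set(tags)):
        problems.append("missing required element: " + name)
    for name in tags:
        if name not in HEADER_ORDER:
            problems.append("unexpected element: " + name)
    # Compare the order found with the same elements sorted into header order.
    known = [t for t in tags if t in HEADER_ORDER]
    if known != sorted(known, key=HEADER_ORDER.index):
        problems.append("elements out of order: " + ", ".join(known))
    return problems

GOOD = ("<teiHeader><fileDesc/><encodingDesc/>"
        "<profileDesc/><revisionDesc/></teiHeader>")
print(header_problems(GOOD))
```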

Example 3: Outline of EpiDoc usage

To be completed

The <text> element and its subordinate <div>s

When compared with the <teiHeader>, the top-level <text> element has a more flexible content model, as do many of its subordinate elements. This flexibility is necessary to accommodate the wide range of textual materials that might require digitization. Example 3 provides an abbreviated EpiDoc representation of a boundary marker for an imperial estate in North Africa.

Within the <text> element of Example 3, one finds the mandatory TEI <body> element. TEI provides that this element can be subdivided further into any number of logical divisions using the <div> element, which it specifies for the hierarchical grouping of divisions and subdivisions within a text. Depending on the genre of the text, these 'groups' can have different common names, such as 'chapter' in a novel, 'scene' in a play, or 'book' in an ancient literary work. Rather than elaborate differently named elements for each and every possible such grouping, TEI employs the generic <div> element for all such divisions.

The TEI specification for the <div> element provides a standard attribute named type. This attribute is used to categorize <div> elements independently of their hierarchical relationships, using a set of attribute values appropriate to the genre and audience of the text. In this way, the nesting of <div> elements can indicate the hierarchical relationship between a chapter and a subsection, while the type attribute simultaneously indicates the semantic difference between a chapter and an appendix.

Note: Example 3 does not 'nest' its <div> elements, so there are no significant hierarchical relationships between the divisions of this particular text. There are, however, significant semantic differences, as indicated by the use of the type attribute.
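The difference between hierarchy (nesting) and category (the type attribute) can be made concrete with a small Python sketch. The fragment below is hypothetical and is not Example 3: its type values follow the chapter, subsection and appendix illustration above, and the function name is our own.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the type values and the nesting are illustrative only.
BODY = """<body>
  <div type="chapter">
    <div type="subsection">...</div>
  </div>
  <div type="appendix">...</div>
</body>"""

def describe_divs(element, depth=0):
    """Yield (depth, type) for each div child of element and, recursively,
    for the divs nested inside those divs, in document order."""
    for child in element:
        if child.tag == "div":
            yield depth, child.get("type")
            yield from describe_divs(child, depth + 1)

# Indentation shows the hierarchy; the printed word shows the category.
for depth, div_type in describe_divs(ET.fromstring(BODY)):
    print("  " * depth + str(div_type))
```

The depth comes entirely from nesting and the label entirely from type, so either can change without affecting the other.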

In Example 3, the type attribute of the <div> element is used to distinguish between six different types of divisions:

a description of the text and/or monument;

the edition of the epigraphic text itself;

a translation of that text;

a scholarly commentary related to the text and its unique problems and items of interest;

a history of the discovery, documentation, and interpretation of the text leading to its present publication; and

a bibliography relevant to this text.

These attribute values (and two more: metadata and diplomatic) have been introduced as part of the EpiDoc standard (see Section 5.5: EpiDoc divisions (tei:div) of the text element (tei:text) for the formal definition of the <div> element and its attributes). TEI places no limits or requirements on the use of the type attribute; it merely makes type available "to provide a name or description for the division." We have deliberately kept the number of EpiDoc values for type small to ensure the greatest possible degree of smooth interchange between EpiDoc-conformant projects.
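Because the vocabulary is small, a project can check its files against it with very little code. The Python sketch below rests on an assumption: only metadata and diplomatic are spelled out in this section, and the other six names are our paraphrase of the list above, so the formal definition in Section 5.5 should be consulted for the actual values. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed spellings: only "metadata" and "diplomatic" are named explicitly in
# this section; the other six paraphrase the list of division types.
EPIDOC_DIV_TYPES = {
    "description", "edition", "translation", "commentary",
    "history", "bibliography", "metadata", "diplomatic",
}

def unexpected_div_types(text_xml):
    """Return, sorted, the div type values that fall outside the assumed
    vocabulary. Divs without a type attribute are ignored."""
    root = ET.fromstring(text_xml)
    found = {div.get("type") for div in root.iter("div")}
    return sorted(t for t in found if t is not None and t not in EPIDOC_DIV_TYPES)

# Hypothetical sample: "apparatus" is not in the assumed vocabulary.
SAMPLE = """<text><body>
  <div type="edition">...</div>
  <div type="translation">...</div>
  <div type="apparatus">...</div>
</body></text>"""
print(unexpected_div_types(SAMPLE))
```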

EpiDoc exploits the flexibility and specificity of the <div> element and its type attribute to distinguish between the various components commonly found in the scholarly presentation of epigraphic texts. Compare the conventional print representation of ALA 2 (Example 4) with the XML version in Example 3. Although styles and presentation differ from one scholarly epigraphic publication to the next, at a basic level, this example is representative of the historiographic genre. Typographic cues are generally used to alert the reader to a change in the character of information being presented for each text, but the exact nature of such changes is often simply implied. We assume that our reader understands the conventions of epigraphic publication. In the XML counterpart, the semantic differences between divisions are made explicit using standard values for the type attribute.

These semantic distinctions, encoded into a digital text from its creation, facilitate a number of important possibilities when the time comes to prepare the text for presentation to a user or incorporate it into another publication or project. Standard XML software can use combinations of element and attribute values to select, compare, combine and reformat semantically discrete portions of documents in complex ways, provided that the semantic distinctions have been explicitly encoded in a standard manner. The typed divisions of TEI, when categorized according to a generally agreed and epigraphically relevant scheme, constitute such a standard.
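As an illustration of such selection, the following Python sketch pulls the edition out of a document by its type value, using the limited XPath support in the standard library's ElementTree module. The document, its type values and the helper names are hypothetical placeholders, not part of the EpiDoc specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the type values and text content are placeholders.
DOC = """<text>
  <body>
    <div type="description"><p>Placeholder description.</p></div>
    <div type="edition"><p>Placeholder edition text.</p></div>
    <div type="translation"><p>Placeholder translation.</p></div>
  </body>
</text>"""

def divisions_of_type(root, div_type):
    """Return all div elements below root whose type attribute equals div_type."""
    return root.findall(".//div[@type='{}']".format(div_type))

def plain_text(element):
    """Concatenate all text inside element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())

root = ET.fromstring(DOC)
for div in divisions_of_type(root, "edition"):
    print(plain_text(div))
```

The same selection could be expressed in XSLT or any other XPath-aware tool; what matters is that the type value, not the position or typography of the division, identifies the content.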

Divisions with the same type attribute value can be repeated as necessary, and further semantic distinctions between them drawn at the editor's discretion using the standard TEI attribute n. These arrangements permit extensive, project-specific flexibility in searching, formatting and reusing documents. There are no EpiDoc-mandated values for n, but projects are encouraged to develop standard usage and publish documentation thereof as part of their dataset. The EPAPP project has developed such a system of attribute values, as demonstrated by the text in Example 5. The EPAPP system of values for n on the <div> element has been adopted by these guidelines as a recommendation for projects which find it helpful.
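A project that repeats a division type and distinguishes the repetitions with n might index its divisions as in the Python sketch below. The n values shown ("en", "fr") are invented for illustration and are not the EPAPP values; the document and function name are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical values: "en" and "fr" are invented for illustration and are
# not the EPAPP values for n.
DOC = """<body>
  <div type="translation" n="en"><p>Placeholder English text.</p></div>
  <div type="translation" n="fr"><p>Placeholder French text.</p></div>
  <div type="commentary"><p>Placeholder commentary.</p></div>
</body>"""

def index_by_type_and_n(root):
    """Map each (type, n) pair to the list of div elements carrying it.
    A missing attribute appears in the key as None."""
    index = defaultdict(list)
    for div in root.iter("div"):
        index[(div.get("type"), div.get("n"))].append(div)
    return dict(index)

index = index_by_type_and_n(ET.fromstring(DOC))
# List the distinct n values used on translation divisions.
print(sorted(key for key in index if key[0] == "translation"))
```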

Responsibility for this section

CVS Information

Revision number: $Revision: 1.9 $

Revision name (if any): $Name: r-4-beta-1 $

Revision date: $Date: 2006/04/27 15:47:40 $

Revision committed by: $Author: paregorios $