Archive for July, 2017

OEDUc: Disambiguating EDH person RDF working group

Tuesday, July 25th, 2017

One of the working groups at the Open Epigraphic Data Unconference (OEDUc) meeting in London (May 15, 2017) focussed on disambiguating EDH person RDF. Since the Epigraphic Database Heidelberg (EDH) has made all of its data available to download in various formats in an Open Data Repository, it is possible to extract the person data from the EDH Linked Data RDF.

A first step in enriching this prosopographic data might be to link the EDH person names with PIR and Trismegistos (TM) references. At this moment the EDH person RDF only contains links to attestations of persons, rather than unique individuals (although it attaches only one REF entry to persons who have multiple occurrences in the same text), so we cannot use the EDH person URI to disambiguate persons from different texts.

Given that EDH already contains links to PIR in its bibliography, we could start by extracting these references (which should be possible with a simple Python script) and linking them to the EDH person REF. Where only one person is attested in a text, the PIR reference can be linked directly to the RDF of that EDH person attestation. If, however (as is probably true in most cases), a text contains multiple person references, we should try another procedure, possibly by matching the first letter of the EDH name to the alphabetical PIR volume.
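A minimal sketch of that extraction and linking step might look like the following. The citation pattern, the function names and the input shapes (a bibliography string plus a list of attestation/name pairs per text) are all assumptions made for illustration; the real EDH bibliography format may well differ, and the first-letter heuristic is deliberately crude.

```python
import re

# Assumed citation shape: "PIR", an optional edition marker (2 or ²),
# a volume letter and an entry number, e.g. "PIR² N 123".
PIR_PATTERN = re.compile(r"PIR\s*(?:2|²)?\s+([A-Z])\s+(\d+)")


def extract_pir_refs(bibliography):
    """Return (letter, number) pairs for every PIR citation in the string."""
    return [(m.group(1), int(m.group(2))) for m in PIR_PATTERN.finditer(bibliography)]


def link_pir(persons, bibliography):
    """Link PIR references to the person attestations of one text.

    persons: list of (attestation_uri, name) tuples for a single text.
    Returns (links, unresolved): a dict attestation_uri -> PIR reference,
    and the PIR references that could not be assigned unambiguously.
    """
    refs = extract_pir_refs(bibliography)
    links, unresolved = {}, []

    # Simple case: one person and one PIR reference, so link directly.
    if len(persons) == 1 and len(refs) == 1:
        links[persons[0][0]] = refs[0]
        return links, unresolved

    # Otherwise fall back to matching the first letter of the name
    # against the alphabetical PIR volume letter.
    for ref in refs:
        candidates = [uri for uri, name in persons if name[:1].upper() == ref[0]]
        if len(candidates) == 1 and candidates[0] not in links:
            links[candidates[0]] = ref
        else:
            unresolved.append(ref)
    return links, unresolved
```

Anything that ends up in `unresolved` would still need to be checked by hand.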

A second way of enriching the EDH person RDF would be to use the Trismegistos People portal. At the moment this database of persons and attestations of persons in texts consists mostly of names from papyri (from Ptolemaic Egypt), but TM is in the process of adding all names from inscriptions (using an automated NER script on the textual data from EDCS via the EAGLE project). Once this is completed, it will be possible to use the stable TM PER ID (for persons) and TM person REF ID (for attestations of persons) identifiers (and URIs) to link up with EDH.

The recommended procedure would be similar to the one for PIR. Whenever there’s a one-to-one relationship with a single EDH person reference, the TM person REF ID could be linked to it directly. Where an inscription contains multiple attestations of different names, we could modify the TM REF dataset by first removing all double attestations, and then matching the remaining ones to the EDH RDF by order of appearance. In EDH, the person that occurs first in an inscription receives a URI (?) that consists of the EDH text ID and an integer representing the place of the name in the text (e.g., is the first appearing person name in text HD000001). Finally, we could check for mistakes by matching the first character(s) of the EDH name with the first character(s) of the TM REF name. Ultimately, by using the links between the TM REF IDs and the TM PER IDs, we could report back to EDH which REF names are to be considered the same person, and thus further disambiguate their person RDF data.
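The steps above could be sketched roughly as follows. The identifiers and input shapes are hypothetical placeholders (neither the EDH URIs nor the TM IDs shown in the usage note are real), and the sketch assumes that both lists are already sorted by order of appearance in the text.

```python
def match_by_order(edh_persons, tm_refs, prefix_len=1):
    """Pair EDH person attestations with TM REF entries by order of appearance.

    edh_persons: list of (edh_uri, name), in order of appearance in the text.
    tm_refs: list of (tm_ref_id, name), in order of appearance in the text.
    Returns a list of (edh_uri, tm_ref_id, ok), where ok reports whether the
    first prefix_len characters of the two names agree.
    """
    # Step 1: remove double attestations, keeping the first occurrence of a name.
    seen = set()
    unique_refs = []
    for ref_id, name in tm_refs:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            unique_refs.append((ref_id, name))

    # Step 2: match by position; step 3: sanity-check the leading characters.
    matches = []
    for (edh_uri, edh_name), (ref_id, tm_name) in zip(edh_persons, unique_refs):
        ok = edh_name[:prefix_len].casefold() == tm_name[:prefix_len].casefold()
        matches.append((edh_uri, ref_id, ok))
    return matches


def group_by_person(matches, ref_to_per):
    """Group EDH attestations under the TM PER ID of their matched REF ID.

    ref_to_per: dict mapping tm_ref_id -> tm_per_id.
    Only matches that passed the first-character check are included.
    """
    groups = {}
    for edh_uri, ref_id, ok in matches:
        if ok and ref_id in ref_to_per:
            groups.setdefault(ref_to_per[ref_id], []).append(edh_uri)
    return groups
```

For example, `match_by_order([("HD000001/1", "Nonia Optata")], [(101, "Nonia Optata"), (102, "Nonia Optata")])` pairs the single EDH attestation with REF 101 and ignores the repeated entry. Pairs flagged `ok=False` are candidates for manual review rather than automatic linking.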

This process would be a good step in enhancing the SNAP:DRGN-compliant RDF produced by EDH, which was also addressed in another working group: recommendations for EDH person-records in SNAP RDF.

OEDUc: EDH and Pelagios location disambiguation Working Group

Wednesday, July 5th, 2017

From the beginning of the unconference, a number of participants seemed to share an interest in linked open geodata. Moreover, gazetteers and alignment appeared among the desiderata for the event expressed by the EDH developers. So, in the second part of the unconference, we had a look at what sort of geographic information can be found in the EDH and what could be added.

The discussion, of course, involved Pelagios and Pleiades and their different but related roles in establishing links between sources of geographical information. EDH is already one of the contributors to the Pelagios LOD ecosystem. Using the Pleiades IDs to identify places, it was relatively easy for the EDH to make its database compatible with Pelagios and discoverable on Peripleo, Pelagios’s search and visualisation engine.

However, looking into the data available for download, we focused on a couple of things. One is that each of the epigraphic texts in the EDH has, of course, a unique identifier (EDH text IDs). The other is that each of the places mentioned also has a unique identifier (EDH geo IDs), besides the Pleiades ID. As one can imagine, the relationships between texts and places can be one to one, one to many, or many to many (as a place can be related to more than one text and a text can be related to more than one place). All places mentioned in the EDH database have an EDH geo ID, and this information becomes especially relevant for those places that do not already have an ID in Pleiades or GeoNames. From this perspective, EDH geo IDs fill the gaps left by the other two gazetteers and meet the specific needs of the EDH.

Exploring Peripleo to see what information from the EDH can be found in it and how it gets visualised, we noticed that only the information about the texts appears as resources (identified by the diamond icon), while the EDH geo IDs do not show up as a gazetteer-like reference, as happens for other databases, such as Trismegistos or Vici.

So we decided to do a little work on the EDH geo IDs, more specifically:

  1. To extract them and treat them as a small, internal gazetteer that could be contributed to Pelagios. Such a feature wouldn’t represent a substantial change in the way EDH is used, or in how the data are found in Peripleo, but we thought it could improve the visibility of the EDH in the Pelagios panorama and, possibly, act as an intermediate step for the matching of different gazetteers that focus on the ancient world.
  2. The idea of using the EDH geo IDs as bridges sounded especially interesting when thinking of the possible interaction with the Trismegistos database, so we wondered whether a closer collaboration between the two projects could benefit them both. Trismegistos, in fact, is another project with substantial geographic information: about 50,000 place-names mapped against Pleiades, Wikipedia and GeoNames. Since the last Linked Past conference, they have tried to align their place-names with Pelagios, but the operation was successful for only 10,000 of them. We believe that enhancing the links between Trismegistos and EDH could make them better connected to each other and both more effectively present in the LOD ecosystem around the ancient world.

With these two objectives in mind, we downloaded the geoJSON dump from the EDH website and extracted the text IDs, the geo IDs, and their relationships. Once the lists (which can be found in the GitHub repository) had been created, it became relatively straightforward to try to match the EDH geo IDs with the Trismegistos GeoIDs. In this way, through the intermediate step of the geographical relationships between text IDs and geo IDs in EDH, Trismegistos also gains a better and more informative connection with the EDH texts.
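As an illustration of the extraction step, a sketch along the following lines would build both lookup tables from a geoJSON FeatureCollection. The property names `geo_id` and `text_ids` are hypothetical; they stand in for whatever keys the actual EDH dump uses, which should be checked before running anything like this.

```python
import json
from collections import defaultdict


def extract_relationships(geojson_text, geo_key="geo_id", texts_key="text_ids"):
    """Extract geo ID / text ID relationships from a geoJSON FeatureCollection.

    geo_key and texts_key are assumed property names, not the real EDH schema.
    Returns two dicts: geo ID -> set of text IDs, and text ID -> set of geo IDs.
    """
    data = json.loads(geojson_text)
    geo_to_texts = defaultdict(set)
    text_to_geos = defaultdict(set)
    for feature in data.get("features", []):
        props = feature.get("properties") or {}
        geo_id = props.get(geo_key)
        if geo_id is None:
            continue  # skip features that carry no place identifier
        for text_id in props.get(texts_key, []):
            geo_to_texts[geo_id].add(text_id)
            text_to_geos[text_id].add(geo_id)
    return dict(geo_to_texts), dict(text_to_geos)
```

Keeping both directions makes it easy to go from a matched place to all the texts related to it, and back again.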

This first, quick attempt at aligning geodata using their common references might help test how good the automatic matches are, and prompt some thinking about how to troubleshoot mismatches and other errors. This closer look at geographical information also brought up a small bug in the EDH interface: in the internal EDH search, when there is a connection to a place that does not have a Pleiades ID, the website treats it as an error instead of, for example, referring to the internal EDH geo IDs. This may be worth flagging to the EDH developers and, in a way, it underlines another benefit of treating the EDH geo IDs as a small gazetteer of its own.

In the end, we used the common IDs (either in Pleiades or GeoNames) to do a first alignment between the Trismegistos and EDH place IDs. We didn’t have time to check the accuracy (you are welcome to take this experiment one step further!), but we fully expect to get quite a few positive results. And we have the list of EDH geo IDs ready to be re-used for other purposes and maybe to make its debut on the Peripleo scene.
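An alignment on shared identifiers of this kind can be sketched as below. This is not the script we used; the dictionary layout and the `pleiades`/`geonames` keys are assumptions made for the example.

```python
def align_by_shared_ids(edh_places, tm_places, schemes=("pleiades", "geonames")):
    """Align two place lists through the external identifiers they share.

    edh_places: dict EDH geo ID -> {"pleiades": ..., "geonames": ...}
    tm_places:  dict TM geo ID  -> {"pleiades": ..., "geonames": ...}
    Missing or empty identifiers are ignored.
    Returns a list of (edh_geo_id, tm_geo_id, scheme) tuples, one per pair,
    recording the first scheme through which the pair was matched.
    """
    # Index the TM places by (scheme, external ID) for quick lookup.
    index = {}
    for tm_id, ids in tm_places.items():
        for scheme in schemes:
            value = ids.get(scheme)
            if value:
                index.setdefault((scheme, value), []).append(tm_id)

    matches = []
    seen_pairs = set()
    for edh_id, ids in edh_places.items():
        for scheme in schemes:
            value = ids.get(scheme)
            if not value:
                continue
            for tm_id in index.get((scheme, value), []):
                if (edh_id, tm_id) not in seen_pairs:
                    seen_pairs.add((edh_id, tm_id))
                    matches.append((edh_id, tm_id, scheme))
    return matches
```

An EDH geo ID that matches more than one TM place, or a pair matched through GeoNames but not Pleiades, would be a natural starting point for the accuracy check that we did not get to.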

OEDUc: recommendations for EDH person-records in SNAP RDF

Monday, July 3rd, 2017

At the first meeting of the Open Epigraphic Data Unconference (OEDUc) in London in May 2017, one of the working groups that met in the afternoon (and claim to have completed our brief, so do not propose to meet again) examined the person-data offered for download on the EDH open data repository, and made some recommendations for making this data more compatible with the SNAP:DRGN guidelines.

Currently, the RDF of a person-record in the EDH data (in TTL format) looks like:

    a lawd:Person ;
    lawd:PersonalName "Nonia Optata"@lat ;
    gndo:gender <> ;
    nmo:hasStartDate "0071" ;
    nmo:hasEndDate "0130" ;
    snap:associatedPlace <> ,
        <> ;
    lawd:hasAttestation <> .

We identified a few problems with this data structure, and made recommendations as follows.

  1. We propose that EDH split the current person references in edh_people.ttl into: (a) one lawd:Person, which has the properties for name, gender, status, membership, and hasAttestation, and (b) one lawd:PersonAttestation, which has properties dct:source (which points to the URI for the inscription itself) and lawd:Citation. Date, location, etc. can then be derived from the inscription (which is where they belong).
  2. A few observations:
    1. lawd:PersonalName is a class, not a property. The recommended property for a personal name as a string is foaf:name
    2. the language tag for Latin should be @la (not @lat)
    3. there are currently thousands of empty strings tagged as Greek
    4. Nomisma date properties cannot be used on person, because the definition is inappropriate (and unclear)
    5. As documented, Nomisma date properties refer only to numismatic dates, not epigraphic (I would request a modification to their documentation for this)
    6. the D-N.B ontology for gender is inadequate (which is partly why SNAP has avoided tagging gender so far); a better ontology may be found, but I would suggest plain text values for now
    7. to the person record, above, we could then add dct:identifier with the PIR number (and compare discussion of plans for disambiguation of PIR persons in another working group)
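To make the recommended split more concrete, here is a small sketch that writes one lawd:Person and one lawd:PersonAttestation in Turtle. It is only an illustration of the proposal, not EDH's export code: the function name and all URIs passed to it are hypothetical, the source property is written as dct:source (the lowercase form defined in Dublin Core terms), and gender and citation are left out because no specific property or value scheme has been settled here.

```python
PREFIXES = "\n".join([
    "@prefix lawd: <http://lawd.info/ontology/> .",
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
    "@prefix dct: <http://purl.org/dc/terms/> .",
])


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def person_turtle(person_uri, name, attestation_uri, inscription_uri, pir_number=None):
    """Return Turtle for one person and one attestation of that person.

    Blank names are skipped so that no empty string literals are emitted,
    and Latin names are tagged @la rather than @lat.
    """
    lines = [f"<{person_uri}> a lawd:Person ;"]
    if name and name.strip():
        lines.append(f'    foaf:name "{escape_literal(name.strip())}"@la ;')
    if pir_number:
        lines.append(f'    dct:identifier "{escape_literal(pir_number)}" ;')
    lines.append(f"    lawd:hasAttestation <{attestation_uri}> .")
    lines.append("")
    # Date and place are not repeated here: they belong to the inscription.
    lines.append(f"<{attestation_uri}> a lawd:PersonAttestation ;")
    lines.append(f"    dct:source <{inscription_uri}> .")
    return "\n".join(lines)
```

Calling `print(PREFIXES + "\n\n" + person_turtle(...))` with placeholder URIs produces a small document that can be checked in any Turtle validator.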