OEDUc: EDH and Pelagios location disambiguation Working Group

July 5th, 2017 by Valeria Vitale

From the beginning of the unconference, a number of participants seemed to share an interest in linked open geodata. Moreover, attention to gazetteers and alignment appeared among the desiderata for the event expressed by the EDH developers. So, in the second part of the unconference, we had a look at what sort of geographic information can be found in the EDH and what could be added.

The discussion, of course, involved Pelagios and Pleiades and their different but related roles in establishing links between sources of geographical information. EDH is already one of the contributors to the Pelagios LOD ecosystem. Because it uses Pleiades IDs to identify places, it was relatively easy for the EDH to make its database compatible with Pelagios and discoverable on Peripleo, Pelagios’s search and visualisation engine.

However, looking into the data available for download, we focused on a couple of things. One is that each of the epigraphic texts in the EDH has, of course, a unique identifier (EDH text ID). The other is that each of the places mentioned also has a unique identifier (EDH geo ID), besides the Pleiades ID. As one can imagine, the relationships between texts and places can be one to one, one to many, or many to many (as a place can be related to more than one text and a text can be related to more than one place). All places mentioned in the EDH database have an EDH geo ID, and this information becomes especially relevant for those places that do not already have an ID in Pleiades or GeoNames. From this perspective, EDH geo IDs fill the gaps left by the other two gazetteers and meet the specific needs of the EDH.

Exploring Peripleo to see what information from the EDH can be found in it and how it gets visualised, we noticed that only the information about the texts appears as resources (identified by the diamond icon), while the EDH geo IDs do not show up as a gazetteer-like reference, as happens for other databases such as Trismegistos or Vici.

So we decided to do a little work on the EDH geo IDs, more specifically:

  1. To extract them and treat them as a small, internal gazetteer that could be contributed to Pelagios. Such a feature wouldn’t represent a substantial change in the way EDH is used, or in how the data are found in Peripleo, but we thought it could improve the visibility of the EDH in the Pelagios panorama and, possibly, act as an intermediate step for matching different gazetteers that focus on the ancient world.
  2. To use the EDH geo IDs as bridges to other datasets. The idea sounded especially interesting when thinking of the possible interaction with the Trismegistos database, so we wondered whether a closer collaboration between the two projects could benefit them both. Trismegistos, in fact, is another project with substantial geographic information: about 50,000 place-names mapped against Pleiades, Wikipedia and GeoNames. Since the last Linked Past conference, they have tried to align their place-names with Pelagios, but the operation was successful for only 10,000 of them. We believe that enhancing the links between Trismegistos and EDH could make them better connected to each other and both more effectively present in the LOD ecosystem around the ancient world.

With these two objectives in mind, we downloaded the GeoJSON dump from the EDH website and extracted the text IDs, the geo IDs, and their relationships. Once the lists (which can be found on the GitHub repository) had been created, it became relatively straightforward to try to match the EDH geo IDs with the Trismegistos Geo IDs. In this way, through the intermediate step of the geographical relationships between text IDs and geo IDs in EDH, Trismegistos also gains a better and more informative connection with the EDH texts.
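As an illustration of the extraction step, here is a minimal Python sketch. It is not the script we used during the unconference. It assumes the dump is a GeoJSON FeatureCollection in which each feature carries a text ID and a geo ID among its properties; the property names `text_id` and `geo_id`, and all the IDs in the sample, are made up for the example, and the real EDH dump may be organised differently.

```python
import json
from collections import defaultdict

# Made-up sample in GeoJSON FeatureCollection form. The property names
# "text_id" and "geo_id" are assumptions, not the actual EDH schema.
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null,
   "properties": {"text_id": "T1", "geo_id": "G1"}},
  {"type": "Feature", "geometry": null,
   "properties": {"text_id": "T1", "geo_id": "G2"}},
  {"type": "Feature", "geometry": null,
   "properties": {"text_id": "T2", "geo_id": "G1"}},
  {"type": "Feature", "geometry": null,
   "properties": {"text_id": "T3"}}
]}
"""


def extract_relations(geojson_text):
    """Return (text_to_geo, geo_to_text) mappings of ID -> set of IDs."""
    data = json.loads(geojson_text)
    text_to_geo = defaultdict(set)
    geo_to_text = defaultdict(set)
    for feature in data.get("features", []):
        props = feature.get("properties") or {}
        text_id = props.get("text_id")
        geo_id = props.get("geo_id")
        # Skip features that lack either identifier.
        if text_id is None or geo_id is None:
            continue
        text_to_geo[text_id].add(geo_id)
        geo_to_text[geo_id].add(text_id)
    return text_to_geo, geo_to_text


text_to_geo, geo_to_text = extract_relations(SAMPLE)
for geo_id in sorted(geo_to_text):
    print(geo_id, sorted(geo_to_text[geo_id]))
```

Keeping both directions of the mapping makes it easy to go from a place to all the texts connected to it, and from a text to all its places, which is what the bridging idea relies on.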

This first, quick attempt at aligning geodata using their common references might help to test how good the automatic matches are, and to start thinking about how to troubleshoot mismatches and other errors. This closer look at geographical information also brought up a small bug in the EDH interface: in the internal EDH search, when there is a connection to a place that does not have a Pleiades ID, the website treats it as an error instead of, for example, referring to the internal EDH geo ID. This may be worth flagging to the EDH developers and, in a way, it underlines another benefit of treating the EDH geo IDs as a small gazetteer of its own.

In the end, we used the common IDs (either in Pleiades or GeoNames) to do a first alignment between the Trismegistos and EDH place IDs. We didn’t have time to check the accuracy (you are welcome to take this experiment one step further!), but we fully expect to get quite a few positive results. And we have the list of EDH geo IDs ready to be re-used for other purposes and maybe to make its debut on the Peripleo scene.
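The alignment on common IDs can be sketched in a few lines of Python. Again, this is only a rough illustration under stated assumptions: each place is represented as a dictionary with an `id` plus optional `pleiades` and `geonames` values, and both the field names and the sample IDs are invented for the example rather than taken from the EDH or Trismegistos data.

```python
from collections import defaultdict


def align(edh_places, tm_places, keys=("pleiades", "geonames")):
    """Pair EDH and Trismegistos places that share an external ID.

    Returns a sorted list of (edh_id, tm_id, key) tuples, where key
    names the shared reference that produced the match.
    """
    matches = set()
    for key in keys:
        # Index the Trismegistos places by the external ID under this key.
        index = defaultdict(list)
        for place in tm_places:
            external_id = place.get(key)
            if external_id:
                index[external_id].append(place["id"])
        # Look up each EDH place that has an external ID of the same kind.
        for place in edh_places:
            external_id = place.get(key)
            if external_id:
                for tm_id in index.get(external_id, []):
                    matches.add((place["id"], tm_id, key))
    return sorted(matches)


# Invented sample records; field names and IDs are hypothetical.
edh = [
    {"id": "E1", "pleiades": "P1", "geonames": None},
    {"id": "E2", "pleiades": None, "geonames": "N7"},
    {"id": "E3", "pleiades": None, "geonames": None},
]
tm = [
    {"id": "TM1", "pleiades": "P1", "geonames": "N9"},
    {"id": "TM2", "pleiades": None, "geonames": "N7"},
]

result = align(edh, tm)
for edh_id, tm_id, key in result:
    print(edh_id, tm_id, key)
```

Recording which reference produced each match keeps the result easy to audit by hand, and places such as `E3`, which have neither a Pleiades nor a GeoNames ID, simply stay unmatched and can be listed separately for manual review.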
