Posts Tagged ‘snap:drgn’

OEDUc: Disambiguating EDH person RDF working group

Tuesday, July 25th, 2017

One of the working groups at the Open Epigraphic Data Unconference (OEDUc) meeting in London (May 15, 2017) focussed on disambiguating EDH person RDF. Since the Epigraphic Database Heidelberg (EDH) has made all of its data available to download in various formats in an Open Data Repository, it is possible to extract the person data from the EDH Linked Data RDF.

A first step in enriching this prosopographic data might be to link the EDH person names with PIR (Prosopographia Imperii Romani) and Trismegistos (TM) references. At present the EDH person RDF only contains links to attestations of persons rather than to unique individuals (although it attaches only one REF entry to a person who occurs several times in the same text), so we cannot use the EDH person URI to disambiguate persons across different texts.

Given that EDH already contains links to PIR in its bibliography, we could start by extracting these references (which should be possible with a simple Python script) and linking them to the EDH person REF. Where only one person is attested in a text, the PIR reference can be linked directly to the RDF of that EDH person attestation. If, however, a text contains multiple person references (probably the majority of cases), we should try another procedure, possibly matching the first letter of the EDH name against the alphabetical PIR volume.
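The PIR step could be sketched in Python roughly as follows. This is a minimal, hypothetical sketch rather than working-group code: the regular expression assumes PIR citations appear in the EDH bibliography as strings like "PIR² N 110", and the function names and input shapes (lists of person URI and name pairs) are illustrative assumptions. Where a text has several persons, it accepts a first-letter match only when both the initial and the PIR volume letter are unambiguous.

```python
import re
from collections import Counter

# Assumed citation format, e.g. "PIR² N 110" or "PIR2 N 110"; EDH's real
# bibliography strings may differ and the pattern would need adjusting.
PIR_PATTERN = re.compile(r"PIR(?:²|2)?\s+([A-Z])\s+(\d+)")


def extract_pir_refs(bibliography: str) -> list[tuple[str, int]]:
    """Return (volume letter, entry number) pairs cited in a bibliography string."""
    return [(letter, int(num)) for letter, num in PIR_PATTERN.findall(bibliography)]


def link_pir(persons: list[tuple[str, str]], bibliography: str) -> dict[str, str]:
    """Map EDH person URIs to PIR citations for a single inscription (hypothetical helper).

    persons: (EDH person URI, name) pairs for that inscription.
    """
    refs = extract_pir_refs(bibliography)
    if len(persons) == 1 and len(refs) == 1:
        # Only one person attested: link the PIR reference directly.
        letter, num = refs[0]
        return {persons[0][0]: f"PIR {letter} {num}"}

    # Several persons: match the first letter of each name to the PIR volume
    # letter, accepting only matches that are unambiguous on both sides.
    initials = Counter(name[:1].upper() for _, name in persons)
    links: dict[str, str] = {}
    for uri, name in persons:
        initial = name[:1].upper()
        candidates = [ref for ref in refs if ref[0] == initial]
        if initials[initial] == 1 and len(candidates) == 1:
            letter, num = candidates[0]
            links[uri] = f"PIR {letter} {num}"
    return links
```

Matching on the first letter is deliberately naive: PIR is arranged alphabetically by nomen, so a name that begins with a praenomen would need to be parsed before this check is meaningful.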

A second way of enriching the EDH person RDF would be to use the Trismegistos People portal. At the moment this database of persons and of attestations of persons in texts consists mostly of names from papyri (from Ptolemaic Egypt), but TM is in the process of adding all names from inscriptions (using an automated NER script on the textual data from EDCS via the EAGLE project). Once this is complete, it will be possible to link up with EDH using the stable TM identifiers (and URIs): the TM PER ID for persons and the TM person REF ID for attestations of persons.

The recommended procedure would be similar to the one for PIR. Wherever there is a one-to-one relationship with a single EDH person reference, the TM person REF ID could be linked to it directly. Where an inscription contains multiple attestations of different names, we could first modify the TM REF dataset by removing all double attestations, and then match the remaining ones to the EDH RDF by order of appearance. In EDH, each person name appears to receive a URI that consists of the EDH text ID and an integer representing the position of the name in the text (e.g., http://edh-www.adw.uni-heidelberg.de/edh/person/HD000001/1 is the first person name appearing in text HD000001). Finally, we could check for mistakes by comparing the first character(s) of the EDH name with the first character(s) of the TM REF name. Ultimately, by following the links from the TM REF IDs to the TM PER IDs, we could send back to EDH a list of which REF names are to be considered the same person, thereby further disambiguating their person RDF data.
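The TM matching steps could likewise be sketched in Python. Again this is a hypothetical sketch: the TMRef record, the assumption that "double attestations" show up as a repeated TM PER ID within one text, and the choice to set aside texts whose counts still disagree are illustrative decisions, not part of the proposal. The first-character check is applied even in the one-to-one case, which is slightly stricter than the procedure described.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TMRef:
    """Hypothetical record for one TM person attestation (REF)."""
    ref_id: str  # TM person REF ID (attestation)
    per_id: str  # TM PER ID (person)
    name: str    # name as given in the TM REF


def position(edh_uri: str) -> int:
    """Name position encoded in an EDH person URI, e.g. .../HD000001/1 -> 1."""
    return int(edh_uri.rstrip("/").rsplit("/", 1)[1])


def match_text(edh_persons: list[tuple[str, str]], tm_refs: list[TMRef],
               prefix_len: int = 1) -> tuple[dict[str, TMRef], list[str]]:
    """Align TM REFs with EDH person URIs for a single inscription.

    Returns (matches, flagged): matches maps EDH URI -> TMRef; flagged lists
    EDH URIs that failed the name check or could not be paired.
    """
    # Step 1: remove double attestations, assumed here to be repeated
    # PER IDs within the same text (EDH keeps one entry per person per text).
    seen: set[str] = set()
    unique: list[TMRef] = []
    for ref in tm_refs:
        if ref.per_id not in seen:
            seen.add(ref.per_id)
            unique.append(ref)

    # Step 2: pair by order of appearance, using the integer in the EDH URI.
    ordered = sorted(edh_persons, key=lambda p: position(p[0]))
    if len(ordered) != len(unique):
        # Counts still disagree: set the whole text aside for manual review.
        return {}, [uri for uri, _ in ordered]

    matches: dict[str, TMRef] = {}
    flagged: list[str] = []
    for (uri, name), ref in zip(ordered, unique):
        # Step 3: sanity check on the first character(s) of both names.
        if name[:prefix_len].lower() == ref.name[:prefix_len].lower():
            matches[uri] = ref
        else:
            flagged.append(uri)
    return matches, flagged


def group_by_tm_person(matches: dict[str, TMRef]) -> dict[str, list[str]]:
    """Step 4: EDH person URIs (from any texts) that share a TM PER ID."""
    groups: dict[str, list[str]] = defaultdict(list)
    for uri, ref in matches.items():
        groups[ref.per_id].append(uri)
    return {per: sorted(uris) for per, uris in groups.items() if len(uris) > 1}
```

The output of group_by_tm_person is the kind of list that could be sent back to EDH: sets of EDH person URIs that TM treats as one individual.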

This process would be a good step in enhancing the SNAP:DRGN-compliant RDF produced by EDH, which was also addressed in another working group: recommendations for EDH person-records in SNAP RDF.

OEDUc: recommendations for EDH person-records in SNAP RDF

Monday, July 3rd, 2017

At the first meeting of the Open Epigraphic Data Unconference (OEDUc) in London in May 2017, one of the working groups that met in the afternoon (which believes it has completed its brief, and so does not propose to meet again) examined the person-data offered for download on the EDH open data repository, and made some recommendations for making this data more compatible with the SNAP:DRGN guidelines.

Currently, the RDF of a person-record in the EDH data (in TTL format) looks like:

<http://edh-www.adw.uni-heidelberg.de/edh/person/HD000001/1>
    a lawd:Person ;
    lawd:PersonalName "Nonia Optata"@lat ;
    gndo:gender <http://d-nb.info/standards/vocab/gnd/gender#female> ;
    nmo:hasStartDate "0071" ;
    nmo:hasEndDate "0130" ;
    snap:associatedPlace <http://edh-www.adw.uni-heidelberg.de/edh/geographie/11843> ,
        <http://pleiades.stoa.org/places/432808#this> ;
    lawd:hasAttestation <http://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000001> .

We identified a few problems with this data structure, and made recommendations as follows.

  1. We propose that EDH split the current person references in edh_people.ttl into: (a) one lawd:Person, which has the properties for name, gender, status, membership, and hasAttestation; and (b) one lawd:PersonAttestation, which has the properties dct:source (pointing to the URI of the inscription itself) and lawd:Citation. Date, location, etc. can then be derived from the inscription (which is where they belong).
  2. A few observations:
    1. lawd:PersonalName is a class, not a property; the recommended property for a personal name as a string is foaf:name
    2. the language tag for Latin should be @la (not @lat)
    3. there are currently thousands of empty strings tagged as Greek
    4. Nomisma date properties cannot be used on persons, because their definition is inappropriate (and unclear)
    5. as documented, Nomisma date properties refer only to numismatic dates, not epigraphic ones (I would request a modification to their documentation on this point)
    6. the DNB (d-nb.info) ontology for gender is inadequate (which is partly why SNAP has avoided tagging gender so far); a better ontology may be found, but I would suggest plain text values for now
    7. to the person record above we could then add dct:identifier with the PIR number (compare the discussion of plans for disambiguating PIR persons in another working group)
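As an illustration of these recommendations, the following Python sketch emits a split person/attestation record in Turtle. It is an assumption-laden sketch, not an EDH or SNAP deliverable: the attestation URI would have to be minted by EDH (the one passed in is hypothetical), foaf:gender is only one possible property for a plain-text gender value, lawd:Citation is left out, and the LAWD namespace is assumed to be http://lawd.info/ontology/.

```python
from typing import Optional

# FOAF and DCMI Terms namespaces are standard; the LAWD one is assumed.
PREFIXES = (
    "@prefix lawd: <http://lawd.info/ontology/> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def literal(text: str, lang: Optional[str] = None) -> str:
    """Turtle string literal with escaped quotes/backslashes and optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def person_record(person_uri: str, name: str, gender: str, inscription_uri: str,
                  attestation_uri: str, pir: Optional[str] = None) -> str:
    """Turtle for one lawd:Person plus its lawd:PersonAttestation (hypothetical layout)."""
    person = [
        f"<{person_uri}>",
        "    a lawd:Person ;",
        f"    foaf:name {literal(name, 'la')} ;",   # Latin tag: @la, not @lat
        f"    foaf:gender {literal(gender)} ;",     # plain-text value (assumed property)
    ]
    if pir:
        person.append(f"    dct:identifier {literal(pir)} ;")  # e.g. a PIR number
    person.append(f"    lawd:hasAttestation <{attestation_uri}> .")

    # Dates and places are deliberately omitted: they belong to the inscription.
    attestation = [
        f"<{attestation_uri}>",
        "    a lawd:PersonAttestation ;",
        f"    dct:source <{inscription_uri}> .",
    ]
    return PREFIXES + "\n" + "\n".join(person) + "\n\n" + "\n".join(attestation) + "\n"
```

Called with the values from the HD000001 record above, plus whatever attestation URI EDH chose to mint, this would produce foaf:name "Nonia Optata"@la and move the dates and places off the person.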

Open Epigraphic Data Unconference report

Wednesday, June 7th, 2017

Last month, a dozen or so scholars met in London (and were joined by a similar number via remote video-conference) to discuss and work on the open data produced by the Epigraphic Database Heidelberg. (See call and description.)

Over the course of the day seven working groups were formed; two of them completed their briefs within the day, while the other five will lead to ongoing work and discussion. Fuller reports from the individual groups will follow here shortly; in the meantime, here is a short summary of their activities, with links to the relevant pages in the Wiki of the OEDUc Github repository.

Useful links:

  1. All interested colleagues are welcome to join the discussion group: https://groups.google.com/forum/#!forum/oeduc
  2. Code, documentation, and other notes are collected in the Github repository: https://github.com/EpiDoc/OEDUc

1. Disambiguating EDH person RDF
(Gabriel Bodard, Núria García Casacuberta, Tom Gheldof, Rada Varga)
We discussed and broadly specced out a couple of steps in the process for disambiguating PIR references for inscriptions in EDH that contain multiple personal names, for linking together person references that cite the same PIR entry, and for using Trismegistos data to further disambiguate EDH persons. We haven’t written any actual code to implement this yet, but we expect a few Python scripts would do the trick.

2. Epigraphic ontology
(Hugh Cayless, Paula Granados, Tim Hill, Thomas Kollatz, Franco Luciani, Emilia Mataix, Orla Murphy, Charlotte Tupman, Valeria Vitale, Franziska Weise)
This group discussed the various ontologies available for encoding epigraphic information (LAWDI, Nomisma, EAGLE Vocabularies) and ideas for filling the gaps between them. This is a long-standing desideratum of the EpiDoc community, and will be an ongoing discussion (perhaps the most important one of the workshop).

3. Images and image metadata
(Angie Lumezeanu, Sarah Middle, Simona Stoyanova)
This group attempted to write scripts to track down copyright information on images in EDH (too complicated, but EAGLE may have more of this), download images and metadata (scripts in Github), and explored the possibility of embedding metadata in the images in IPTC format (in progress).

4. EDH and SNAP:DRGN mapping
(Rada Varga, Scott Vanderbilt, Gabriel Bodard, Tim Hill, Hugh Cayless, Elli Mylonas, Franziska Weise, Frank Grieshaber)
In this group we reviewed the status of the SNAP:DRGN recommendations for person-data in RDF, and then looked in detail at the person list exported from the EDH data. A list of suggestions for improving this data was produced for EDH to consider. This task was considered complete (although Frank may have feedback or questions for us later).

5. EDH and Pelagios NER
(Orla Murphy, Sarah Middle, Simona Stoyanova, Núria García Casacuberta, Thomas Kollatz)
This group explored the possibility of running automated named entity extraction on the Latin texts of the EDH inscriptions, in two stages: (1) extracting plain text from the XML (code in Github); and (2) applying CLTK/NLTK scripts to identify entities (in progress).
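The first stage (plain text from the XML) can be illustrated with a minimal Python sketch using only the standard library. It is not the code in the OEDUc repository; it assumes the EDH files follow the usual EpiDoc convention of placing the transcription in a TEI <div type="edition">, treats <lb/> as a word break unless it carries break="no", and ignores other editorial markup (the text of both alternatives inside a <choice>, for instance, would be kept).

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace used by EpiDoc


def _collect(elem: ET.Element, out: list[str]) -> None:
    """Append the text content of elem (and its descendants) to out."""
    if elem.tag == TEI + "lb" and elem.get("break") != "no":
        out.append(" ")  # an ordinary line break also separates words
    if elem.text:
        out.append(elem.text)
    for child in elem:
        _collect(child, out)
        if child.tail:
            out.append(child.tail)


def edition_text(epidoc_xml: str) -> str:
    """Plain text of every <div type="edition"> in an EpiDoc document."""
    root = ET.fromstring(epidoc_xml)
    texts = []
    for div in root.iter(TEI + "div"):
        if div.get("type") == "edition":
            out: list[str] = []
            _collect(div, out)
            texts.append(" ".join("".join(out).split()))  # normalise whitespace
    return "\n".join(texts)
```

Expansions such as <ex> are simply kept inline, so an abbreviated word and its expansion come out as one word, which is usually what an NER tool would want.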

6. EDH and Pelagios location disambiguation
(Paula Granados, Valeria Vitale, Franco Luciani, Angie Lumezeanu, Thomas Kollatz, Hugh Cayless, Tim Hill)
This group aimed to work on disambiguating location information in the EDH data export, for example making links between Geonames place identifiers, TMGeo places, Wikidata and Pleiades identifiers, via the Pelagios gazetteer or other linking mechanisms. A pathway for resolving these links was identified, but work is still ongoing.

7. eXist-db mashup application
(Pietro Liuzzo)
Dr Liuzzo carried out this task alone, since his network connection did not allow him to join any of the discussion groups on the day. The aim was to create an implementation of existing code for displaying and editing epigraphic editions (using eXist-db, Leiden+, etc.) and to offer a demonstration interface through which the EDH data could be served to the public and contributions and improvements invited. (A preview of “epigraphy.info”, perhaps?)

Summer School in Digital Humanities (Sep 2016, Hissar, Bulgaria)

Thursday, March 3rd, 2016

The Centre for Excellence in the Humanities at the University of Sofia, Bulgaria, is organizing, jointly with an international team of lecturers and researchers in the field of Digital Humanities, a Summer School in Digital Humanities. The Summer School will take place from 5 to 10 September 2016 and is targeted at historians, archaeologists, classical scholars, philologists, museum and conservation workers, linguists, researchers in translation and reception studies, specialists in cultural heritage and cultural management, textual critics and other humanities scholars with little to moderate IT skills who would like to enhance their competences. The Summer School will provide four introductory modules on the following topics:

  • Text encoding and interchange by Gabriel Bodard, University of London, and Simona Stoyanova, King’s College London: TEI, EpiDoc XML (http://epidoc.sourceforge.net/), marking up of epigraphic monuments, authority lists, linked open data for toponymy and prosopography: SNAP:DRGN (http://snapdrgn.net/), Pelagios (http://pelagios-project.blogspot.bg/), Pleiades (http://pleiades.stoa.org/).
  • Text and image annotation and alignment by Simona Stoyanova, King’s College London, and Polina Yordanova, University of Sofia: SoSOL Perseids tools (http://perseids.org), Arethusa grammatical annotation and treebanking of texts, Alpheios text and translation alignment, text/image alignment tools.
  • Geographical Information Systems and Neogeography by Maria Baramova, University of Sofia, and Valeria Vitale, King’s College London: Historical GIS, interactive map layers with historical information, using GeoNames (http://www.geonames.org/) and geospatial data, Recogito tool for Pelagios.
  • 3D Imaging and Modelling for Cultural Heritage by Valeria Vitale, King’s College London: photogrammetry, digital modelling of indoor and outdoor objects of cultural heritage, Meshmixer (http://www.meshmixer.com/), Sketchup (http://www.sketchup.com/) and others.

The school is open to applications from MA and PhD students, postdocs and early-career researchers in all humanities disciplines, as well as employees in the field of cultural heritage. Applicants should send a CV and a motivation statement clarifying their specific needs and expressing interest in one or more of the modules no later than 15 May 2016. Places are limited, and applicants will be notified about acceptance within 10 working days of the application deadline. Transfer from Sofia to Hissar and back, accommodation, and meal expenses during the Summer School are covered by the organizers. Five scholarships of 250 euro will be awarded by the organizing committee to the participants whose work and motivation are deemed the most relevant and important.

The participation fee is 40 euro. It covers coffee breaks, social programme and materials for the participants.

Please submit your applications to dimitar.illiev@gmail.com.

ORGANISING COMMITTEE
Assoc. Prof. Dimitar Birov (Department of Informatics, University of Sofia)
Dr. Maria Baramova (Department of Balkan History, University of Sofia)
Dr. Dimitar Iliev (Department of Classics, University of Sofia)
Mirela Hadjieva (Centre for Excellence in the Humanities, University of Sofia)
Dobromir Dobrev (Centre for Excellence in the Humanities, University of Sofia)
Kristina Ferdinandova (Centre for Excellence in the Humanities, University of Sofia)

SNAP:DRGN introduction

Thursday, May 8th, 2014

Standards for Networking Ancient Prosopography: Data and Relations in Greco-Roman Names (SNAP:DRGN) is a one-year pilot project, based at King’s College London in collaboration with colleagues from the Lexicon of Greek Personal Names (Oxford), Trismegistos (Leuven), Papyri.info (Duke) and Pelagios (Southampton), and hopes to include many more data partners by the end of this first year. Much of the early discussion of this project took place at the LAWDI school in 2013. Our goal is to recommend standards for sharing relatively minimalist data about classical and other ancient prosopographical and onomastic datasets in RDF, thereby creating a huge graph of person-data that scholars can:

  1. query to find individuals, patterns, relationships, statistics and other information;
  2. follow back to the richer and fuller source information in the contributing database;
  3. contribute new datasets or individual persons, names and textual references/attestations;
  4. annotate to declare identity between persons (or co-reference groups) in different source datasets;
  5. annotate to express other relationships between persons/entities in different or the same source dataset (such as familial relationships, legal encounters, etc.);
  6. use URIs to annotate texts and other references to names with the identity of the person to whom they refer (similar to Pelagios’s model for places using Pleiades).
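Point 4 implies that pairwise identity statements from different datasets will eventually need to be merged into co-reference groups. A consumer of the graph might do this with a union-find structure, as in the hypothetical Python sketch below; the identity annotations themselves would be RDF in the SNAP graph, and the identifiers used here are placeholders rather than real SNAP URIs.

```python
def coreference_groups(identity_links: list[tuple[str, str]]) -> list[list[str]]:
    """Merge pairwise identity annotations into co-reference groups (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in identity_links:
        union(a, b)

    groups: dict[str, set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    # Deterministic output: each group sorted, groups ordered by first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Transitivity is the point of the structure: if one annotation says A is B and another says B is C, then A and C end up in the same group even though no single annotation links them.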

A more detailed description (plus the successful funding bid document, if you’re really keen) can be found at <http://snapdrgn.net/about>.

Our April workshop invited a handful of representative data-holders and experts in prosopography and/or linked open data to spend two days in London discussing the SNAP:DRGN project, their own data and work, and approaches to sharing and linking prosopographical data in general. We presented a first draft of the SNAP:DRGN “Cookbook”, the guidelines for formatting a subset of prosopographical data in RDF for contribution to the SNAP graph, and received some extremely useful feedback on individual technical issues and the overall approach. A summary of the workshop, and slides from many of the presentations, can be found at <http://snapdrgn.net/archives/110>.

In the coming weeks we shall announce the first public version of the SNAP ontology, the Cookbook, and the graph of our core and partner datasets and annotations. For further discussion about the project, and linked data for prosopography in general, you can also join the Ancient-People Googlegroup (where I posted a summary similar to this post earlier today).