Posts Tagged ‘linked open data’

OEDUc: Disambiguating EDH person RDF working group

Tuesday, July 25th, 2017

One of the working groups at the Open Epigraphic Data Unconference (OEDUc) meeting in London (May 15, 2017) focussed on disambiguating EDH person RDF. Since the Epigraphic Database Heidelberg (EDH) has made all of its data available to download in various formats in an Open Data Repository, it is possible to extract the person data from the EDH Linked Data RDF.

A first step in enriching this prosopographic data might be to link the EDH person names with PIR and Trismegistos (TM) references. At this moment the EDH person RDF only contains links to attestations of persons, rather than unique individuals (although it attaches only one REF entry to persons who have multiple occurrences in the same text), so we cannot use the EDH person URI to disambiguate persons from different texts.

Given that EDH already contains links to PIR in its bibliography, we could start by extracting these (this should be possible using a simple Python script) and linking them to the EDH person REF. Where only one person is attested in a text, the PIR reference can be linked directly to the RDF of that EDH person attestation. If, however, there are multiple person references in a text (probably the majority of cases), we should try another procedure, possibly by looking at the first letter of the EDH name and matching it to the alphabetical PIR volume.
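As a rough illustration of the PIR step, here is a minimal Python sketch. It rests on several assumptions that are not confirmed by EDH: that PIR citations in the bibliography look like "PIR² I 131" or "PIR (2. Aufl.) C 1295", that the person names of a text are available as a list in order of appearance, and that the letter to compare is simply the first letter of each supplied name. The function names and the sample record are invented for illustration.

```python
import re

EDH_PERSON_BASE = "http://edh-www.adw.uni-heidelberg.de/edh/person"

# Assumed citation shape: "PIR", an optional second-edition marker,
# a volume letter, and an entry number. Adjust to the real EDH data.
PIR_PATTERN = re.compile(r"PIR\s*(?:²|\(2\. Aufl\.\))?\s+([A-Z])\s+(\d+)")


def extract_pir_refs(bibliography):
    """Return (letter, number) pairs for every PIR citation found."""
    return [(m.group(1), int(m.group(2)))
            for m in PIR_PATTERN.finditer(bibliography)]


def link_pir(text_id, names, bibliography):
    """Link PIR citations to EDH person attestation URIs for one text.

    names: person names in order of appearance (position 1, 2, ...).
    Returns (links, unresolved); a citation is linked only when exactly
    one person in the text is a candidate for it.
    """
    links, unresolved = [], []
    for letter, number in extract_pir_refs(bibliography):
        if len(names) == 1:
            # Single attested person: link directly.
            candidates = [1]
        else:
            # Several persons: fall back to the first-letter heuristic.
            candidates = [pos for pos, name in enumerate(names, start=1)
                          if name[:1].upper() == letter]
        label = f"PIR {letter} {number}"
        if len(candidates) == 1:
            uri = f"{EDH_PERSON_BASE}/{text_id}/{candidates[0]}"
            links.append((uri, label))
        else:
            unresolved.append(label)
    return links, unresolved


# Invented sample record, not real EDH data.
links, unresolved = link_pir(
    "HD999999",
    ["Iulius Severus", "Claudius Maximus"],
    "Sample bibliography. PIR² I 131.",
)
```

Citations that match no person, or more than one, end up in `unresolved` so that they can be checked by hand rather than linked on a guess.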

A second way of enriching the EDH person RDF would be to use the Trismegistos People portal. At the moment this database of persons and attestations of persons in texts consists mostly of names from papyri (from Ptolemaic Egypt), but TM is in the process of adding all names from inscriptions (using an automated NER script on the textual data from EDCS via the EAGLE project). Once this is completed, it will be possible to use the stable TM PER ID (for persons) and TM person REF ID (for attestations of persons) identifiers (and URIs) to link up with EDH.

The recommended procedure would be similar to the one for PIR. Whenever there is a one-to-one relationship with a single EDH person reference, the TM person REF ID could be linked directly to it. Where an inscription contains multiple attestations of different names, we could modify the TM REF dataset by first removing all double attestations, and then matching the remaining ones to the EDH RDF by order of appearance. In EDH each person receives a URI (?) that consists of the EDH text ID and an integer representing the place of the name in the text (e.g., http://edh-www.adw.uni-heidelberg.de/edh/person/HD000001/1 is the first person name to appear in text HD000001). Finally, we could check for mistakes by matching the first character(s) of the EDH name with the first character(s) of the TM REF name. Ultimately, by using the links between the TM REF IDs and the TM PER IDs, we could report back to EDH which REF names are to be considered the same person, and thus further disambiguate their person RDF data.
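The matching steps described here could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the TM attestations are represented as invented `(ref_id, per_id, name)` tuples in order of appearance, "double attestations" are taken to mean repeated TM PER IDs within one text, and all identifiers in the sample data are made up.

```python
from collections import defaultdict

EDH_PERSON_BASE = "http://edh-www.adw.uni-heidelberg.de/edh/person"


def dedupe_in_order(tm_refs):
    """Keep only the first attestation of each TM PER ID within a text."""
    seen, unique = set(), []
    for ref_id, per_id, name in tm_refs:
        if per_id not in seen:
            seen.add(per_id)
            unique.append((ref_id, per_id, name))
    return unique


def match_by_order(text_id, edh_names, tm_refs, prefix_len=1):
    """Pair EDH persons with TM attestations by order of appearance.

    Returns (matches, suspicious); a pair is suspicious when the first
    character(s) of the two names differ. If the two lists differ in
    length, zip() silently ignores the surplus entries.
    """
    matches, suspicious = [], []
    pairs = zip(edh_names, dedupe_in_order(tm_refs))
    for position, (edh_name, (ref_id, per_id, tm_name)) in enumerate(pairs, start=1):
        record = (f"{EDH_PERSON_BASE}/{text_id}/{position}", ref_id, per_id)
        if edh_name[:prefix_len].lower() == tm_name[:prefix_len].lower():
            matches.append(record)
        else:
            suspicious.append(record)
    return matches, suspicious


def same_person_groups(matches):
    """Group EDH person URIs that share a TM PER ID across texts."""
    groups = defaultdict(list)
    for uri, _ref_id, per_id in matches:
        groups[per_id].append(uri)
    return {per_id: uris for per_id, uris in groups.items() if len(uris) > 1}


# Invented sample data for two texts.
matches_a, suspicious_a = match_by_order(
    "HD999991",
    ["Iulius Severus", "Claudia Prima"],
    [(101, 9001, "Iulius"), (102, 9002, "Claudia"), (103, 9001, "Iulius")],
)
matches_b, suspicious_b = match_by_order(
    "HD999992", ["Iulius Severus"], [(201, 9001, "Iulius")]
)
groups = same_person_groups(matches_a + matches_b)
```

Only the pairs that pass the first-character check feed into `same_person_groups`; the suspicious ones are kept apart for manual review.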

This process would be a good step in enhancing the SNAP:DRGN-compliant RDF produced by EDH, which was also addressed in another working group: recommendations for EDH person-records in SNAP RDF.

Reflecting on our (first ever) Digital Classicist Wiki Sprint

Wednesday, July 16th, 2014

From (Print) Encyclopedia to (Digital) Wiki

According to Denis Diderot and Jean le Rond d’Alembert the purpose of an encyclopedia in the 18th century was ‘to collect knowledge disseminated around the globe; to set forth its general system to the people with whom we live, and transmit it to those who will come after us, so that the work of preceding centuries will not become useless to the centuries to come’.  Encyclopedias have existed for around 2,000 years; the oldest is in fact a classical text, Naturalis Historia, written ca 77 CE by Pliny the Elder.

Following the (recent) digitisation of raw data, new, digital forms of encyclopedia have emerged. In our very own digital era, a Wiki is a wider, electronic encyclopedia that is open to contributions and edits by interested parties. It contains concept analyses, images, media, and so on, and it is freely available, thus making the creation, recording, and dissemination of knowledge a democratised process, open to everyone who wishes to contribute.

 

A Sprint for Digital Classicists

For us Digital Classicists, scholars and students interested in the application of humanities computing to research in the ancient and Byzantine worlds, the Digital Classicist is our hub, and its Wiki is composed and edited by the community itself. This wiki collects guidelines and suggestions on major technical issues, and catalogues digital projects and tools of relevance to classicists. The wiki also lists events, bibliographies and publications (print and electronic), and other developments in the field. A discussion group serves as grist for a list of FAQs. As members of the community provide answers and other suggestions, some of these may evolve into independent wiki articles providing work-in-progress guidelines and reports. The scope of the Wiki follows the interests and expertise of collaborators, in general, and of the editors, in particular. The Digital Classicist is hosted by the Department of Digital Humanities at King’s College London, and the Stoa Consortium, University of Kentucky.

So how did we end up editing this massive piece of work? On Tuesday July 1, 2014, at around 16:00 London time (17:00 in Central Europe), a group of interested parties gathered on several digital platforms. The idea was that most of the action would take place in the DigiClass chatroom on IRC, our very own channel called #digiclass. Alongside the traditional chat window, there was also a Skype voice call to get us started and discuss approaches before editing. On the side, we had a GoogleDoc where people simultaneously added what they thought should be improved or created. I was very excited to interact with members old and new. It was a fun break during my mini trip to the Netherlands and, as it proved, very much in keeping with the general attitude of the Digital Classicists team: knowledge is open to everyone who wishes to learn and can be the outcome of a joyful collaborative process.

 

The Technology Factor

As a researcher of digital history, I believe (and I suppose most information systems scholars would agree) that technology is never neutral in the process of ‘making’. The magic of the Wiki lies in the fact that it is a rather simple platform that can be easily tweaked. All users were invited to edit any page or to create new pages within the wiki website, using only a regular web browser without any extra add-ons. A wiki makes page link creation easy by showing whether an intended target page exists or not. A wiki enables communities to write documents collaboratively, using a simple markup language and a web browser. A single page in a wiki website is referred to as a wiki page, while the entire collection of pages, which are usually well interconnected by hyperlinks, is ‘the wiki’. A wiki is essentially a database for creating, browsing, and searching through information. A wiki allows non-linear, evolving, complex and networked text, argument and interaction. Edits can be made in real time and appear almost instantly online. This can facilitate abuse of the system. Private wiki servers (such as the Digital Classicist one) require user identification to edit pages, which keeps the process mildly controlled. Most importantly, as researchers of the digital we understood in practice that a wiki is not a carefully crafted site for casual visitors. Instead, it seeks to involve the visitor in an ongoing process of creation and collaboration that constantly changes the website landscape.

 

Where Technology Shapes the Future of Humanities

In terms of human resources, some people with little previous involvement in the Digital Classicist community took on several tasks, including correcting pages, suggesting new projects, adding pages to the wiki, helping others with information and background, and approaching project-owners and leaders in order to suggest adding or improving information. Collaboration, a practice usually reserved for science scholars, made the process easier and intellectually stimulating. Moreover, within these overt cyber-spaces of ubiquitous interaction one could identify a strong sense of productive diversity within our own scholarly community; it was visible both in the IRC chat channel and over Skype. Scholars with several different accents and spellings, British, American, and continental, gathered to push forward this incredibly fast-paced process. There was a need to address research projects, categories, and tools found in non-English-speaking academic cultures. As a consequence of this multivocal procedure, more interesting questions arose, not least methodological ones: ‘What projects are defined as digital, really?’, ‘Isn’t everything a database?’, ‘What is a prototype?’, ‘Shouldn’t there be a special category for dissertations, or visualisations?’. The beauty of collaboration in all its glory, plus expanding our horizons with technology! And so much fun!

MediaWiki recorded almost 250 changes made on the 1st of July 2014!

The best news, however, is that this first ever wiki sprint was not the last. In the words of the organisers, Gabriel Bodard and Simon Mahony,

‘We have recently started a programme of short intensive work-sprints to
improve the content of the Digital Classicist Wiki
(http://wiki.digitalclassicist.org/). A small group of us this week made
about 250 edits in a couple of hours in the afternoon, and added dozens
of new projects, tools, and other information pages.

We would like to invite other members of the Digital Classicist community to
join us for future “sprints” of this kind, which will be held on the
first Tuesday of every month, at 16h00 London time (usually =17:00
Central Europe; =11:00 Eastern US).

To take part in a sprint:

1. Join us in the DigiClass chatroom (instructions at
<http://wiki.digitalclassicist.org/DigiClass_IRC_Channel>) during the
scheduled slot, and we’ll decide what to do there;

2. You will need an account on the Wiki–if you don’t already have one,
please email one of the admins to be invited;

3. You do not need to have taken part before, or to come along every
month; occasional contributors are most welcome!’

The next few sprints are scheduled for:
* August 5th
* September 2nd
* October 7th
* November 4th
* December 2nd
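
For anyone who wants to work out later sprint dates, the "first Tuesday of every month" rule can be computed with a few lines of Python. This is just a convenience sketch; the function name is made up, and the year 2014 is taken from the date of this post.

```python
from datetime import date, timedelta


def first_tuesday(year, month):
    """Return the date of the first Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    offset = (1 - first.weekday()) % 7
    return first + timedelta(days=offset)


# Sprint dates for August to December 2014.
sprint_dates = [first_tuesday(2014, month) for month in range(8, 13)]
for sprint in sprint_dates:
    print(sprint.strftime("%B %d, %Y"))
```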

Please do join us whenever you can!

SNAP:DRGN introduction

Thursday, May 8th, 2014

Standards for Networking Ancient Prosopography: Data and Relations in Greco-Roman Names (SNAP:DRGN) is a one-year pilot project, based at King’s College London in collaboration with colleagues from the Lexicon of Greek Personal Names (Oxford), Trismegistos (Leuven), Papyri.info (Duke) and Pelagios (Southampton), and hopes to include many more data partners by the end of this first year. Much of the early discussion of this project took place at the LAWDI school in 2013. Our goal is to recommend standards for sharing relatively minimalist data about classical and other ancient prosopographical and onomastic datasets in RDF, thereby creating a huge graph of person-data that scholars can:

  1. query to find individuals, patterns, relationships, statistics and other information;
  2. follow back to the richer and fuller source information in the contributing database;
  3. contribute new datasets or individual persons, names and textual references/attestations;
  4. annotate to declare identity between persons (or co-reference groups) in different source datasets;
  5. annotate to express other relationships between persons/entities in different or the same source dataset (such as familial relationships, legal encounters, etc.);
  6. use URIs to annotate texts and other references to names with the identity of the person to whom they refer (similar to Pelagios’s model for places using Pleiades).
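
To make points 1 and 4 above a little more concrete, here is a toy Python sketch of a person graph held as plain (subject, predicate, object) triples. It is not the SNAP ontology: the `ex:` predicate names and all `example.org` URIs are hypothetical placeholders, and a real implementation would use an RDF library and the published vocabulary.

```python
# Hypothetical person records from two contributing datasets.
PERSON_A = "http://example.org/datasetA/person/1"
PERSON_B = "http://example.org/datasetB/person/77"
PERSON_C = "http://example.org/datasetB/person/78"

graph = {
    (PERSON_A, "ex:name", "Iulius Severus"),
    (PERSON_B, "ex:name", "Iulius Severus"),
    (PERSON_C, "ex:name", "Claudia Prima"),
}


def find_by_name(graph, name):
    """Query: all person URIs carrying the given name."""
    return sorted(s for s, p, o in graph if p == "ex:name" and o == name)


def declare_identity(graph, person, other):
    """Annotate: return a new graph stating that two records co-refer."""
    return graph | {(person, "ex:identicalTo", other)}


def coreferent(graph, person):
    """Collect every record connected to `person` by identity statements."""
    group, frontier = {person}, [person]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p != "ex:identicalTo":
                continue
            # Identity is followed in both directions.
            neighbour = o if s == current else s if o == current else None
            if neighbour is not None and neighbour not in group:
                group.add(neighbour)
                frontier.append(neighbour)
    return group


candidates = find_by_name(graph, "Iulius Severus")
annotated = declare_identity(graph, PERSON_A, PERSON_B)
group = coreferent(annotated, PERSON_A)
```

The identity statement is added as a separate annotation rather than by merging the two records, so each source dataset keeps its own URI and the link can be reviewed or withdrawn.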

A more detailed description (plus the successful funding bid document, if you’re really keen) can be found at <http://snapdrgn.net/about>.

Our April workshop invited a handful of representative data-holders and experts in prosopography and/or linked open data to spend two days in London discussing the SNAP:DRGN project, their own data and work, and approaches to sharing and linking prosopographical data in general. We presented a first draft of the SNAP:DRGN “Cookbook”, the guidelines for formatting a subset of prosopographical data in RDF for contribution to the SNAP graph, and received some extremely useful feedback on individual technical issues and the overall approach. A summary of the workshop, and slides from many of the presentations, can be found at <http://snapdrgn.net/archives/110>.

In the coming weeks we shall announce the first public version of the SNAP ontology, the Cookbook, and the graph of our core and partner datasets and annotations. For further discussion about the project, and linked data for prosopography in general, you can also join the Ancient-People Googlegroup (where I posted a summary similar to this post earlier today).