OEDUc: Exist-db mashup application

Wednesday, August 2nd, 2017

Exist-db mashup application working

This working group developed a demo app built with exist-db, a native XML database that uses XQuery.

The app is ugly, but it was built by reusing various bits and pieces in a little less than two days (the day of the unconference and part of the following day). It uses different data sources, accessed with different methods, to bring together useful resources for an epigraphic corpus, and it works in most cases for the examples we wanted to support. This was possible because exist-db supports this kind of mashup and because all the pieces were already available (exist-db, the XSLT, the data, etc.).

Code, without data, has been copied to .

The app is accessible, with data from EDH data dumps of July at

Preliminary tweaks to the data included:

  • Adding an @xml:id to the text element to speed up retrieval of items in exist-db (the XQuery doing this is in the AddIdToTextElement.xql file).
  • Note that there is no Pleiades id in the EDH XML (or in any EAGLE dataset), but there are Trismegistos Geo IDs. This is because the plan during the EAGLE project was to get all places of provenance into Trismegistos GEO and map them to Pleiades later. This mapping was started using Wikidata mix’n’match but is far from complete and currently needs an update.
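The first tweak can be illustrated outside exist-db as well. The following is a minimal sketch in Python, not the XQuery in AddIdToTextElement.xql; the function name `add_id_to_text`, the sample document and the id value `HD000001` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements without a generated prefix.
ET.register_namespace("", TEI_NS)


def add_id_to_text(tei_xml: str, new_id: str) -> str:
    """Return the TEI document with @xml:id set on <text>.

    Hypothetical helper: an existing @xml:id is left untouched.
    """
    root = ET.fromstring(tei_xml)
    text = root.find(f"{{{TEI_NS}}}text")
    if text is None:
        raise ValueError("no <text> element found")
    key = f"{{{XML_NS}}}id"
    if key not in text.attrib:
        text.set(key, new_id)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative input only, not taken from the EDH dumps.
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader/><text><body/></text></TEI>"
    )
    print(add_id_to_text(sample, "HD000001"))
```

With an id on the text element, an item can be fetched by id instead of by scanning the collection, which is the speed-up the tweak is after.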

The features

  • In the list view you can select an item. Each item can be edited normally (create, update, delete)
  • The editor that updates files reproduces, in simple XSLT, part of the Leiden+ logic and conventions so that you can enter new data or update existing data. It validates the data against the tei-epidoc.rng schema after performing the changes. The plan is to have it validate before the changes are actually applied.
  • The search simply searches in a number of indexed elements. It is not a full text index. Range indexes are also set to speed up the queries, in addition to the other indexes shipped with exist-db.
  • You can create a new entry with the Leiden+ like editor and save it. It will first be validated, and if it is not valid you are pointed to the problems. There was not enough time to add the vocabularies and update the editor.
  • Once you view an item you will find, in ugly tables, a first section with metadata, the text, some additional information on persons and a map:
  • The text exploits some of the parameters of the EpiDoc Stylesheets. You can change the desired value, hit change and see the different output.
  • The ids of corresponding inscriptions are pulled from the EAGLE ids API here in Hamburg, using Trismegistos data. This app will hopefully soon be moved to Trismegistos itself.
  • The EDH id is instead used to query the EDH data API and get the information about persons, which is printed below the text.
  • For each element with a @ref in the XML files you will find the name of the element and a link to the value, for example a link to the EAGLE vocabularies.
  • If the value is a TM Geo ID, the id is used to query the Wikidata SPARQL endpoint and retrieve the coordinates and the corresponding Pleiades id (provided they are there). The same logic could be used for VIAF, geonames, etc. This task is done via an HTTP request made directly in the XQuery powering the app.
  • The Pleiades id thus retrieved (which could certainly be obtained in other ways) is then used in JavaScript to query Pelagios and print the map below (taken from the hello world example in the Pelagios repository).
  • At and two RESTXQ functions provide the ttl files for Pelagios (but not a dump as required, although this can be done). The place annotations are at the moment only provided for the first 20 entries. See rest.xql.
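The Wikidata lookup described above can be sketched as follows. This is an illustration in Python, not the XQuery used in the app. It assumes that P1958 is the Wikidata property for the Trismegistos Geo ID, P1584 the one for the Pleiades ID and P625 the coordinate location (worth checking on Wikidata before relying on them). The function names and the User-Agent string are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(tm_geo_id: str) -> str:
    """Build a SPARQL query for one TM Geo ID (property ids are assumptions)."""
    if not tm_geo_id.isdigit():
        raise ValueError("expected a numeric TM Geo ID")
    return (
        "SELECT ?place ?coord ?pleiades WHERE { "
        f'?place wdt:P1958 "{tm_geo_id}" . '
        "OPTIONAL { ?place wdt:P625 ?coord } "
        "OPTIONAL { ?place wdt:P1584 ?pleiades } "
        "} LIMIT 1"
    )


def parse_result(payload: dict):
    """Extract the Pleiades id and coordinates from SPARQL JSON results."""
    bindings = payload["results"]["bindings"]
    if not bindings:
        return None
    row = bindings[0]
    out = {"pleiades": row.get("pleiades", {}).get("value"),
           "lat": None, "lon": None}
    # Coordinates come back as a WKT literal: "Point(longitude latitude)".
    wkt = row.get("coord", {}).get("value", "")
    match = re.match(r"Point\(([-\d.]+) ([-\d.]+)\)", wkt)
    if match:
        out["lon"] = float(match.group(1))
        out["lat"] = float(match.group(2))
    return out


def lookup(tm_geo_id: str):
    """Send the query over HTTP and parse the answer (needs network access)."""
    params = urllib.parse.urlencode(
        {"query": build_query(tm_geo_id), "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": "oeduc-demo-sketch/0.1"})  # hypothetical agent
    with urllib.request.urlopen(request) as response:
        return parse_result(json.load(response))
```

Both lookups are marked OPTIONAL so that a place without coordinates or without a Pleiades id still returns a row, which matches the "provided they are there" caveat.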

Future tasks

The aim is to have a sample app that helps people get started with their projects and shows some of the possibilities at work. Besides making it a bit nicer, it would be useful to do the following:

  • Add more data from the EDH API, especially from edh_geography_uri, which Frank has added and which holds the URI of the geographic data. Adding .json to this URI returns the JSON data for the place of finding, which has an “edh_province_uri” with the data about the province.
  • Validate before submitting
  • Add more support for parameters in the EpiDoc example XSLT (e.g. for Zotero bibliography contained in div[@type='bibliography'])
  • Improve the upconversion and the editor with more and more precise matchings
  • Provide functionality to search the data with XPath
  • Add advanced search capabilities to filter results by id, content provider, etc.
  • Add images support
  • Include all EAGLE data (currently only data from the EDH dumps is included, but the system scales nicely)
  • Include a query to the EAGLE MediaWiki of translations (API currently unavailable)
  • Show related items based on any of the values
  • Include in the editor the possibility to tag named entities
  • Sync the EpiDoc XSLT repository and the EAGLE vocabularies with a webhook
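The first task in this list, following edh_geography_uri to its JSON form and reading the province URI, could look roughly like this in Python. The shape of the EDH JSON is not described here, so the sketch searches the whole response for the key instead of assuming where it sits. The function names and the example.org URIs are placeholders, not real EDH addresses.

```python
from typing import Any, Optional


def json_url(edh_geography_uri: str) -> str:
    """Append .json to a geography URI, as described for the EDH API."""
    return edh_geography_uri.rstrip("/") + ".json"


def find_key(data: Any, key: str) -> Optional[Any]:
    """Return the first value stored under `key` anywhere in nested JSON.

    The nesting of the EDH response is an unknown here, hence the
    recursive search over dictionaries and lists.
    """
    if isinstance(data, dict):
        if key in data:
            return data[key]
        for value in data.values():
            found = find_key(value, key)
            if found is not None:
                return found
    elif isinstance(data, list):
        for item in data:
            found = find_key(item, key)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    # Placeholder data, invented for illustration only.
    place = {"items": [{"edh_province_uri": "https://example.org/province/1"}]}
    print(json_url("https://example.org/geo/42"))
    print(find_key(place, "edh_province_uri"))
```

The province URI found this way could then be fetched with the same ".json" convention, if the API supports it for provinces as well.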