Archive for the ‘Standards’ Category

OEDUc: recommendations for EDH person-records in SNAP RDF

Monday, July 3rd, 2017

At the first meeting of the Open Epigraphic Data Unconference (OEDUc) in London in May 2017, one of the working groups that met in the afternoon examined the person-data offered for download on the EDH open data repository, and made some recommendations for making this data more compatible with the SNAP:DRGN guidelines. (We claim to have completed our brief, and so do not propose to meet again.)

Currently, the RDF of a person-record in the EDH data (in TTL format) looks like:

    a lawd:Person ;
    lawd:PersonalName "Nonia Optata"@lat ;
    gndo:gender <> ;
    nmo:hasStartDate "0071" ;
    nmo:hasEndDate "0130" ;
    snap:associatedPlace <> ,
        <> ;
    lawd:hasAttestation <> .

We identified a few problems with this data structure, and made recommendations as follows.

  1. We propose that EDH split the current person references in edh_people.ttl into: (a) one lawd:Person, which has the properties for name, gender, status, membership, and hasAttestation, and (b) one lawd:PersonAttestation, which has the properties dct:source (which points to the URI for the inscription itself) and lawd:Citation. Date, location, etc. can then be derived from the inscription (which is where they belong).
  2. A few observations:
    1. lawd:PersonalName is a class, not a property. The recommended property for a personal name as a string is foaf:name.
    2. The language tag for Latin should be @la (not @lat).
    3. There are currently thousands of empty strings tagged as Greek.
    4. Nomisma date properties cannot be used on a person, because the definition is inappropriate (and unclear).
    5. As documented, Nomisma date properties refer only to numismatic dates, not epigraphic ones (I would request a modification to their documentation for this).
    6. The DNB's GND ontology (gndo) for gender is inadequate (which is partly why SNAP has avoided tagging gender so far); a better ontology may be found, but I would suggest plain-text values for now.
    7. To the person record above, we could then add dct:identifier with the PIR number (compare the discussion of plans for disambiguating PIR persons in another working group).
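To make the proposed split concrete, here is a minimal Python sketch that writes one person as two Turtle resources, a lawd:Person and a lawd:PersonAttestation, while applying the observations above (the @la tag, no empty names, a plain-text gender, an optional PIR identifier). It is an illustration under stated assumptions, not EDH's or SNAP's actual pipeline: the example.org URIs and the PersonRecord class are hypothetical, prefix declarations are omitted as in the snippet above, lawd:Citation is left out, and the use of foaf:gender for a plain-text value is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# ISO 639-2 codes that have a shorter BCP 47 equivalent.
LANG_FIXES = {"lat": "la"}


def turtle_literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a string as a Turtle literal, with an optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    if lang:
        return f'"{escaped}"@{LANG_FIXES.get(lang, lang)}'
    return f'"{escaped}"'


@dataclass
class PersonRecord:
    """Hypothetical container for one EDH person; field names are illustrative."""
    person_uri: str
    attestation_uri: str
    inscription_uri: str
    name: str
    name_lang: Optional[str] = None
    gender: Optional[str] = None      # plain-text value
    pir_number: Optional[str] = None  # becomes dct:identifier if present


def to_turtle(rec: PersonRecord) -> str:
    """Emit a lawd:Person plus a separate lawd:PersonAttestation."""
    person = [f"<{rec.person_uri}> a lawd:Person"]
    if rec.name.strip():  # skip empty names rather than emitting ""@grc
        person.append(f"    foaf:name {turtle_literal(rec.name, rec.name_lang)}")
    if rec.gender:
        # Assumption: foaf:gender, which takes a plain literal.
        person.append(f"    foaf:gender {turtle_literal(rec.gender)}")
    if rec.pir_number:
        person.append(f"    dct:identifier {turtle_literal(rec.pir_number)}")
    person.append(f"    lawd:hasAttestation <{rec.attestation_uri}>")

    # No dates or places on the person: they belong to the inscription,
    # which the attestation points to via dct:source.
    attestation = [
        f"<{rec.attestation_uri}> a lawd:PersonAttestation",
        f"    dct:source <{rec.inscription_uri}>",
    ]
    return " ;\n".join(person) + " .\n\n" + " ;\n".join(attestation) + " .\n"


if __name__ == "__main__":
    print(to_turtle(PersonRecord(
        person_uri="http://example.org/edh/person/1",
        attestation_uri="http://example.org/edh/attestation/1",
        inscription_uri="http://example.org/edh/inscription/1",
        name="Nonia Optata", name_lang="lat", gender="female")))
```

Run as a script, this prints the Nonia Optata record tagged @la rather than @lat, with the date and place information left to the inscription.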

Linked Data for the Humanities Workshop in Oxford

Thursday, May 21st, 2015

Via Terhi Nurmikko:

Linked Data for the Humanities Workshop: A semantic web of scholarly data
Part of the Digital Humanities Oxford Summer School, held 20th – 24th July 2015.
Book your place via

Come and learn from experts and engage with participants from around the world, from every field and career stage. Develop your knowledge and acquire new skills to support your interest in Linked Data for the Humanities. Immerse yourself in this specialist topic for a week, and widen your horizons through the keynote and additional sessions.

The Linked Data in the Humanities workshop introduces the concepts and technologies behind Linked Data and the Semantic Web and teaches attendees how they can publish their research so that it is available in these forms for reuse by other humanities scholars, and how to access and manipulate Linked Data resources provided by others. The Semantic Web tools and methods described over the week use distinct but interwoven models to represent services, data collections, workflows, and the domain of an application. Topics covered will include: the RDF format; modelling your data and publishing to the web; Linked Data; querying RDF data using SPARQL; and choosing and designing vocabularies and ontologies.

The workshop comprises a series of lectures and hands-on tutorials. Lectures introduce theoretical concepts in the context of Semantic Web systems deployed in and around the humanities, many of which are introduced by their creators. Each lecture is paired with a practical session in which attendees are guided through their own exploration of the topics covered.

For more information about the Digital Humanities Oxford Summer School, see .

Citation in Digital Scholarship: A Conversation

Monday, October 4th, 2010

I’m writing to bring readers’ attention to a series of pages that is coming together on the Digital Classicist wiki under the rubric “Citation in digital scholarship” (category). I take responsibility/blame for initiating the project, but it has already benefitted from input by Matteo Romanello (author of CRefEx) and from comments by my colleagues at NYU’s Institute for the Study of the Ancient World. You’ll also see the influence of the Canonical Text Services.

A slight preview of what you’ll find there and of where this all might go:

  1. The goal is to provide a robust and simple convention for indicating that citations are present. How robust? How simple? At a bare minimum, just wrap a citation in ‘<a class="citation" href="">…</a>’. That will distinguish intellectually significant citations from other links (such as a link to the home page of the hosting project). I cribbed the ‘class="citation"’ string from Matteo’s articles cited at the bottom of the wiki page. Please also consider adding ‘title’ and ‘lang’ attributes as described there.
  2. We are also interested in encouraging convergence on best practices for communicating information about the entities being cited and about the nature of the citation itself:
    1. There is a page “Citations with added RDFa” that suggests conventions for using RDFa to add markup. It encourages use of Dublin Core terms.
    2. Matteo has begun a page “Citations with CTS and Microformats”. CTS, developed by Neel Smith and Chris Blackwell, is important because of its potential to provide stable URIs for well-known texts.

    Merging these conventions is of ongoing interest. And they do illustrate that one goal is to converge on best practices that are extendable and not in unnecessary conflict with existing work.

  3. While it isn’t represented on the wiki yet, I intend to start a JavaScript library that will identify citations in a page (e.g. with jQuery’s $('.citation')) in order to present information about a particular citation, along with options for following it. Or to list and map all the dc:Locations cited in a text. Etc.
  4. Closing the loop: this work overlaps with a meeting held by the ISAW Digital Projects Team in NYC last week. The preliminary result is a tool for managing URIs in a shared bibliographic infrastructure. This is one example of an entity that can produce embeddable markup conforming to the ‘class="citation"’ convention. Such markup would be consumable by the planned JS library. Any project that produces stable URIs can have an “Embed a link to here” (vel sim.) widget that produces conforming HTML for authors to re-use.
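The planned library is JavaScript, where jQuery’s $('.citation') would do the selection. As a language-neutral sketch of the same two halves of the idea, the Python below uses only the standard library’s html.parser: embed_link produces markup in the ‘class="citation"’ convention (what an “Embed a link to here” widget might emit), and find_citations collects every such link from a page, with its title and lang attributes, while ignoring ordinary links. Both function names and the example URIs are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


def embed_link(uri: str, title: str, text: str, lang: str = "en") -> str:
    """Produce a link following the class="citation" convention."""
    return (f'<a class="citation" href="{escape(uri)}" '
            f'title="{escape(title)}" lang="{escape(lang)}">{escape(text)}</a>')


class CitationFinder(HTMLParser):
    """Collect <a> elements whose class list includes "citation"."""

    def __init__(self):
        super().__init__()
        self.citations = []
        self._current = None  # the citation currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        if "citation" in (a.get("class") or "").split():
            self._current = {"href": a.get("href"), "title": a.get("title"),
                             "lang": a.get("lang"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data  # includes text of nested tags

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.citations.append(self._current)
            self._current = None


def find_citations(html: str) -> list:
    finder = CitationFinder()
    finder.feed(html)
    finder.close()
    return finder.citations
```

For example, a page containing both a link to a project home page and one embed_link(...) citation yields a single entry, because only the second carries class="citation".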

I’m grateful to Gabriel Bodard for letting me use the Digital Classicist wiki to start these pages and for encouraging me to summarize here. The effort is inspired by the observation that a little bit of common documentation, sharing, and tool building can lead to big wins for users and developers, as well as to greater interoperability for our citation practices going forward.

Comments here are very welcome.

How to Cite e-Resources without Stable URLs

Friday, February 26th, 2010

It used to be said, especially by the Internet’s nay-sayers, that the insuperable barrier to publishing and citing online is that links are never stable. The number of pages that appear and disappear every day means that even a year-old list of sites is likely to contain significant link rot. There is a significant movement to promote both stable and cool URLs (see for example [van Kesteren 2004] and [Berners-Lee 1998]), and most of those of us who publish online take great pains to have URLs that are both predictable and will not need to change.

For example, we recently published The Inscriptions of Roman Tripolitania, a digital edition based very closely on the 1952 printed volume by Reynolds and Ward-Perkins, at:

Because this is largely a reprint, there is obviously more work to be done, and we hope to add a new edition fairly soon (incorporating, for example, Arabic translation and new digital photographs). When we do so, the new site will be labeled “irt2011” or similar, all internal links will include this date, and the old site will not need to be removed or renamed. No links will be broken in this process.

Similarly, good electronic journals have URLs that reflect date and/or issue number in the directory structure:

Again we can see that they don’t need to change when new issues come along or the site is restructured. Additionally, you can guess from these URLs what the address of DM issue 5, or DHQ issue 4.2, would be. I can even remember (or guess) the URLs of individual papers within DM from their authors’ surnames. I strongly suspect that in a year, and in ten years, these URLs will still work.

All of which makes it especially surprising that an institution like the Center for Hellenic Studies, which is in so many ways a field-leader and standard-setter in Digital Humanities matters, has a website whose URLs seem to be generated by a content management system. These URLs (including that of their flagship online journal Classics@, and of the magisterial Homer Multitext) are ungainly, arbitrary, and almost certainly not stable. Even worse, many individual pages within the site have URLs that contain a session-specific hash, and so cannot be cited at all:

One might argue that these pages should be cited as if they were paper publications, and readers are then left to their own devices to track them down, but surely that isn’t good enough? Are there any solutions to citing electronically, and linking to, a page whose URL is likely to be itinerant? A persistent redirect? A Zotero biblio URL that you can update if you notice it’s broken?
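A persistent redirect of the kind PURL services provide is one answer: the citable URL never changes, and only a lookup table behind it is updated when the target moves. The sketch below assumes a hypothetical table and paths and uses Python’s standard http.server; it answers with 302 Found, so that the stable URL, not the itinerant one, remains the one readers cite. It would not help with session-specific URLs that cannot be revisited at all, since the table still needs some reachable target.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table: stable, citable path -> current (possibly CMS-generated) URL.
# Only this table is edited when a target moves; cited URLs never change.
REDIRECTS = {
    "/cite/journal/issue1": "http://example.org/cms/index.php?page=4711",
}


def resolve(path: str, table: dict = REDIRECTS):
    """Return (status, target URL or None) for a requested path."""
    target = table.get(path.rstrip("/"))
    if target is None:
        return HTTPStatus.NOT_FOUND, None
    return HTTPStatus.FOUND, target  # 302: keep citing the stable URL


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, target = resolve(self.path)
        self.send_response(status)
        if target:
            self.send_header("Location", target)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()
```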

Host your texts on Google in one day, Jan 11, 2010

Thursday, October 8th, 2009

Workshop: Host your texts on Google in one day

The Center For Hellenic Studies will conduct a one-day workshop at the Center’s Washington, D.C., campus, on Monday, Jan. 11, 2010, with the subject: “Host your texts on Google in one day”. Bring one or more XML texts to the workshop in the morning, and leave in the afternoon with a running Google installation of Canonical Text Services serving your texts to the internet (

For more information, including how to apply, please see

Feel free to forward this announcement to anyone who might be interested.

Metadata Workshops in Michigan, 7 May 2009

Wednesday, February 25th, 2009

Aimed at Medievalists but may also be of interest to Classicists…

The Medieval Academy of America’s Committee on Electronic Resources is pleased to announce two workshops to be held at the International Medieval Congress, Kalamazoo, MI, in May 2009. Both workshops will be on Thursday, May 7 (sessions 54 and 166; see for complete conference schedule).

Workshop registration online at

1) Metadata for Medievalists I: Introduction to Metadata Formats
Session 54, Thursday 7 May, 10am

This workshop offers an introduction to best practices for digital scholarship, led by Sheila Bair, Western Michigan University’s Metadata Librarian. Instruction includes an introduction to the concept of metadata, an overview of metadata types of interest to medievalists working in a variety of textual and image formats, and an overview of methods for metadata implementations (database, encoded data, printed copy, etc.). Assignments will be completed during the following clinic.

2) Metadata for Medievalists II: Introduction to the Text-Encoding Initiative
Session 166, Thursday 7 May, 3:30pm

This workshop offers an introduction to best practices for digital scholarship, taught by a medievalist, Dot Porter, specifically for medievalists. Instruction includes introductory-level XML and structural encoding, as well as TEI P5 standards and guidelines, markup concerns for medieval transcription, and a brief consideration of XML Editors. Assignments will be completed during the following clinic.

Sheila Bair is the Metadata Librarian at Western Michigan University and holds an MS in Library Science from the University of Wisconsin-Milwaukee. Dot Porter is the Metadata Manager at the Digital Humanities Observatory, Royal Irish Academy, in Dublin, Ireland. She has an MA in Medieval Studies from Western Michigan University and an MS in Library Science from UNC Chapel Hill, and extensive experience in text encoding in the medieval studies and classics.

Both workshops are limited to 35 participants, and registration is required.

The pre-registration fee per workshop for students is $40/$55 (Medieval Academy members/nonmembers), and for non-students $50/$65.

To register, complete the online form at
Questions about registration should be directed to James W. Brodman at jimb[at]
Questions about the workshops should be directed to Dot Porter at dot.porter[at]

The Digital Archimedes Palimpsest Released

Wednesday, October 29th, 2008

Very exciting news – the complete dataset of the Archimedes Palimpsest project (ten years in the making) has been released today. The official announcement is copied below, but I’d like to point out what I think it is that makes this project so special. It isn’t the object – the manuscript – or the content – although I’m sure the previously unknown texts are quite exciting for scholars. It isn’t even the technology, which includes multispectral imaging used to separate the erased, palimpsested text from the overlying text and the XML transcriptions mapped to those images (although that’s a subject close to my heart).

What’s special about this project is its total dedication to open access principles, and an implied trust, in the way it is being released, that open access will work. There is no user interface. Instead, all project data is being released under a Creative Commons 3.0 attribution license. Under this license, anyone can take this data and do whatever they want with it (even sell it), as long as they attribute it to the Archimedes Palimpsest project. The thinking behind this is that, by making the complete project data available, others will step up and build interfaces… create searches… make visualizations… do all kinds of cool stuff with the data that the developers might not even consider.

To be fair, this isn’t the only project I know of that is operating like this; the complete high-resolution photographs and accompanying metadata for manuscripts digitized through the Homer Multitext project are available freely, as the other project data will be when it’s completed, although the HMT as far as I know will also have its own user interface. There may be others as well. But I’m impressed that the project developers are releasing just the data, and trusting that scholars and others will create user environments of their own.

The Stoa was founded on principles of open access. It’s validating to see a high-visibility project such as the Archimedes Palimpsest take those principles seriously.

Ten years ago today, a private American collector purchased the Archimedes Palimpsest. Since that time he has guided and funded the project to conserve, image, and study the manuscript. After ten years of work, involving the expertise and goodwill of an extraordinary number of people working around the world, the Archimedes Palimpsest Project has released its data. It is a historic dataset, revealing new texts from the ancient world. It is an integrated product, weaving registered images in many wavebands of light with XML transcriptions of the Archimedes and Hyperides texts that are spatially mapped to those images. It has pushed boundaries for the imaging of documents, and relied almost exclusively on current international standards. We hope that this dataset will be a persistent digital resource for the decades to come. We also hope it will be helpful as an example for others who are conducting similar work. It is published under a Creative Commons 3.0 attribution license, to ensure ease of access and the potential for widespread use. A complete facsimile of the revealed palimpsested texts is available on Googlebooks as “The Archimedes Palimpsest”. It is hoped that this is the first of many uses to which the data will be put.

For information on the Archimedes Palimpsest Project, please visit:

For the dataset, please visit:

We have set up a discussion forum on the Archimedes Palimpsest Project. Any member can invite anybody else to join. If you want to become a member, please email:

I would be grateful if you would circulate this to your friends and colleagues.

Thank you very much

Will Noel
The Walters Art Museum
October 29th, 2008.

In defence of biblioclasm

Saturday, September 13th, 2008

Charlotte Roueché pointed me to this transcript of a piece from ABC Radio’s Perspective slot: ‘Our Biblioclastic Century’. The author, Robin Derricourt, an academic publisher with a background in archaeology and history, makes some well-observed points about online publication and the need for sustainability of publication and citation if we are to retain the intellectual and academic output of our culture. With none of this can I disagree. However, he then ends this short, pithy piece with the somewhat knee-jerk conclusion:

I know that my grandchildren will be able to go into a library and read an article by Einstein, a book by Newton, or a manuscript by Captain James Cook, and those by their minor contemporaries. I do not know that they will be able to access the reports, documents and articles that I can read today only on some present day institution’s website. In fact I can be pretty sure most of this will not survive.

And when our own civilisation finally ends, as each civilisation does, where will be the repository that maintains what we now have as knowledge, perhaps even through some future dark ages, for later societies to inherit? They will still have Aristotle, or Darwin, but they may not have the 21st century equivalents to read.

It is important to recognise that this is the well-thought-out fear of an informed and intelligent person, and that those of us working for digital sustainability therefore need to communicate our aims and achievements more widely. I cannot help, however, but point out a logical fallacy in this argument: Derricourt assumes the existence of the physical library full of books (as well he might, the library is an institution that will not go away any time soon). But the library has not always existed, and it was by no means automatic or self-evident that the library would come into existence.

If these cultural and academic institutions had not come into being at several points in history (often associated with the courts of kings or religious communities), then books would be in no better shape than websites are now (or rather websites in the world that still exists in Derricourt’s imagination, which was the world of the early Web of the 1990s). Individual copies would have circulated in private collections, some would occasionally have been copied, but not on the scale and with the rigour that we saw in Mediaeval monasteries, for example. The idea of the repository that holds a copy of everything published in a certain domain, whatever its perceived worth, would not exist. A private collection or library could easily be burned or thrown into the trash at the end of its owner’s life, or when moving residence (and not all trash-heaps are as future-friendly as the sand at Oxyrhynchus). The library changed all this, and thanks to the libraries and scriptoria, and later printing houses and repositories, copies were made and works were preserved in multiple places, on durable materials, and with rigorous standards.

On the Web, some might say, we do not have libraries to do this job for us, and so when one private collection (a privately registered web domain, say) disappears due to its owner moving residence or losing interest or failing to keep up payments on the domain registration or service provision, all will be lost. Irrevocably and permanently. (No great loss, others would argue.) However this is not true. There are libraries in the online world. There are digital archives and repositories; the Internet Archive and various search engine caches (among other entities) may be able to recover the lost website from 1998 that Derricourt mourns. Digital libraries set out to make multiple, well-archived, backed-up copies, in open standards and formats and registered with Digital Object Identifiers, of all works in their purview. In short, there are libraries on the web. And it is not therefore true that, as Derricourt argues:

Let’s be realistic – all [sc. online content] will disappear, because no web site is permanent. Only a physical library can maintain and transmit to future generations our heritage of ideas, knowledge, discovery, speculation, literature. I can more easily find an 1898 print article than a 1998 document published on the Web.

In fact, as the world becomes more connected and the Internet becomes the source and the repository for more and more of our information, libraries are going to come under increasing pressure to cut back their accessions, to digitize and archive (or even destroy) their paper collections, and to become custodians of digital rather than physical artefacts. (Don’t get me wrong: I will be in the front line of the fight to defend libraries against this offensive, but the pressure will be there.) It is by no means automatic that physical libraries will always be the best source of cultural and literary preservation in our grandchildren’s time. If no one has bothered to digitize even a 2008 print article, then the 1998 website will be easier to find in one hundred years’ time. I don’t fear for websites. I fear for paper archives that no one is digitizing.

How to cite Creative Commons works

Saturday, August 16th, 2008

A very useful guide is being compiled by Molly Kleinman in her Multi-Purpose Librarian blog. As someone who licenses a lot of work using CC-BY, and who both re-uses and sometimes re-mixes a lot of CC work (especially photographs) for both academic and creative ends, I recognise that it isn’t always clear exactly what “attribution” means, for example. Kleinman gives examples of ideal and realistic usage (the real name of a copyright-holder and/or title of a work may not always be known, say), and makes suggestions for good practice and compromises. This is a very welcome service, and I hope that more examples and comments follow.
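As a small illustration of the gap between ideal and realistic attribution, the sketch below assembles a one-line credit from whatever is known about a work, falling back gracefully when the title or the creator’s name is missing. It follows the common Title, Author, Source, License pattern (often abbreviated TASL); that pattern is an assumption here, not necessarily the wording Kleinman recommends, and the function name and example values are hypothetical.

```python
from typing import Optional


def cc_attribution(license_name: str, license_url: str,
                   title: Optional[str] = None,
                   author: Optional[str] = None,
                   source_url: Optional[str] = None) -> str:
    """Build a one-line attribution from whatever is known about a work."""
    work = f'"{title}"' if title else "This work"      # the title may be unknown
    credit = f"{work} by {author}" if author else work  # so may the creator
    text = f"{credit} is licensed under {license_name} ({license_url})"
    if source_url:
        text += f"; original at {source_url}"
    return text + "."
```

With only a licence known, the function still produces a usable line; each extra piece of information simply makes the credit more complete.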

Internet Archaeology 24: Dealing with Legacy Data

Tuesday, July 8th, 2008

Internet Archaeology announces issue 24, a themed issue dedicated to: “Dealing with Legacy Data” edited by Pim Allison.

In the Mediterranean region particularly, but by no means exclusively, there exist large datasets from previous excavations, published and unpublished, whose digitisation, spatial mapping and re-analysis can greatly facilitate investigations of social behaviour and changing environmental conditions. This volume presents a number of projects that demonstrate the usefulness of digital environments for analysing such non-digital data. These projects use these ‘legacy data’ within true GIS, pseudo-GIS, or other digital environments to answer specific questions concerning social behaviour and particularly the social use of space.

Several papers of interest to Classicists (as well as to all Digital Humanists).

EpiDoc Summer School, July 14th-18th, 2008

Wednesday, April 23rd, 2008
The Centre for Computing in the Humanities, King’s College London, is again offering an EpiDoc Summer School, on July 14th-18th, 2008. The training is designed for epigraphers or papyrologists (or related text editors such as numismatists, sigillographers, etc.) who would like to learn the skills and tools required to mark up ancient documents for publication (online or on paper), and for interchange in accordance with international academic standards. You can learn more about EpiDoc from the EpiDoc home page and the Introduction for Epigraphers; you will find a recent and user-friendly article on the subject in the Digital Medievalist. (If you want to go further, you can learn about XML and about the principles of the TEI: Text Encoding Initiative.) The Summer School will not expect any technical expertise, and training in basic XML will be provided.
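For readers wondering what marking up looks like in practice: in EpiDoc, text that the Leiden conventions place in square brackets (lost, and restored by the editor) becomes a TEI <supplied reason="lost"> element, and a lacuna of unknown length becomes a <gap> element. The Python sketch below converts just those two cases on a single line. It is a toy, not an EpiDoc tool: it ignores abbreviations, counted lacunae such as [.3.], line breaks and everything else the Summer School covers, and the function name is hypothetical.

```python
import re
from xml.sax.saxutils import escape

BRACKETED = re.compile(r"\[([^\[\]]+)\]")  # one level of [ ... ]
UNKNOWN_LACUNA = re.compile(r"[-\s]+")     # e.g. [---] or [- - -]


def leiden_to_epidoc(line: str) -> str:
    """Convert Leiden [restorations] and [---] lacunae on one line to EpiDoc."""
    out, pos = [], 0
    for m in BRACKETED.finditer(line):
        out.append(escape(line[pos:m.start()]))  # text outside the brackets
        inner = m.group(1)
        if UNKNOWN_LACUNA.fullmatch(inner):
            out.append('<gap reason="lost" extent="unknown" unit="character"/>')
        else:
            out.append(f'<supplied reason="lost">{escape(inner)}</supplied>')
        pos = m.end()
    out.append(escape(line[pos:]))
    return "".join(out)
```

For instance, "Caes[ar]" becomes Caes<supplied reason="lost">ar</supplied>, while round brackets are passed through untouched.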

Attendees (who should be familiar with Greek/Latin and the Leiden Conventions) will need to bring a laptop on which has been installed the Oxygen XML editor (available at a reduced academic price, or for a free 30-day demo).

The EpiDoc Summer School is free to participants; we can try to help you find cheap (student) accommodation in London. If any students participating would like to stay on afterwards and acquire some hands-on experience marking up some texts for the Inscriptions of Roman Cyrenaica project, they would be most welcome!

All interested please contact both and as soon as possible. Please pass on this message to anyone who you think might benefit.

Report on NEH Workshop “Supporting Digital Scholarly Editions”

Friday, April 4th, 2008

The official report on the NEH Workshop “Supporting Digital Scholarly Editions”, held on January 14, has been released and is available in PDF form:

Attendees included representatives from funding agencies and university presses, historians, just one or two literary scholars, one medievalist, and no classicists. It appears that much of the discussion focused on creating a service provider for scholarly editions, something to work between scholars and university presses to turn scholarship into digital publications.

I’m of two minds about this. On one hand, I know a lot of “traditional scholars” who find the idea of digital publication a little scary, just the idea of having to learn the technology. So it could be a good way to bring digital publication into the mainstream. But on the other hand, this kind of model could be stifling for creativity. One of the exciting things about digital projects is that, at this time, although there are standards there is no single model to follow for publication. There’s a lot of room for experimentation. It’s certainly not either/or – those of us doing more cutting-edge work will continue to do it whether there are mainstream service providers at university presses or not. But it’s interesting that this is being discussed.

Rieger, Preservation in the Age of Large-Scale Digitization

Sunday, March 2nd, 2008

CLIR (the Council on Library and Information Resources in DC) have published in PDF the text of a white paper by Oya Rieger titled ‘Preservation in the Age of Large-Scale Digitization’. She discusses large-scale digitization initiatives such as Google Books, Microsoft Live, and the Open Content Alliance. This is more of a diplomatic/administrative than a technical discussion: questions of funding, strategy, and policy, and the tension between depth and scale, loom larger than issues of technology, standards, or protocols (all of which were questions raised during our Open Source Critical Editions conversations).

The paper ends with thirteen major recommendations, all of which are important and deserve close reading, and the most important of which is the need for collaboration, sharing of resources, and generally working closely with other institutions and projects involved in digitization, archiving, and preservation.

One comment hit especially close to home:

The recent announcement that the Arts and Humanities Research Council and Joint Information Systems Committee (JISC) will cease funding the Arts and Humanities Data Service (AHDS) gives cause for concern about the long-term viability of even government-funded archiving services. Such uncertainties strengthen the case for libraries taking responsibility for preservation—both from archival and access perspectives.

It is actually a difficult question to decide who should be responsible for long-term archiving of digital resources, but I would argue that this is one place where duplication of labour is not a bad thing. The more copies of our cultural artefacts that exist, in different formats, contexts, and versions, the more likely we are to retain some of our civilisation after the next cataclysm. This is not to say that coordination and collaboration are not desiderata, but that we should expect, plan for, and even strive for redundancy on all fronts.
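Redundancy only protects us if we can tell when the copies have drifted apart. One common practice in digital preservation, assumed here rather than drawn from Rieger’s paper, is fixity checking: compute a cryptographic checksum of every copy and flag any copy that disagrees with the majority. A minimal Python sketch, with hypothetical function names:

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_divergent(digests: dict) -> list:
    """Given {replica name: checksum}, return replicas that differ from the majority.

    With only two copies that disagree there is no majority to trust, which is
    one more argument for keeping several copies.
    """
    if not digests:
        return []
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)


if __name__ == "__main__":
    import sys
    copies = {p: sha256_file(Path(p)) for p in sys.argv[1:]}
    print("divergent copies:", find_divergent(copies) or "none")
```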

(Thanks to Dan O’Donnell for the link.)

Music in TEI SIG

Sunday, February 10th, 2008

The TEI community have just set up a Special Interest Group for the encoding of music in XML (disclosure: I am one of the moderators). I forward the announcement below:

A Special Interest Group for music encoding in TEI has been created. The goal of the SIG is to examine the current possibilities for encoding both the physical representation of music and the aural common elements between different notation systems, and to decide on a preliminary recommendation/agenda for music encoding in the TEI, whether directly via adoption of new elements or by importing a recommended namespace from an existing external schema.

The discussion will deal with issues like:

  • Encoding western music notation from all time periods, from ancient through modern.
  • Encoding not only the music notation, but the aural aspects common to different notation systems.
  • Encoding music and text together as well as music on its own.

Everyone interested is welcome to join our mailing list:
and to contribute to our wiki:

It is particularly important, I think, that experts in ancient music are represented in this discussion, since many of the participants in the TEI community are mediaeval or modern manuscript scholars. There may (there surely *will*) be features of ancient music that test the limits of standards designed to encode more modern musical notations.

(Does anyone have any nice musical papyri lying around that we could encode in EpiDoc as a test of this sort of markup?)

Information Behaviour of the Researcher of the Future (report)

Sunday, January 20th, 2008

The British Library and JISC commissioned the Centre for Information Behaviour and the Evaluation of Research (CIBER) at UCL to produce a report on Information Behaviour of the Researcher of the Future. It’s well worth reading the full report in PDF (which I haven’t finished yet), but among the conclusions listed by the BL press release on this are:

  • All age groups revealed to share so-called ‘Google Generation’ traits
  • New study argues that libraries will have to adapt to the digital mindset
  • Young people seemingly lacking in information skills; strong message to the government and society at large

A new study overturns the common assumption that the ‘Google Generation’ – youngsters born or brought up in the Internet age – is the most web-literate. The first ever virtual longitudinal study carried out by the CIBER research team at University College London claims that, although young people demonstrate an apparent ease and familiarity with computers, they rely heavily on search engines, view rather than read and do not possess the critical and analytical skills to assess the information that they find on the web.

This is a very interesting combination of conclusions–although many of us have been observing for years that while our youngest students may think they know everything about computers they often don’t actually know the first thing about using the Internet for research (nor, needless to say, about opening up a computer–either physically or metaphorically–and playing with its innards). That the GoogleGen traits such as short attention span, impatience with anything not in the first page of search results, and readiness to flit from topic to topic in the wikiblogoatomosphere are not restricted to teenagers is not news to us “gray ADDers” either.

The suggestion that libraries, the ultimate custodians of both raw data and interpreted information (and, I would argue, especially schools and universities), need to function in the spirit of this new digital world and serve the needs of our plugged-in and distracted community is well taken. Not by making information available in bite-sized, easily identified and digested pieces–that would be pandering, not serving–but by providing educational resources alongside the traditional preserved texts/media. And by microformatting it (because our target audience doesn’t necessarily know it is our audience). And by future-proofing it.

Tutorial: The CIDOC Conceptual Reference Model

Tuesday, January 15th, 2008

Noted by way of JISC-REPOSITORIES:

DCC Tutorial: The CIDOC Conceptual Reference Model – A New Standard for Knowledge Sharing
January 29 2008
University of Glasgow

The DCC and FORTH are delighted to announce that they will be delivering a joint one-day tutorial on the CIDOC Conceptual Reference Model.

This tutorial will introduce the audience to the CIDOC Conceptual Reference Model, a core ontology and ISO standard (ISO 21127) for the semantic integration of cultural information with library, archive and other information. The CIDOC CRM concentrates on the definition of relationships, rather than terminology, in order to mediate between heterogeneous database schemata and metadata structures. This led to a compact model of 80 classes and 130 relationships, easy to comprehend and suitable to serve as a basis for mediation of cultural and library information and thereby provide the semantic ‘glue’ needed to transform today’s disparate, localised information sources into a coherent and valuable global resource. It comprises the concepts characteristic for data structures employed by most museum, archive and library documentation. Its central idea is the explicit modelling of events, both for the representation of metadata, such as creation, publication, and use, as well as for content summarization and the creation of integrated knowledge bases. It is not prescriptive, but provides a framework to describe common high-level semantics that allow for information integration at the schema level for a wide area of domains.

The CIDOC CRM, as an effort of the museums community, is paralleled by the Functional Requirements for Bibliographic Records (FRBR) by IFLA for the librarians community. Both Working Groups have come together since 2003 and started to develop a common harmonized model. The first draft version is now available as a compatible extension of the CRM, the ooFRBR, covering equally libraries and museums.

The tutorial aims at rendering the necessary knowledge to understand the potential of applying the CRM – where it can be useful and what the major technical issues of an application are. It will present an overview of the concepts and relationships covered by the CRM. As an example of a simple application, it will present the CRM Core Metadata Element Set, a minimal metadata schema of about 20 elements, still compatible with the CRM, and demonstrate how even this simple schema can be used to create large networks of integrated knowledge about physical and digital objects, persons, places and events. As an example of a simple compatible extension, it will present the core model of digitization processes used in the CASPAR project to describe digital provenance.

In part two, the tutorial will present in detail the draft ooFRBR Model. This model describes in detail the intellectual creation process from the first conception to the publishing in industrial form such as books or electronically. It should be considered equally interesting for the digital libraries community, and it is a fine example of the extensibility of the CRM for dedicated domains.
There will be enough time for questions and discussion.

Martin Doerr, Information Systems Lab, Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), Vassilika Vouton.

Target audience: Ontology experts, digital library designers, data warehouse designers, system integrators, portal designers that work in the wider area of cultural and library information, but also IT-Staff of libraries, museums and archives, vendors of cultural and other information systems. Basic knowledge of object-oriented data models is required.

Duration: Part one: 3 hours
Part two: 1.5 hours
Cost: £50 for DCC Associate Network members and £75 for non members.

If you are interested in taking part, please email Please feel free to forward this message on to any interested parties.
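The announcement's central idea, the explicit modelling of events, can be illustrated with a minimal Python sketch. Instead of attaching a creator, date and place directly to an object, an event-centred description introduces a production event and hangs everything else off it, so that other records can link to the same event. The namespace URI and the class and property identifiers (`crm:E12_Production`, `crm:P108_has_produced`, `crm:P14_carried_out_by`, `crm:P7_took_place_at`, `crm:P4_has_time-span`) are assumed to follow the CRM's published RDFS naming, and labels differ somewhat between CRM versions; all `ex:` resources and both helper functions are hypothetical.

```python
# Minimal sketch of event-centred description in the style of the CIDOC CRM.
# Assumptions: the CRM namespace URI and the E/P identifiers follow the CRM's
# published RDFS naming (labels vary between versions; E22 was "Man-Made
# Object" in older releases). Every ex: resource is hypothetical.

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "ex": "http://example.org/",
}

# (subject, predicate, object) triples using prefixed names.
TRIPLES = [
    ("ex:inscription1", "rdf:type", "crm:E22_Human-Made_Object"),
    ("ex:production1", "rdf:type", "crm:E12_Production"),
    # The event, not the object, carries who, where and when.
    ("ex:production1", "crm:P108_has_produced", "ex:inscription1"),
    ("ex:production1", "crm:P14_carried_out_by", "ex:stonecutter1"),
    ("ex:production1", "crm:P7_took_place_at", "ex:place1"),
    ("ex:production1", "crm:P4_has_time-span", "ex:timespan1"),
    ("ex:stonecutter1", "rdf:type", "crm:E21_Person"),
    ("ex:place1", "rdf:type", "crm:E53_Place"),
    ("ex:timespan1", "rdf:type", "crm:E52_Time-Span"),
]


def to_turtle(triples):
    """Serialise the triples as plain Turtle, one statement per line."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines)


def events_involving(triples, resource):
    """Return the production events linked to `resource` by any CRM property."""
    events = {s for s, p, o in triples
              if p == "rdf:type" and o == "crm:E12_Production"}
    return sorted({s for s, p, o in triples
                   if s in events and p.startswith("crm:P") and o == resource})


if __name__ == "__main__":
    print(to_turtle(TRIPLES))
    # The object and the person are both reached through the same event.
    print(events_involving(TRIPLES, "ex:inscription1"))  # ['ex:production1']
    print(events_involving(TRIPLES, "ex:stonecutter1"))  # ['ex:production1']
```

Querying through the event rather than the object is what lets independent records about the same inscription, person or place meet in one graph; a real application would use an RDF library and the CRM's own schema files rather than hand-built strings.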

Long-term data preservation

Friday, December 14th, 2007

There was an article in New Scientist last week on plans for the permanent preservation of scientific data. The argument in the sciences seems to be that all data should be preserved, as some of it will come from experiments that are unrepeatable (in particular Earth observations, astronomy, particle accelerators, and other highly expensive projects that can produce petabytes of data). It is a common observation that any problems we have in the humanities, the sciences have in spades and will solve for us, but what is interesting here is that the big funding being thrown at this problem by the likes of the NSF, ESF, and the Alliance for Permanent Access is considered news. This is a recognised problem, and the sciences don’t have the solution yet: grid and supercomputing technologies are still developing.

(Interestingly, I have heard the argument made in the humanities that, on the contrary, most data is a waste of space and should be thrown away because it will just make it more difficult for future researchers to find the important stuff among all the crap. Even in archaeology, where one would have thought practitioners would be sensitive to the fragile nature of the materials and artefacts we study, there is a school of thought that says our data–outside of actual publications–are just not important enough to preserve in the long term. Surely, in the Googleverse, finding what you want in a vast quantity of information is a problem with better solutions than throwing out whatever you don’t think important and therefore cannot imagine anyone else finding interesting.)

Another important aspect of the preservation article is the observation that:

Even if the raw data survives, it is useless without the background information that gives it meaning.

We have made this argument often in Digital Humanities venues: raw data is not enough; we also need the software, the processing instructions, and the script, presentation, search, and/or transformation scenarios that make this data meaningful for our interpretations and publications. In technical terms this is the equivalent of documenting experimental methodology so that research results can be replicated, but it is also as essential as providing binary data and documenting its format so that the data can be interpreted as, say, structured text.

It’s good to see that this is a documented issue and that large resources are being thrown at it. We shall watch their progress with great interest.

In mediis tutissimus ibis

Friday, September 14th, 2007

Even if it’s true that “People just want to see what’s real,” surely there are times when one needs to step back from technology’s bleeding edge.

Promise and challenge: augmenting places with sources

Tuesday, June 19th, 2007

Bill Turkel has some very interesting things to say about “the widespread digitization of historical sources” and — near and dear to my heart — “augmenting places with sources”:

The last paragraph in “Seeing There” resonated especially, given what we’re trying to do with Pleiades:

The widespread digitization of historical sources raises the question of what kinds of top-level views we can have into the past. Obviously it’s possible to visit an archive in real life or in Second Life, and easy to imagine locating the archive in Google Earth. It is also possible to geocode sources, link each to the places to which it relates or refers. Some of this will be done manually and accurately, some automatically with a lower degree of accuracy. Augmenting places with sources, however, raises new questions about selectivity. Without some way of filtering or making sense of these place-based records, what we’ll end up with at best will be an overview, and not topsight.

There’s an ecosystem of digital scholarship building. And I’m not talking about SOAP, RDF or OGC. I’m talking about generic function and effect …  Is your digital publication epigraphic? Papyrological? Literary? Archaeological? Numismatic? Encyclopedic? A lumbering giant library book hoover? Your/my data is our/your metadata (if we/you eschew walls and fences). When we all cite each other and remix each other’s data in ways that software agents can exploit, what new visualizations/abstractions/interpretations will arise to empower the next generation of scholarly inquiry? Stay tuned (and plug in)!

Withdrawal of AHDS Funding

Saturday, June 2nd, 2007

Following the recent public announcement that the UK’s AHRC intends to withdraw funding from the Arts and Humanities Data Service, the following petition has been set up at the British Government’s website.

On 11 May 2007, Professor Phillip Esler, Chief Executive of the AHRC, wrote to University Vice-Chancellors informing them of the Council’s decision to withdraw funding from the AHDS after eleven years. The AHDS has pioneered and encouraged awareness and use among Britain’s university researchers in the arts and humanities of best practice in preserving digital data created by research projects funded by public money. It has also ensured that this data remains publically available for future researchers. It is by no means evident that a suitable replacement infrastructure will be established and the AHRC appears to have taken no adequate steps to ensure the continued preservation of this data. The AHDS has also played a prominent role in raising awareness of new technologies and innovative practices among UK researchers. We believe that the withdrawal of funding for this body is a retrograde step which will undermine attempts to create in Britain a knowledge economy based on latest technologies. We ask the Prime Minister to urge the AHRC to reconsider this decision.

You must be a British citizen or resident to sign the petition.

Google Earth with Audio

Sunday, May 13th, 2007

This interesting post over at New Scientist Tech:

Bernie Krause has spent 40 years collecting over 3500 hours of sound recordings from all over the world, including bird and whale song and the crackle of melting glaciers. His company, Wild Sanctuary in Glen Ellen, California, has now created software to embed these sound files into the relevant locations in Google Earth. Just zoom in on your chosen spot and listen to local sounds.

“Our objective is to bring the world alive,” says Krause. “We have all the continents of the world, high mountains and low deserts.”

He hopes it will make virtual visitors more aware of the impact of human activity on the environment in the years since he began making and collecting the recordings. Users will be able to hear various modern-day sounds at a particular location, then travel back in time to compare them with the noises of decades gone by.

This is more than just a cool mashup of sounds with locations; the idea has repercussions in all sorts of departments, not least technical. At the end of the NS article is a note:

Another project, called Freesound, is making contributors’ sound files available on Google Earth. Unlike these recordings, Krause’s sound files are of a consistent quality and enriched with time, date and weather information.

Freesound is a Creative Commons site and more interesting from the Web 2.0 perspective, as content is freely user-generated. What is exciting is the way that sites can now make all sorts of media available through georeferences in Google Earth/Maps (as, for example, the Pleiades Project are doing with classical sites). The question will be how such rich results are filtered: will Google provide overlays that filter by more than just keywords, or will third-party sites (like Wild Sanctuary and Pleiades) need to create web services that take advantage of the open technologies but provide their own filters? (Tom can probably answer these questions already…)

Stop teaching historians to use computers!

Tuesday, May 8th, 2007

Bill Turkel has started what looks to be an important and potentially influential thread on the nexus of history and the digital. His opening salvo:

Teaching history students how to use computers was a really good idea in the early 1980s. It’s not anymore. Students who were born in 1983 have already graduated from college. If they didn’t pick up the rudiments of word processing and spreadsheet and database use along the way, that’s tragic. But if we concentrate on teaching those things now, we’ll be preparing our students for the brave new world of 1983.

Posts so far:

International Network of Digital Humanities Centers

Tuesday, April 24th, 2007

Making the rounds on various lists this morning is a call for participation in “an international network of digital humanities centers.” Julia Flanders et al. write:

If you represent something that you would consider a digital humanities center, anywhere in the world, we are interested in including you in a developing network of such centers.  The purpose of this network is cooperative and collaborative action that will benefit digital humanities and allied fields in general, and centers as humanities cyberinfrastructure in particular.  It comes out of a meeting hosted by the National Endowment for the Humanities and the University of Maryland, College Park, April 12-13, 2007 in Washington, D.C., responding in part to the report of the American Council of Learned Societies report on Cyberinfrastructure for the Humanities and Social Sciences, published in 2006.



Microsoft, Google, and Yahoo support GeoRSS

Friday, April 6th, 2007

Seen in Slashdot:

This week, Microsoft announced their new Live Maps, in addition to supporting Firefox on Windows for 3D, now supports the GeoRSS standard. They join Google which recently announced the support of GeoRSS and KML mapping in their Google Maps API. In short, GeoRSS is a standard supported by the Open Geospatial Consortium that incorporates geolocation in an interoperable manner to RSS feeds. The applications are numerous. With Yahoo!’s support of GeoRSS, all the major players are in and the future looks bright for this emerging standard. As for KML, Google Earth’s file format, this new Google Maps integration is not unrelated to the recent announcement of internet-wide KML search capabilities within Google Earth. From the GeoRSS website:

‘As RSS becomes more and more prevalent as a way to publish and share information, it becomes increasingly important that location is described in an interoperable manner so that applications can request, aggregate, share and map geographically tagged feeds. To avoid the fragmentation of language that has occurred in RSS and other Web information encoding efforts, we have created this site to promote a relatively small number of encodings that meet the needs of a wide range of communities.’
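As a rough illustration of what incorporating geolocation into RSS feeds means in practice, the following Python sketch (standard library only) builds an RSS 2.0 feed whose items carry a GeoRSS-Simple `georss:point`, then reads the coordinates back out. It assumes the GeoRSS namespace `http://www.georss.org/georss` and the Simple encoding of a point as latitude then longitude separated by a space; the feed title, links, coordinates and helper function names are made up for the example, and the GML encoding that GeoRSS also defines is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed GeoRSS namespace; registering it makes the output use "georss:".
GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def make_item(title, link, lat=None, lon=None):
    """Build an RSS 2.0 <item>, optionally tagged with a GeoRSS-Simple point."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    if lat is not None and lon is not None:
        # GeoRSS-Simple: latitude first, then longitude, space-separated.
        ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{lat} {lon}"
    return item


def make_feed(items):
    """Wrap items in a minimal (hypothetical) RSS 2.0 channel."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example geotagged feed"
    ET.SubElement(channel, "link").text = "http://example.org/"
    ET.SubElement(channel, "description").text = "Illustrative feed"
    channel.extend(items)
    return rss


def read_points(rss):
    """Return (title, lat, lon) for every item that carries a georss:point."""
    points = []
    for item in rss.iter("item"):
        point = item.find(f"{{{GEORSS_NS}}}point")
        if point is not None:
            lat, lon = (float(v) for v in point.text.split())
            points.append((item.findtext("title"), lat, lon))
    return points


if __name__ == "__main__":
    feed = make_feed([
        make_item("Example site", "http://example.org/sites/1", 41.8902, 12.4922),
        make_item("Untagged post", "http://example.org/posts/2"),
    ])
    text = ET.tostring(feed, encoding="unicode")
    print(text)
    # Only the tagged item yields a point.
    print(read_points(ET.fromstring(text)))
```

Because the location sits in its own namespace, an ordinary feed reader can ignore it while a mapping application aggregates and plots the same items, which is the kind of interoperability the GeoRSS site describes.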