Services and Infrastructure for a Million Books (round table)

March 17th, 2008 by Gabriel Bodard

Million Books Workshop, Friday, March 14, 2008, Imperial College London.

The second of two round tables in the afternoon of the Million Books Workshop, chaired by Brian Fuchs (Imperial College London), asked a panel of experts what services and infrastructure they would like to see in order to make a Million Book corpus useful.

  1. Stuart Dunn (Arts and Humanities e-Science Support Centre): the kinds of questions that will be asked of the Million Books mean that the structure of this collection needs to be more sophisticated than just a library catalogue
  2. Alistair Dunning (Archaeological Data Service & JISC): powerful services are urgently needed to enable humanists both to find and to use the resources in this new collection
  3. Michael Popham (OULS but formerly director of e-Science Centre): large-scale digitization is a way to break down the accidental constraints of time and place that limit access to resources in traditional libraries
  4. David Shotton (Image Bioinformatics Research Group): emphasis is on accessibility and the semantic web. It is clear that manual building of ontologies does not scale to millions of items, therefore data mining and topic modelling are required, possibly assisted by crowdsourcing. It is essential to be able to integrate heterogeneous sources in a single, semantic infrastructure
    1. Dunning: citability and replicability of research become a concern with open publication on this scale
    2. Dunn: the archaeology world has similar concerns, cf. the recent LEAP project
  5. Paul Walk (UK Office for Library and Information Networking): concerned with what happens to the all-important role of domain expertise in this world of repurposable services: where is the librarian?
    1. Charlotte Roueché (KCL): learned societies need to play a role in assuring quality and trust in open publications
    2. Dunning: institutional repositories also need to play a role in long-term archiving. Licensing is an essential component of preservation: open licenses are required for maximum distribution of archival copies
    3. Thomas Breuel (DFKI): versioning tools and infrastructure for decentralised repositories exist (e.g. Mercurial)
    4. Fuchs: we also need mechanisms for finding, searching, identifying, and enabling data in these massive collections
    5. Walk: we need to be able to inform scholars, via feeds of some kind, when new data in their field of interest appears

(Disclaimer: this is only one blogger’s partial summary. The workshop organisers will publish an official report on this event.)
