Archive for the ‘Tools’ Category

EpiDoc Latest Release (8.17)

Thursday, August 8th, 2013

Scott Vanderbilt has just announced the latest release of the EpiDoc Guidelines, Schema, and Example Stylesheets.

Details are available on the Latest Release page of the EpiDoc wiki at SourceForge.

Perseus Catalog Released

Friday, June 21st, 2013

From Lisa Cerrato via the Digital Classicist List:

The Perseus Digital Library is pleased to announce the 1.0 Release of the Perseus Catalog.

The Perseus Catalog is an attempt to provide systematic catalog access to at least one online edition of every major Greek and Latin author (both surviving and fragmentary) from antiquity to 600 CE. Still a work in progress, the catalog currently includes 3,679 individual works (2,522 Greek and 1,247 Latin), with over 11,000 links to online versions of these works (6,419 to Google Books, 5,098 to the Internet Archive, 593 to HathiTrust). The Perseus interface now includes links to the Perseus Catalog from the main navigation bar, and also from within the majority of texts in the Greco-Roman collection.

The metadata contained within the catalog utilizes the MODS and MADS standards developed by the Library of Congress, as well as the Canonical Text Services (CTS) and CTS-URN protocols developed by the Homer Multitext Project. The Perseus Catalog interface uses the open-source Blacklight interface and Apache Solr. Stable, linkable canonical URIs have been provided for all textgroups, works, editions, and translations in the catalog, in both HTML and ATOM output formats. The ATOM output format provides access to the source CTS, MODS, and MADS metadata for the catalog records. Subsequent releases will make all catalog data available as RDF triples.
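For readers unfamiliar with the CTS-URN scheme mentioned above, the sketch below illustrates the general shape of such an identifier (namespace, textgroup, work, version, passage). The example URN is the commonly cited identifier for Iliad 1.1; the parser and its field names are my own illustrative assumptions and are not drawn from the catalog's documentation.

```typescript
// Illustrative only: a minimal parser for the general shape of a CTS URN,
// e.g. urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1
// (namespace : textgroup.work.version : passage).

interface CtsUrn {
  namespace: string;   // e.g. "greekLit"
  textgroup: string;   // e.g. "tlg0012" (Homer)
  work?: string;       // e.g. "tlg001" (Iliad)
  version?: string;    // e.g. "perseus-grc1" (a specific edition)
  passage?: string;    // e.g. "1.1" (book 1, line 1)
}

function parseCtsUrn(urn: string): CtsUrn {
  const parts = urn.split(":");
  if (parts[0] !== "urn" || parts[1] !== "cts") {
    throw new Error(`Not a CTS URN: ${urn}`);
  }
  const [textgroup, work, version] = parts[3].split(".");
  return { namespace: parts[2], textgroup, work, version, passage: parts[4] };
}

console.log(parseCtsUrn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1"));
// { namespace: "greekLit", textgroup: "tlg0012", work: "tlg001",
//   version: "perseus-grc1", passage: "1.1" }
```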

Other major plans for the future of the catalog include not only the addition of more authors and works, as well as links to online versions, but also opening the catalog up to contributions from users. Currently the catalog does not include any user contribution or social features other than standard email contact information, but the goal is soon to support the creation of user accounts and the contribution of recommendations, corrections, and/or new metadata.

The Perseus Catalog blog features documentation, a user guide, and contact information as well as comments from Editor-in-Chief Gregory Crane on the history and purpose of the catalog.

The Perseus Digital Library Team
contact: perseus_catalog@tufts.edu

Open Philology Project Announced

Thursday, April 4th, 2013

Via Marco Büchler, Greg Crane has just posted “The Open Philology Project and Humboldt Chair of Digital Humanities at Leipzig” at Perseus Digital Library Updates.

Abstract: The Humboldt Chair of Digital Humanities at the University of Leipzig sees in the rise of Digital Technologies an opportunity to re-assess and re-establish how the humanities can advance the understanding of the past and to support a dialogue among civilizations. Philology, which uses surviving linguistic sources to understand the past as deeply and broadly as possible, is central to these tasks, because languages, present and historical, are central to human culture. To advance this larger effort, the Humboldt Chair focuses upon enabling Greco-Roman culture to realize the fullest possible role in intellectual life. Greco-Roman culture is particularly significant because it contributed to both Europe and the Islamic world and the study of Greco-Roman culture and its influence thus entails Classical Arabic as well as Ancient Greek and Latin. The Humboldt Chair inaugurates an Open Philology Project with three complementary efforts that produce open philological data, educate a wide audience about historical languages, and integrate open philological data from many sources: the Open Greek and Latin Project organizes content (including translations into Classical Arabic and modern languages); the Historical Language e-Learning Project explores ways to support learning across barriers of language and culture as well as space and time; the Scaife Digital Library focuses on integrating cultural heritage sources available under open licenses.

Details of the project, its components, and rationale are provided in the original post.

Diccionario Griego-Español online

Friday, December 21st, 2012

Forwarded for Sabine Arnaud-Thuillier:

The members of the Diccionario Griego-Español project (DGE, CSIC, Madrid) are pleased to announce the release of DGE online (http://dge.cchs.csic.es/xdge/), the first digital edition of the published section (α-ἔξαυος) of our Lexicon. Although still in progress, the DGE, written under the direction of Prof. F.R. Adrados, is currently becoming the largest bilingual dictionary of Ancient Greek: it already includes about 60,000 entries and 370,000 citations of ancient authors and texts. Simultaneously, we are releasing LMPG online (http://dge.cchs.csic.es/lmpg/), the digital version of the Lexicon of Magic and Religion in the Greek Magical Papyri by Luis Muñoz Delgado (Supplement V of DGE). The digitization of this smaller lexicon is considered a successful prototype for this ambitious digitization initiative: in due course DGE online will be improved with similar advanced features, such as a customized search engine. Any criticism and suggestions on that matter will be very welcome. We hope these new open-access dictionaries will be of interest to you and will become, to some extent, valuable tools for Ancient Greek studies.

Juan Rodríguez Somolinos (Main Researcher) and Sabine Arnaud-Thuillier (responsible for the digital edition)
juan.rodriguez@cchs.csic.es
sabine.thuillier@cchs.csic.es

Workshop on Canonical Text Services: Furman May 19-22, 2013

Tuesday, December 18th, 2012

Posted for Christopher Blackwell:

What · With funding from the Andrew W. Mellon Foundation, Furman University’s Department of Classics is offering a workshop on the Canonical Text Services Protocol.

When · May 19 – 22, 2013.

Where · Greenville, South Carolina (Wikipedia); Furman University.

Who · Applications will be accepted from anyone interested in learning about exposing canonically cited texts online with CTS. We have funds to pay for travel and lodging for six participants.

How · Apply by e-mail to christopher.blackwell@furman.edu by January 31, 2013.

For more information, see http://folio.furman.edu/workshop.html or contact christopher.blackwell@furman.edu

“Europeana’s Huge Dataset Opens for Re-use”

Friday, September 14th, 2012

According to this press release from Europeana Professional, the massive European Union-funded project has released 20 million records on cultural heritage items under a Creative Commons Zero (Public Domain) license.

The massive dataset is the descriptive information about Europe’s digitised treasures. For the first time, the metadata is released under the Creative Commons CC0 Public Domain Dedication, meaning that anyone can use the data for any purpose – creative, educational, commercial – with no restrictions. This release, which is by far the largest one-time dedication of cultural data to the public domain using CC0, offers a new boost to the digital economy, providing electronic entrepreneurs with opportunities to create innovative apps and games for tablets and smartphones and to create new web services and portals.

Upon registering for access to the Europeana API, developers can build tools and interfaces on top of this data, download metadata into new platforms for novel purposes, make money from it, perform new research, create artistic works, and more.
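As a very rough sketch of what such a tool might look like, the snippet below queries the Europeana search API for inscription-related records and prints the descriptive metadata it gets back. The endpoint URL, the wskey API-key parameter, and the response fields (items, title, guid) are assumptions about the API's general shape rather than documented specifics; check the current Europeana API documentation before relying on them.

```typescript
// Hypothetical sketch of querying the Europeana search API for metadata.
// Endpoint, parameter names, and response fields are assumptions; only the
// fact that registration yields an API key comes from the post above.

const API_KEY = "YOUR_API_KEY"; // obtained by registering for the API
const endpoint = "https://api.europeana.eu/record/v2/search.json"; // assumed URL

async function searchEuropeana(query: string): Promise<void> {
  const url = `${endpoint}?wskey=${API_KEY}&query=${encodeURIComponent(query)}`;
  const response = await fetch(url);
  const data = await response.json();
  // Each item carries only descriptive metadata (title, provider, links);
  // the digitised objects themselves remain under their providers' licences.
  for (const item of data.items ?? []) {
    console.log(item.title, "->", item.guid);
  }
}

searchEuropeana("greek inscription").catch(console.error);
```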

It’s important to note that it is only the metadata that is being freely released here: I did a search for some Greek inscriptions, and although photographs and transcriptions are available, these are all fiercely copyrighted by the Greek Ministry of Culture: “As for all monuments of cultural heritage, permission from the Greek Ministry of Culture is required for the reproduction of photographs of the inscriptions.” (According to this same license statement, even the metadata are licensed: “Copyright for all data in the collection belongs to the Institute for Greek and Roman Antiquity of the National Hellenic Research Foundation. These data may be used freely, provided that there is explicit reference to their provenance.” This seems slightly at odds with the CC0 claim of the Europeana site; no doubt closer examination of the legal terms would reveal which claim supersedes in this case.)

(It was lovely to be reminded that inscriptions provided by the Pandektis project [like this funerary monument for Neikagoras] have their text made available in EpiDoc XML as well as in a Leiden-formatted edition.)

It would be really good to hear about any implementations, tools, or demos built on top of this data, especially any with a classics component. Any pointers yet? Or do we need to organize a hackfest to get things started…?

Editing Athenaeus Hackathon: Berlin/Leipzig, October 10-12

Saturday, September 1st, 2012

The Banquet of the Digital Scholars

Humanities Hackathon on editing Athenaeus and on the Reinvention of the Edition in a Digital Space

October 10-12, 2012 Universität Leipzig (ULEI) <http://www.zv.uni-leipzig.de/> & Deutsches Archäologisches Institut (DAI) Berlin <http://www.dainst.org/de/department/zentrale?ft=all>

Co-directors: Monica Berti – Marco Büchler – Gregory Crane – Bridget Almas

The University of Leipzig will host a hackathon that addresses two basic tasks. On the one hand, we will focus upon the challenges of creating a digital edition for the Greek author Athenaeus, whose work cites more than a thousand earlier sources and is one of the major sources for lost works of Greek poetry and prose. At the same time, we will use the case of Athenaeus to develop our understanding of how to organize a truly born-digital edition, one that not only includes machine-actionable citations and variant readings but also collations of multiple print editions, metrical analyses, named entity identification, linguistic features such as morphology, syntax, word sense, and co-reference analysis, and alignment between the Greek original and one or more later translations. (more…)

Official Release of the Virtual Research Environment TextGrid

Friday, April 27th, 2012

TextGrid (http://www.textgrid.de) is a platform for scholars in the humanities which makes possible the collaborative analysis, evaluation, and publication of cultural remains (literary sources, images, and codices) in a standardized way. The central idea is to bring together tools for working with texts under a common user interface. The workbench offers a range of tools and services for scholarly editing and linguistic research, extensible through open interfaces: editors for linking texts to one another or text sequences to images, tools for musical score editing, for gloss editing, for automatic collation, and so on.

To mark the official release of TextGrid 2.0, a summit will take place from the 14th to the 15th of May 2012. The summit will begin on the 14th with a workshop day, on which participants can get an insight into some of the new tools; lectures and a discussion group are planned for the following day.

For more information and registration see this German website:

http://www.textgrid.de/summit2012

With kind regards

Celia Krause


Celia Krause
Technische Universität Darmstadt
Institut für Sprach- und Literaturwissenschaft
Hochschulstrasse 1
64289 Darmstadt
Tel.: 06151-165555

TILE 1.0 released

Friday, July 22nd, 2011

Those who have been waiting impatiently for the first stable release of the Text Image Linking Environment (TILE) toolkit need wait no longer: the full program can be downloaded from: <http://mith.umd.edu/tile/>. From that site:

The Text-Image Linking Environment (TILE) is a web-based tool for creating and editing image-based electronic editions and digital archives of humanities texts.

TILE features tools for importing and exporting transcript lines and images of text, an image markup tool, a semi-automated line recognizer that tags regions of text within an image, and a plugin architecture to extend the functionality of the software.

I haven’t tried TILE out for myself yet, but I’m looking forward to doing so.

Digital Papyrology

Tuesday, October 26th, 2010

The following is a lightly edited version of a talk that I delivered at the 26th Congress of the International Association of Papyrologists, 19 August 2010, in Geneva (program), posted here at the nudging of G. Bodard.

Colleagues. It is a great honor and a privilege to be able to speak with you today. An honor and a privilege that, I hasten to add, I did not seek, but which a number of our colleagues insisted some months back the members of this research team must try to live up to. If I approach this distinguished body with some trepidation, it is perhaps because my training as an epigraphist has conditioned me to a tone less attuned to collegiality than that which informs the papyrologists’ discipline. I should also add that I am here not to present my own work, but the fruits of a team whose members are in Heidelberg, London, New York, North Carolina, Alabama, and Kentucky, and who have been working heroically for more than three years now.

I shall aim to speak for no more than 40 minutes so that we may at least start discussions, which I know the rest of the team and I will be more than happy to carry on via email, Skype, phone, and separate face to face meetings. I will add also that, since the matters arising from this talk are highly technical in nature, we shall be more than happy to field questions as a team (I and my colleagues Rodney Ast, James Cowey, Tom Elliott, and Paul Heilporn) and in any of the languages within our competence.

First some background. I don’t need to tell you very much about the history of the Duke Data Bank of Documentary Papyri. It was founded in 1983, as a collaboration between William Willis and John Oates of Duke University, and the Packard Humanities Institute. A decade and a half later, around the time, as it happens, that APIS was also starting, the DDbDP decided to migrate from the old CD platform to the web. John in particular was committed to making the data available for free, to anyone who wanted access. The Perseus Project, from Tufts University, very kindly agreed to host the new online DDbDP, to develop a search interface, to convert the data from old Beta code to a markup language called SGML, all at no cost to us. The DDbDP added a few thousand texts after switching from the Packard CD-ROM to Perseus. But the landscape changed dramatically from this point onward, and the DDbDP began to fall behind. The end of the CD-ROM meant the end of regular revenues to support data entry and proofreading. And of course, ongoing development of the search interface was not without cost to Perseus, whose generous efforts on our behalf were, as I mention, unremunerated. Within a few years the DDbDP was behind in data entry and the search interface was not able to grow and mature in the ways that papyrologists wanted.

(more…)

Citation in Digital Scholarship: A Conversation

Monday, October 4th, 2010

I’m writing to bring readers’ attention to a series of pages that is coming together on the Digital Classicist wiki under the rubric “Citation in digital scholarship” (category). I take responsibility/blame for initiating the project, but it has already benefitted from input by Matteo Romanello (author of CRefEx) and from comments by my colleagues at NYU’s Institute for the Study of the Ancient World. You’ll also see the influence of the Canonical Text Services.

A slight preview of what you’ll find there and of where this all might go:

  1. The goal is to provide a robust and simple convention for indicating that citations are present. How robust? How simple? At a bare minimum, just wrap a citation in '<a class="citation" href="http://example.org">…</a>'. That will distinguish intellectually significant citations from other links (such as to a home page for the hosting project). I cribbed the 'class="citation"' string from Matteo’s articles cited at the bottom of the wiki page. Please also consider adding a 'title' and 'lang' attribute as described.
  2. We are also interested in encouraging convergence on best practices for communicating information about the entities being cited and about the nature of the citation itself:
    1. There is a page “Citations with added RDFa” that suggests conventions for using RDFa to add markup. It encourages use of Dublin Core terms.
    2. Matteo has begun a page “Citations with CTS and Microformats”. CTS, developed by Neel Smith and Chris Blackwell, is important by way of its potential to provide stable URIs to well-known texts.

    Merging these conventions is of ongoing interest. And they do illustrate that one goal is to converge on best practices that are extendable and not in unnecessary conflict with existing work.

  3. While it isn’t represented on the wiki yet, I intend to start a javascript library that will identify citations in a page (e.g. via jQuery’s "$('.citation')") in order to present information about, along with options for following, a particular citation; or to list and map all the dc:Locations cited in a text; and so on. (A rough sketch of this selector-based approach appears after this list.)
  4. Closing the loop: this work overlaps with a meeting held by the ISAW Digital Projects Team in NYC last week. The preliminary result is a tool for managing URIs in a shared bibliographic infrastructure. This is one example of an entity that can produce embeddable markup conforming to the ‘class=”citation”‘ convention. Such markup would be consumable by the planned js library. Any project that produces stable URIs can have an “Embed a link to here.” (vel sim) widget that produces conforming html for authors to re-use.
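As promised in point 3 above, here is a rough illustration (my own, not taken from the wiki pages) of selector-based identification of citations. It walks the DOM for elements matching the class="citation" convention and reads the suggested title and lang attributes; it uses plain DOM calls rather than jQuery, and the listCitations helper and Citation shape are illustrative assumptions, not an agreed API.

```typescript
// Minimal browser-side sketch, assuming the class="citation" convention
// described above. listCitations and the Citation interface are hypothetical
// names for illustration; only the markup convention comes from the wiki.

interface Citation {
  target: string;        // the href being cited
  label: string | null;  // optional human-readable 'title' attribute
  lang: string | null;   // optional 'lang' of the cited text
}

function listCitations(root: Document = document): Citation[] {
  const anchors = root.querySelectorAll<HTMLAnchorElement>("a.citation");
  return Array.from(anchors).map((a) => ({
    target: a.href,
    label: a.getAttribute("title"),
    lang: a.getAttribute("lang"),
  }));
}

// Example: log every intellectually significant citation on the page,
// ignoring ordinary navigational links (which lack class="citation").
for (const c of listCitations()) {
  console.log(`${c.label ?? "(untitled)"} [${c.lang ?? "?"}] -> ${c.target}`);
}
```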

I’m grateful to Gabriel Bodard for letting me use the Digital Classicist wiki to start these pages and for encouraging me to summarize here. The effort is inspired by the observation that a little bit of common documentation, sharing, and tool building can lead to big wins for users and developers, as well as to greater interoperability for our citation practices going forward.

Comments here are very welcome.

Digital Classics Bibliography

Monday, September 6th, 2010

As part of my PhD in Digital Humanities I’m working on a literature review of  publications related to the theme “Classics and Computers”.

I’m looking specifically at general surveys, studies and discussions about the history of the relationship between classics and computers, a disciplinary field that has recently emerged as Digital Classics.

However, as Tom Elliott pointed out to me, Alison Babeu (Perseus Project) has recently published on CiteULike a much broader bibliography “as part of an IMLS-funded planning project that’s looking at digital infrastructure needs for Classics (Perseus and CLIR are the co-recipients of the grant)”.

For the time being, in order to allow anyone with an interest in this to contribute, I have created a group on Zotero called digitalclassics. The group is open (i.e. my authorisation is not needed to join), so please join it and start contributing your entries to the list. I’m thinking in particular of publications that I have unintentionally neglected and/or publications in other languages that I was not aware of.

Currently, the entries in the Zotero library are divided into two main categories: general studies and applications, where the latter is meant to host publications concerning specific Digital Classics-related applications. More subcategories may be added as we go along, and members of the group can even add new ones themselves. As soon as the bibliography reaches a reasonably stable shape, I will update the page I have already created on the DigitalClassicist wiki.

I want to thank the DigitalClassicist community in advance for the support they have shown me on the list and for the entries they have already started contributing.

Give a Humanist a Supercomputer…

Tuesday, December 22nd, 2009

The “Wired Campus” section of the Chronicle of Higher Education is reporting on the uses that humanities scholars have found for the U.S. Department of Energy’s High Performance Computing resources.  The short article reports on the efforts of several people who have made use of the resources, including Gregory Crane of the Perseus Project, David Bamman, a computational linguist who has been mining data from classical texts, and David Koller, a researcher with the Digital Sculpture Project, which has developed ways to coalesce numerous images of an object into a high-resolution 3D image.  The article reports that, according to Mr. Koller, intermediaries are needed who can help humanities and computer researchers communicate with each other.

Ruins of Pompeii now in Google Street View

Friday, December 4th, 2009

The title says it all.  Check it out here.

Special issue of the DHQ in honour of Ross Scaife

Friday, February 27th, 2009

copied from Humanist:

From: Julia Flanders
Subject: DHQ issue 3.1 now available
We’re very happy to announce the publication of the new issue of DHQ:

DHQ 3.1 (Winter 2009)
A special issue in honor of Ross Scaife: “Changing the Center of
Gravity: Transforming Classical Studies Through Cyberinfrastructure”
Guest editors: Melissa Terras and Gregory Crane
http://www.digitalhumanities.org/dhq/

Table of Contents

Acknowledgements and Dedications
Gregory Crane, Tufts University; Brent Seales, University of
Kentucky; Melissa Terras, University College London

Ross Scaife (1960-2008)
Dot Porter, Digital Humanities Observatory

Cyberinfrastructure for Classical Philology
Gregory Crane, Tufts University; Brent Seales, University of
Kentucky; Melissa Terras, University College London

Technology, Collaboration, and Undergraduate Research
Christopher Blackwell, Furman University; Thomas R. Martin, College
of the Holy Cross

Tachypaedia Byzantina: The Suda On Line as Collaborative Encyclopedia
Anne Mahoney, Tufts University

Exploring Historical RDF with Heml
Bruce Robertson, Mount Allison University

Digitizing Latin Incunabula: Challenges, Methods, and Possibilities
Jeffrey A. Rydberg-Cox, University of Missouri-Kansas City

Citation in Classical Studies
Neel Smith, College of the Holy Cross

Digital Criticism: Editorial Standards for the Homer Multitext
Casey Dué, University of Houston, Texas; Mary Ebbott, College of the
Holy Cross

Epigraphy in 2017
Hugh Cayless, University of North Carolina; Charlotte Roueché, King’s
College London; Tom Elliott, New York University; Gabriel Bodard,
King’s College London

Digital Geography and Classics
Tom Elliott, New York University; Sean Gillies, New York University

What Your Teacher Told You is True: Latin Verbs Have Four Principal
Parts
Raphael Finkel, University of Kentucky; Gregory Stump, University of
Kentucky

Computational Linguistics and Classical Lexicography
Gregory Crane, Tufts University; David Bamman, Tufts University

Classics in the Million Book Library
Gregory Crane, Tufts University; Alison Babeu, Tufts University;
David Bamman, Tufts University; Thomas Breuel, Technical University of
Kaiserslautern; Lisa Cerrato, Tufts University; Daniel Deckers,
Hamburg University; Anke Lüdeling, Humboldt-University, Berlin; David
Mimno, University of Massachusetts, Amherst; Rashmi Singhal, Tufts
University; David A. Smith, University of Massachusetts, Amherst; Amir
Zeldes, Humboldt-University, Berlin

Conclusion: Cyberinfrastructure, the Scaife Digital Library and
Classics in a Digital age
Christopher Blackwell, Furman University; Gregory Crane, Tufts
University

Best wishes from the DHQ editorial team

Archaeological and Epigraphic interchange and e-Science

Thursday, January 29th, 2009

Workshop at the e-Science Institute, Edinburgh, February 10-11, 2009 (see programme and registration):

Rationale: The meeting will bring technical and editorial researchers participating in, or otherwise engaged with, the IOSPE (Inscriptiones Orae Septentrionalis Ponti Euxini = Ancient Inscriptions of the Northern Black Sea Coast) project together with researchers in related fields, both historical and computational. Existing projects, such as the Inscriptions of Roman Cyrenaica and the Inscriptions of Aphrodisias, have explored the digitization of ancient inscriptions from their regions and employed the EpiDoc schema as markup. IOSPE plans to expand this sphere of activity, in conjunction with a multi-volume publication of inscription data. This event is a joint workshop funded in part by a Small Research Grant from the British Academy, and in part by the eSI through the Arts and Humanities e-Science theme. The workshop will bring together domain experts in epigraphy, specialists in digital humanities, and e-science researchers, providing a detailed scoping of the research questions and of the research methods needed to investigate them from a historical/epigraphic point of view.

The success of previous projects, and the opportunities identified by the IOSPE research team, raise questions of significant interest for the e-science community. Great interpretive value can be attached to datasets such as these if they are linked, both with each other, and with other relevant datasets. The LaQuaT project at King’s, part of ENGAGE, is addressing this. There is also an important adjunct research area in the field of digital geographic analysis of these datasets: again, this can only be achieved if disparate data collections can be meaningfully cross-walked.

Contribute to the Greek and Latin Treebanks at Perseus!

Thursday, August 28th, 2008

Posted on behalf of Greg Crane. Link to the Treebank, which provides more information, at the very end of the post.

We are currently looking for advanced students of Greek and Latin to contribute syntactic analyses (via a web-based system) to our existing Latin Treebank (described below) and to our emerging Greek Treebank (for which we have just received funding). We particularly encourage students at various levels to design research projects around this new tool. We are looking in particular for the following:

  • Get paid to read Greek! We have a limited number of research assistantships for advanced students of the languages, who can work for the project from their home institutions. We particularly encourage students who can use the analyses that they produce to support research projects of their own.
  • We also encourage Greek and Latin classes to contribute. Creating the syntactic analyses provides a new way to address the traditional task of parsing Greek and Latin, and your class work can then contribute to a foundational new resource for the study of these languages; both courses as a whole and individual contributors are acknowledged in the published data.
  • Students and faculty interested in conducting their own original research based on treebank data will have the option to submit their work for editorial review to have it published as part of the emerging Scaife Digital Library.

To contribute, please contact David Bamman (david.bamman@tufts.edu) or Gregory Crane (gregory.crane@tufts.edu).

http://nlp.perseus.tufts.edu/syntax/treebank/

Registration: 3D Scanning Conference at UCL

Tuesday, February 26th, 2008

Kalliopi Vacharopoulou wrote, via the DigitalClassicist list:

I would like to draw your attention to the fact that registration for the 3D Colour Laser Scanning Conference at UCL on the 27th and 28th of March is now open.

The first day (27th of March) will include a keynote presentation and papers on the themes of General Applications of 3D Scanning in the Museum and Heritage Sector and of 3D Scanning in Conservation.

The second day (28th of March) will offer a keynote presentation and papers on the themes of 3D Scanning in Display (and Exhibition) and Education and Interpretation. A detailed programme with the papers and the names of the speakers can be found on our website.

If you would like to attend the conference, I would kindly ask you to fill in the registration form, which you can find at this link, and return it to me as soon as possible.

There is no fee for participating in or attending the conference; coffee and lunch are provided free of charge. Please note that attendance is offered on a first-come, first-served basis.

Please feel free to circulate the information about the conference to anyone who you think might be interested.

In the meantime, do not hesitate to contact me with any inquiries.

Search Pigeon

Monday, February 18th, 2008

Spotted by way of Peter Suber’s Open Access News:

Search Pigeon is a collection of Google Co-op™ Custom Search Engines (CSEs) designed to make researching on the web a richer, more rewarding, and more efficient process.

Designed for researchers in the Arts and Humanities, with a decidedly interdisciplinary bent, the objective of Search Pigeon is to provide a tool enabling the productive and trustworthy garnering of scholarly articles through customized searching.

Right now SearchPigeon.org provides CSEs that search hundreds of peer-reviewed and open access online journals, provided they are either English-language journals, or provide a translation of their site into English.

Digital Humanities Tool Survey

Thursday, January 17th, 2008

In a recent email, Brett Bobley alerted us to Susan Schreibman’s survey:

Colleagues,

Over the past few years, the idea of tool development as a scholarly activity in the digital humanities has been gaining ground. It has been the subject of numerous articles and conference presentations. There has not been, however, a concerted effort to gather information about the perceived value of tool development, not only as a scholarly activity, but in relation to the tenure and promotion process, as well as for the advancement of the field itself.

Ann Hanlon and I have compiled such a survey and would be grateful if those of you who are or have been engaged in tool development for the digital humanities would take the time to complete an online Digital Humanities Tool Developers’ Survey.

You will need to fill out a consent form before you begin, and there is an opportunity to provide us with feedback on more than one tool (you simply take the survey again). The survey should not take more than 10-15 minutes. It is our intention to present the results of our survey at Digital Humanities 2008.

With all best wishes,

Susan Schreibman
Assistant Dean
Head of Digital Collections and Research, McKeldin Library, University of Maryland, College Park

Long-term data preservation

Friday, December 14th, 2007

There was an article in New Scientist last week on plans for permanent data preservation for scientific data. The argument in the sciences seems to be that all data should be preserved, as some of it will be from experiments that are unrepeatable (in particular Earth observations, astronomy, particle accelerators, and other highly expensive projects that can produce petabytes of data). It is a common observation that any problems we have in the humanities, the sciences have in spades and will solve for us, but what is interesting here is that the big funding being thrown at this problem by the likes of the NSF, ESF, and the Alliance for Permanent Access is considered news. This is a recognised problem, and the sciences don’t have the solution yet… Grid and supercomputing technologies are still developing.

(Interestingly, I have heard the argument made in the humanities that, on the contrary, most data is a waste of space and should be thrown away because it will just make it more difficult for future researchers to find the important stuff among all the crap. Even in the context of archaeology, where one would have thought practitioners would be sensitive to the fragile nature of the materials and artefacts that we study, there is a school of thought that says our data (outside of actual publications) are just not important enough to preserve in the long term. Surely in the Googleverse, finding what you want in a vast quantity of information is a problem with better solutions than throwing out stuff that you don’t think important and therefore cannot imagine anyone else finding interesting.)

Another important aspect of the preservation article is the observation that:

Even if the raw data survives, it is useless without the background information that gives it meaning.

We have made this argument often in Digital Humanities venues: raw data is not enough; we also need the software, the processing instructions, the script, presentation, search, and/or transformation scenarios that make this data meaningful for our interpretations and publications. This is, in technical terms, the equivalent of documenting experimental methodology to make sure that research results can be replicated; it is also as essential as, when providing binary data, documenting the format so that the data can be interpreted as (say) structured text.

It’s good to see that this is a documented issue and that large resources are being thrown at it. We shall watch their progress with great interest.

Kindle

Monday, November 19th, 2007

Can Amazon succeed where others have found little traction so far?  I already spend so much time reading freely available materials (journal articles, blogs, magazines, reviews) with my trusty Macbook Pro that I feel no need at all for a special-purpose e-text reader.

Perseus code goes Open Source!

Tuesday, November 13th, 2007

From Greg Crane comes the much-anticipated word that all of the hopper code and much of the content in Perseus is now officially open sourced:

November 9, 2007: Install Perseus 4.0 on your computer:

All of the source code for the Perseus Java Hopper and much of the content in Perseus is now available under an open source license. You can download the code, compile it, and run it on your own system. This requires more labor and a certain level of expertise for which we can only provide minimal support. However, since it will be running on your own machine, it can be much faster than our website, especially during peak usage times. You also have the option to install only certain collections or texts on your version, making it as specialized as you wish. Also, if you want to use a different system to make the content available, you can do so within the terms of the Creative Commons http://creativecommons.org/licences/by-nc-sa/3.0/us license. This is the first step in open sourcing the code: you can modify the code as much as you want, but at this time, we cannot integrate your changes back into our system. That is our ultimate goal, so keep a look out for that!

Download source code here
http://sourceforge.net/projects/perseus-hopper

Download text data here
http://www.perseus.tufts.edu/%7Ersingh04/

Slowly, slowly

Saturday, October 27th, 2007

It’s a shame the JPEG 2000 bandwagon has been creeping along at such a slow pace, but this seems like good news from the LOC.

In-house machine translation at Google

Tuesday, October 23rd, 2007

from Google Blogoscoped:

Google switched to a new translation system for the remaining language pairs on the Google Translator, which were until now provided by Systran. The translator help files don’t mention this yet, but it may be that the new translations are the result of Google’s in-house machine translation efforts.

In a quick comparison between the Systran translation and the Google translation of the English <-> German language pair, I couldn’t see a clear winner yet (though I get the feeling Google’s results are slightly superior), but there were a lot of garbage results on both ends. Translating a sample blog post into German, for instance, produced results so bad that you’d have a hard time making any sense of what was written if you didn’t speak English. While it might help to get the point across for some texts, you start to wonder whether these kinds of translations in the wild will cause more understanding in the world, or more misunderstanding.