Archive for the ‘Tools’ Category

Archaeological and Epigraphic interchange and e-Science

Thursday, January 29th, 2009

Workshop at the e-Science Institute, Edinburgh, February 10-11, 2009 (see programme and registration):

Rationale: The meeting will bring technical and editorial researchers participating in, or otherwise engaged with, the IOSPE project (Inscriptiones Orae Septentrionalis Ponti Euxini = Ancient Inscriptions of the Northern Black Sea Coast) together with researchers in related fields, both historical and computational. Existing projects, such as the Inscriptions of Roman Cyrenaica and the Inscriptions of Aphrodisias, have explored the digitization of ancient inscriptions from their regions and employed the EpiDoc schema as markup. IOSPE plans to expand this sphere of activity, in conjunction with a multi-volume publication of inscription data. This event is a joint workshop funded in part by a Small Research Grant from the British Academy and in part by the eSI through the Arts and Humanities e-Science theme. The workshop will bring together domain experts in epigraphy, specialists in digital humanities, and e-science researchers to provide a detailed scoping of the research questions, and of the research methods needed to investigate them, from a historical/epigraphic point of view.

The success of previous projects, and the opportunities identified by the IOSPE research team, raise questions of significant interest for the e-science community. Great interpretive value can be attached to datasets such as these if they are linked, both with each other, and with other relevant datasets. The LaQuaT project at King’s, part of ENGAGE, is addressing this. There is also an important adjunct research area in the field of digital geographic analysis of these datasets: again, this can only be achieved if disparate data collections can be meaningfully cross-walked.

Contribute to the Greek and Latin Treebanks at Perseus!

Thursday, August 28th, 2008

Posted on behalf of Greg Crane. Link to the Treebank, which provides more information, at the very end of the post.

We are currently looking for advanced students of Greek and Latin to contribute syntactic analyses (via a web-based system) to our existing Latin Treebank (described below) and our emerging Greek Treebank as well (for which we have just received funding). We particularly encourage students at various levels to design research projects around this new tool. We are looking in particular for the following:

  • Get paid to read Greek! We have a limited number of research assistantships for advanced students of the languages who can work for the project from their home institutions. We particularly encourage students who can use the analyses that they produce to support research projects of their own.
  • We also encourage classes of Greek and Latin to contribute as well. Creating the syntactic analyses provides a new way to address the traditional task of parsing Greek and Latin. Your class work can then contribute to a foundational new resource for the study of Greek and Latin – both courses as a whole and individual contributors are acknowledged in the published data.
  • Students and faculty interested in conducting their own original research based on treebank data will have the option to submit their work for editorial review to have it published as part of the emerging Scaife Digital Library.

To contribute, please contact David Bamman or Gregory Crane.

Registration: 3D Scanning Conference at UCL

Tuesday, February 26th, 2008

Kalliopi Vacharopoulou wrote, via the DigitalClassicist list:

I would like to draw to your attention the fact that registration for the 3D Colour Laser Scanning Conference at UCL on the 27th and 28th of March has now opened.

The first day (27th of March) will include a keynote presentation and papers on the themes of General Applications of 3D Scanning in the Museum and Heritage Sector and of 3D Scanning in Conservation.

The second day (28th of March) will offer a keynote presentation and papers on the themes of 3D Scanning in Display (and Exhibition) and Education and Interpretation. A detailed programme with the papers and the names of the speakers can be found on our website.

If you would like to attend the conference, I would kindly request that you fill in the registration form, which you can find at this link, and return it to me as soon as possible.

There is no fee for participating in or attending the conference; coffee and lunch are provided free of charge. Please note that attendance is offered on a first-come, first-served basis.

Please feel free to circulate the information about the conference to anyone who you think might be interested.

In the meantime, do not hesitate to contact me with any inquiries.

Search Pigeon

Monday, February 18th, 2008

Spotted by way of Peter Suber’s Open Access News:

Search Pigeon is a collection of Google Co-op™ Custom Search Engines (CSEs) designed to make researching on the web a richer, more rewarding, and more efficient process.

Designed for researchers in the Arts and Humanities, with a decidedly interdisciplinary bent, the objective of Search Pigeon is to provide a tool enabling the productive and trustworthy garnering of scholarly articles through customized searching.

Right now it provides CSEs that search hundreds of peer-reviewed and open access online journals, provided they are either English-language journals or provide a translation of their site into English.

Digital Humanities Tool Survey

Thursday, January 17th, 2008

In Brett Bobley’s recent email, he alerted us to Susan Schreibman’s survey:


Over the past few years, the idea of tool development as a scholarly activity in the digital humanities has been gaining ground. It has been the subject of numerous articles and conference presentations. There has not been, however, a concerted effort to gather information about the perceived value of tool development, not only as a scholarly activity, but in relation to the tenure and promotion process, as well as for the advancement of the field itself.

Ann Hanlon and I have compiled such a survey and would be grateful if those of you who are or have been engaged in tool development for the digital humanities would take the time to complete an online Digital Humanities Tool Developers’ Survey.

You will need to fill out a consent form before you begin, and there is an opportunity to provide us with feedback on more than one tool (you simply take the survey again). The survey should not take more than 10-15 minutes. It is our intention to present the results of our survey at Digital Humanities 2008.

With all best wishes,

Susan Schreibman
Assistant Dean
Head of Digital Collections and Research
McKeldin Library
University of Maryland, College Park

Long-term data preservation

Friday, December 14th, 2007

There was an article in New Scientist last week on plans for permanent preservation of scientific data. The argument in the sciences seems to be that all data should be preserved, as some of it will be from experiments that are unrepeatable (in particular Earth observations, astronomy, particle accelerators, and other highly expensive projects that can produce petabytes of data). It is a common observation that any problems we have in the humanities, the sciences have in spades and will solve for us, but what is interesting here is that the big funding being thrown at this problem by the likes of the NSF, ESF, and the Alliance for Permanent Access is considered news. This is a recognised problem, and the sciences don’t have the solution yet… Grid and Supercomputing technologies are still developing.

(Interestingly, I have heard the argument made in the humanities that on the contrary, most data is a waste of space and should be thrown away because it will just make it more difficult for future researchers to find the important stuff among all the crap. Even in the context of archaeology, where one would have thought practitioners would be sensitive to the fragile nature of the materials and artefacts that we study, there is a school of thought that says our data–outside of actual publications–are just not important enough to preserve in the long term. Surely in the Googleverse finding what you want in a vast quantity of information is a problem with better solutions than throwing out stuff that you don’t think important and therefore cannot imagine anyone else finding interesting.)

Another important aspect of the preservation article is the observation that:

Even if the raw data survives, it is useless without the background information that gives it meaning.

We have made this argument often in Digital Humanities venues: raw data is not enough; we also need the software, the processing instructions, the scripts, and the presentation, search, and/or transformation scenarios that make this data meaningful for our interpretations and publications. In technical terms this is the equivalent of documenting experimental methodology to make sure that research results can be replicated, but it is also as essential as providing binary data and documenting the format so that the data can be interpreted as, say, structured text.
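A trivial sketch of that last point, using a made-up record layout: the bytes alone are opaque, and only the documented format string (the "background information") turns them back into structured values.

```python
import struct

# Hypothetical example: suppose an archive stores each inscription
# record as a 4-byte little-endian ID, a 2-byte year, and a 6-byte
# ASCII site code. The format string below *is* the documentation
# that gives the raw bytes their meaning.
RECORD_FORMAT = "<IH6s"

raw = struct.pack(RECORD_FORMAT, 42, 1998, b"CYRENE")

# Without RECORD_FORMAT, `raw` is just 12 opaque bytes; with it,
# the data decodes back into structured fields.
record_id, year, site = struct.unpack(RECORD_FORMAT, raw)
print(record_id, year, site.decode("ascii"))  # 42 1998 CYRENE
```

The record layout and field names are invented for illustration; the point is only that preserving `raw` without preserving `RECORD_FORMAT` preserves nothing usable.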

It’s good to see that this is a documented issue and that large resources are being thrown at it. We shall watch their progress with great interest.


Monday, November 19th, 2007

Can Amazon succeed where others have found little traction so far? I already spend so much time reading freely available materials (journal articles, blogs, magazines, reviews) with my trusty MacBook Pro that I feel no need at all for a special-purpose e-text reader.

Perseus code goes Open Source!

Tuesday, November 13th, 2007

From Greg Crane comes the much-anticipated word that all of the hopper code and much of the content in Perseus is now officially open sourced:

November 9, 2007: *Install Perseus 4.0 on your computer*:

All of the source code for the Perseus Java Hopper and much of the content in Perseus is now available under an open source license. You can download the code, compile it, and run it on your own system. This requires more labor and a certain level of expertise for which we can only provide minimal support. However, since it will be running on your own machine, it can be much faster than our website, especially during peak usage times. You also have the option to install only certain collections or texts on your version, making it as specialized as you wish. Also, if you want to use a different system to make the content available, you can do so within the terms of the Creative Commons license. This is the first step in open sourcing the code: you can modify the code as much as you want, but at this time, we cannot integrate your changes back into our system. That is our ultimate goal, so keep a look out for that!

Download source code here

Download text data here

Slowly, slowly

Saturday, October 27th, 2007

It’s a shame the JPEG 2000 bandwagon has been creeping along at such a slow pace, but this seems like good news from the LOC.

In-house machine translation at Google

Tuesday, October 23rd, 2007

from Google Blogoscoped:

Google has switched to a new translation system for the remaining language pairs in Google Translate, which were until now provided by Systran. The translator help files don’t mention this yet, but it may be that the new translations are the results of Google’s in-house machine translation efforts.

In a quick comparison between the Systran translation and the Google translation of the English <--> German language pair, I couldn’t see a clear winner yet (though I get the feeling Google’s results are slightly superior), but a lot of garbage results on both ends. Translating a sample blog post into German, for instance, produced results so bad that you’d have a hard time making any sense of what was written if you didn’t speak English. While it might help to get the point across for some texts, you start to wonder whether this kind of translation in the wild will cause more understanding in the world, or more misunderstanding.

Multiverse & Sketchup: Doom of Second Life?

Tuesday, October 16th, 2007

from Shawn Graham’s Electric Archaeology:

From an archaeological point of view, creating 3D representations of a site using Sketchup, and then moving those, with the terrain, into an online world with the associated annotations etc. could really be revolutionary – what immediately springs to mind is that this would make a far better way of publishing a site than a traditional monograph. Internet Archaeology (the journal) has been trying for just that kind of thing for a while. Maybe IA should host a world in Multiverse…?

Fitzpatrick on CommentPress

Monday, October 15th, 2007

from Kathleen Fitzpatrick, “CommentPress: New (Social) Structures for New (Networked) Texts,” Journal of Electronic Publishing, Fall 2007:

… CommentPress demonstrates the fruitfulness of reimagining the technologies of electronic publishing in service to the social interconnections of authors and readers. The success of the electronic publishing ventures of the future will likely hinge on the liveliness of the conversations and interactions that they can produce, and the further new writing that those interactions can inspire. CommentPress grows out of an understanding that the chief problem involved in creating the future of the book is not simply placing the words on the screen, but structuring their delivery in an engaging manner; the issue of engagement, moreover, is not simply about locating the text within the technological network, but also, and primarily, about locating it within the social network. These are the problems that developers must focus on in seeking the electronic form that can not just rival but outdo the codex, as a form that invites the reader in, that acknowledges that the reader wants to respond, and that understands all publication as part of an ongoing series of public conversations, conducted in multiple time registers, across multiple texts. Making those conversations as accessible and inviting as possible should be the goal in imagining the textual communications circuit of the future.


Wednesday, October 10th, 2007

posted to the TEI list

Wiki2Tei converter 1.0

We are pleased to announce the first release of the Wiki2Tei software. Wiki2Tei is a converter from the mediawiki format to XML (TEI vocabulary).

The mediawiki format is used by Wikimedia Foundation wikis (Wikipedia, Wikibooks, Wikisource), and many other wikis using the mediawiki software. Large amounts of free, high-quality structured text are available in this format. These texts are used more and more often in NLP (natural language processing) projects. However, the mediawiki parser is oriented towards rendition, and the mediawiki syntax is complex and hard to parse.

The Wiki2Tei converter makes available the information contained in the wiki syntax (structure, highlighting, etc.) and allows the plain text to be retrieved properly. The conversion is intended to preserve all the properties of the original text. Wiki2Tei is closely coupled with the mediawiki software, allowing it to convert all the features of the mediawiki syntax.
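To illustrate the general idea (this is not Wiki2Tei's actual code, and it handles only two constructs with assumed TEI element choices), a toy wiki-to-TEI conversion might look like this:

```python
import re

def wiki_to_tei(text):
    """Toy mediawiki-to-TEI sketch: maps level-2 headings (== ... ==)
    onto TEI <head> elements and '''bold''' onto <hi rend="bold">,
    wrapping everything else in <p>. A real converter must handle
    the full, much messier mediawiki syntax."""
    out = []
    for line in text.splitlines():
        m = re.fullmatch(r"==\s*(.+?)\s*==", line.strip())
        if m:
            out.append(f"<head>{m.group(1)}</head>")
        else:
            line = re.sub(r"'''(.+?)'''", r'<hi rend="bold">\1</hi>', line)
            out.append(f"<p>{line}</p>")
    return "\n".join(out)

print(wiki_to_tei("== History ==\nThe '''first''' excavation."))
```

Even in this sketch, the structural information in the wiki markup survives the conversion, and stripping the tags from the output recovers the plain text, which is the property the announcement emphasises.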

The Wiki2Tei converter provides a rich set of tools for converting mediawiki text from several sources (file, mediawiki database) and managing collections of files to be converted. The TEI vocabulary used is documented, according to the TEI Guidelines, in an ODD document. The code is open source and may be downloaded from the SourceForge download area:

The web site contains full documentation and a “demo”:

A mailing list is open:

Bernard Desgraupes,
Sylvain Loiseau

Version control, visualized

Friday, September 28th, 2007

A nice visual overview of the purposes and mechanisms for version control, from Better Explained.

UI for Google book search improves

Wednesday, September 26th, 2007

Inside Google Book Search offers an update of “New ways to dig into Book Search.”

Cuneiform Digital Library Initiative and Digital Library Program of UCLA

Wednesday, September 26th, 2007

The Cuneiform Digital Library Initiative and the Digital Library Program of the University of California, Los Angeles, are pleased to announce their successful proposal to the Institute for Museum and Library Services program “National Leadership Grants: Building Digital Resources” for funding of a two-year project dedicated to improving data management and archiving tools in Humanities research.

Project Title: “Cuneiform Digital Library Initiative: Second Generation”

The UCLA University Library and UCLA’s Department of Near Eastern Languages and Cultures will create the Cuneiform Digital Library Initiative: Second Generation (CDLI 2). The project will migrate 450,000 legacy archival and access images and metadata from CDLI to UCLA’s Digital Library Content System, standardizing and upgrading the metadata to improve discovery and enable content archiving within the California Digital Library’s Digital Preservation Repository. The project will add 7,000 digital artifacts with cuneiform inscriptions, including collections housed at the University of Chicago’s Oriental Institute and in Syrian national museums. This project will ensure the long-term preservation of text inscribed on endangered ancient cuneiform tablets. (see the IMLS notice of grants in this cycle)

Principal Investigators:

Stephen Davison
Robert K. Englund


Thursday, September 20th, 2007

Hearing Mojo is not happy:

I can’t believe Apple failed to make its iPhone compatible with either hearing aids or cochlear implants. I’m in the market for a mobile phone again and just discovered the lack of compatibility. Given all the hype surrounding the iPhone launch, I’m surprised there haven’t been more complaints, other than the strong objection I just found on Paula Rosenthal’s HearingExchange site, some chatter on Apple forums, and a complaint made to the FCC by the Hearing Loss Association of America. HLAA has done the most advocacy for hearing-aid compatibility (HAC) regulations, which now mandate 50 percent of manufacturers’ handsets meet minimum M3 compatibility standards. The M3 and M4 ratings mean there’s no buzzing when you listen to the phone with your hearing-aid microphone on, and T3 and T4 ratings mean the phone works with the telecoils in your hearing aids. But according to the HLAA complaint: “Apple has now entered the scene and is predicted to shake up the entire wireless industry. Yet they are not, nor have ever been, involved in any discussions regarding HAC requirements.” Steve Jobs is known for his arrogance and inflexibility when it comes to the design of his products. Apple’s treatment of the hearing-impaired population is a great example. What a disappointment.

Diamond Synchrotron used to read ancient texts

Sunday, September 16th, 2007

(Spotted by Gregg Schwender and seen in a Diamond Lab press release.)

The ultra-powerful I22 Non-crystalline Diffraction beamline (as best I understand it, an application of the laser particle accelerator that produces highly concentrated pure light for scanning at nanoscopic resolutions) is being applied to the reading of damaged parchment and other ancient and at-risk documents. The synchrotron can analyse the condition of collagen in paper or vellum and determine the patterns of any potentially corrosive ink; this is particularly valuable in cases of very fragile texts, such as those partially eaten away by iron gall ink, or ancient desiccated manuscripts such as the Dead Sea Scrolls.

I first heard about this story–albeit in very vague terms–at a party last night, and I have to say that my first reaction was disbelief. I assumed that the speaker (neither a digital humanist nor a manuscript scholar) had misunderstood or misrepresented the story of a particle accelerator the size of four football pitches being used to read the Dead Sea Scrolls. Surely the expense involved would just never be spent on something as niche as manuscript studies? (Not to mention that I know excellent results are already being achieved using standard medical imaging technology.) I apologise to my nameless source for my lack of faith. I guess I need reminding occasionally that even people with big and expensive fish to fry can share our obsession with digital and humanistic concerns.


Friday, August 24th, 2007

Why I gave up on my university’s email years ago:

Along with the neat-o peripheral gizmos like messaging, calendars, and collaboration tools, the outsourced systems are more stable, have better spam filters, and provide much more storage space than the typical university’s in-house system.

Seemed like a no-brainer…  (Colleges Outsource E-mail to Big Players, U.S.News & World Report)

Keyboard shortcuts

Wednesday, August 22nd, 2007

From liquidicity, keyboard shortcuts for about every character key available on a Mac.

Who edits Wikipedia?

Saturday, August 18th, 2007

A very interesting site has been doing the rounds of news and blogs lately, which allows users to trace anonymous edits of Wikipedia articles by comparing them to the public record of registered IP addresses. The WikiScanner is itself neutral as to the kind of searches one may carry out–it merely accesses and mashes up information from two publicly available sources–but many of the most public implementations (such as those collected by Wired magazine) have been political, moral, or salacious. So, for example, users with an IP address registered to the office of a given religious organisation might be shown to have “anonymously” edited the Wikipedia entry on that religion, whitewashed crimes or scandals, or slandered rival groups or individuals of their own organisation. (All this by way of example only–actual instances you can look up for yourself.)
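The core of the mashup is conceptually simple, assuming you already hold both public datasets (edit logs with IP addresses, and organisations' registered address blocks). A minimal sketch, with an entirely made-up registry and reserved documentation addresses:

```python
from ipaddress import ip_address, ip_network

# Hypothetical registry of organisations and their registered address
# blocks; a WikiScanner-style tool would draw these from public IP
# allocation records rather than a hand-written dict.
REGISTRY = {
    "Example University": ip_network("192.0.2.0/24"),
    "Example Corp":       ip_network("198.51.100.0/25"),
}

def attribute_edit(edit_ip):
    """Return the organisation whose registered block contains
    edit_ip, or None if the address matches no known block."""
    ip = ip_address(edit_ip)
    for org, block in REGISTRY.items():
        if ip in block:
            return org
    return None

print(attribute_edit("192.0.2.17"))   # Example University
print(attribute_edit("203.0.113.5"))  # None
```

As the post notes for university ranges, a match attributes an edit only to an address block, not to an individual: many machines and people may sit behind one registered range.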

This is not only an interesting and imaginative example of a mashup, but also a potentially very useful control on one of the biggest threats to Wikipedia’s much-vaunted “neutral point of view”–namely the ability of wealthy corporations or individuals to hire lobbyists and PR agencies to clean up their profile on the web. More transparency means more accountability means more reliable information. Potentially. Effectively this tool removes the ability to edit completely anonymously, without raising the bar to entry in the Wiki community by requiring registration and identification.

I’ve yet to find any interesting academic examples of biased “anonymous” edits–and I guess they’d be hard to pin down because the range of IPs registered to a university would typically include lab workstations and other machines accessible by a large number of people. I’m sure something interesting will turn up, however. Keep looking.

Learn a foreign language on the cheap

Thursday, August 16th, 2007

from Lifehacker:

The No Thick Manuals wiki details how to learn a language efficiently using two free, open source applications. The first is jVLT (java Vocabulary Learning Tool), a completely cross-platform flash card application. The second is StarDict, a Windows/Linux-only dictionary that provides an impressive array of features and dictionaries. Granted, most of us would require some textbooks and/or audio supplements, but anyone learning a language needs a good dictionary and some flash cards, and these free desktop applications sound a lot simpler than making flash cards by hand and manually looking up words in your dictionary.

Slidecast in Slideshare

Thursday, August 2nd, 2007

For the Neo-Latin Colloquia site Dot Porter created a couple of SMIL files to permit simultaneous playback of text and audio via QuickTime. The new Slidecast feature of Slideshare appears to offer similar functionality with greater ease.
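For readers unfamiliar with SMIL, a minimal file of the kind described pairs a text source with an audio track inside a `<par>` (parallel) container; the file names here are invented for illustration:

```xml
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <body>
    <!-- play the colloquium text and its narration at the same time -->
    <par>
      <text src="colloquium-page1.html"/>
      <audio src="colloquium-page1.mp3"/>
    </par>
  </body>
</smil>
```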

CommentPress 1.0 released

Wednesday, July 25th, 2007

CommentPress is a free theme for the WordPress blogging engine that allows readers to comment paragraph by paragraph in the margins of a text. Annotate, gloss, workshop, debate: with CommentPress you can do all of these things on a finer-grained level, turning a document into a conversation. It can be applied to a fixed document (paper/essay/book etc.) or to a running blog. CommentPress was developed by the Institute for the Future of the Book “to enable social interaction around long-form texts.” Some of the possibilities:

  • scholarly contexts: working papers, conferences, annotation projects, journals, collaborative glosses
  • educational: virtual classroom discussion around readings, study groups
  • journalism/public advocacy/networked democracy: social assessment and public dissection of government or corporate documents, cutting through opaque language and spin
  • creative writing: workshopping story drafts, collaborative storytelling
  • recreational: social reading, book clubs

Update: University Publishing In A Digital Age now set up for social annotation.

Forthcoming lectures on arts and humanities e-science

Wednesday, June 27th, 2007

Forwarded from AHeSSC, the Arts and Humanities e-Science Support Centre

The next lectures in the e-Science in the Arts and Humanities Theme begin next week. The Theme, organized by the Arts and Humanities e-Science Support Centre (AHeSSC) and hosted by the e-Science Institute in Edinburgh, aims to explore the new challenges for research in the Arts and Humanities and to define the new research agenda that is made possible by e-Science technology.

The lectures are:

Monday 2 July: Grid Enabling Humanities Datasets

Friday 6 July: e-Science and Performance

Monday 23 July: Aspects of Space and Time in Humanities e-Science

In all cases it will be possible to view the lecture on webcast, and to ask questions or contribute to the debate in real time via the blog feature. Please visit E-Science_in_the_Arts_and_Humanities and follow the ‘Ask questions during the lecture’ link for more information about the blog, and the ‘More details’ link for more information about the events themselves and the webcasts.

AHeSSC forms a critical part of the AHRC-JISC initiative on e-Science in Arts and Humanities research. The Centre is hosted by King’s College London and located at the Arts and Humanities Data Service (AHDS) and the AHRC Methods Network. AHeSSC exists to support, co-ordinate and promote e-Science in all arts and humanities disciplines, and to liaise with the e-Science and e-Social Science communities, computing, and information sciences.

Please contact Stuart Dunn (stuart.dunn[at] or Tobias Blanke (tobias.blanke[at] at AHeSSC for more information.