Archive for the ‘Rights’ Category

OEDUc: Images and Image metadata working group

Tuesday, June 13th, 2017

Participants: Sarah Middle, Angie Lumezeanu, Simona Stoyanova


The Images and Image Metadata working group met at the London meeting of the Open Epigraphic Data Unconference on Friday, May 15, 2017, and discussed the issues of copyright, metadata formats, image extraction and licence transparency in the Epigraphik Fotothek Heidelberg, the database which contains images and metadata relating to nearly forty thousand Roman inscriptions from collections around the world. Were the EDH to lose its funding and the website its support, one of the biggest and most useful digital epigraphy projects would start to disintegrate. While its data is available for download, its usability would be greatly compromised. Thus, this working group focused on issues pertaining to the EDH image collection. The materials we worked with are the JPG images as seen on the website, and the image metadata files, which are available as XML and JSON data dumps on the EDH data download page.

The EDH Photographic Database index page states: “The digital image material of the Photographic Database is with a few exceptions directly accessible. Hitherto it had been the policy that pictures with unclear utilization rights were presented only as thumbnail images. In 2012 as a result of ever increasing requests from the scientific community and with the support of the Heidelberg Academy of the Sciences this policy has been changed. The approval of the institutions which house the monuments and their inscriptions is assumed for the non commercial use for research purposes (otherwise permission should be sought). Rights beyond those just mentioned may not be assumed and require special permission of the photographer and the museum.”

During a discussion with Frank Grieshaber we found out that the information in this paragraph is only available on this webpage, with no individual licence details in the metadata records of the images, either in the XML or the JSON data dumps. It would be useful for this licence information to be included in the individual records, though it is not clear how to accomplish this efficiently for each photograph, since all the photographers would need to be contacted first. Currently, the rights information in the XML records says “Rights Reserved – Free Access on Epigraphischen Fotothek Heidelberg”, which presumably points to the “research purposes” part of the statement on the EDH website.

All other components of EDH – inscriptions, bibliography, geography and people RDF – have been released under the Creative Commons Attribution-ShareAlike 3.0 Unported license, which allows for their reuse and repurposing, thus ensuring their sustainability. The images, however, will be the first thing to disappear once the project ends. With unclear licensing and the impossibility of contacting every single photographer, some of whom are no longer alive and others of whom might not wish to waive their rights, data reuse becomes particularly problematic.

One possible way of figuring out the copyright of individual images is to check the reciprocal links to the photographic archive of the partner institutions who provided the images, and then read through their own licence information. However, these links are only visible from the HTML and not present in the XML records.

Given that the image metadata in the XML files is relatively detailed and already in place, we decided to focus on the task of image extraction for research purposes, which is covered by the general licensing of the EDH image databank. We prepared a Python script for batch download of the entire image databank, available on the OEDUc GitHub repo. Each image has a unique identifier which is the same as its filename and the final string of its URL. This means that when an inscription has more than one photograph, each one has its individual record and URI, which allows for complete coverage and efficient harvesting. The images are numbered sequentially, and in the case of a missing image, the process skips that entry and continues to the next one. Since the databank includes more than 37,530 images, the script pauses for 30 seconds after every 200 files to avoid a timeout. We don’t have access to the high-resolution TIFF images, so this script downloads the JPGs from the HTML records.
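The core of the batch-download logic can be sketched as below; note that the URL pattern and the zero-padded identifier format here are illustrative assumptions, not the actual EDH address scheme (the real script is on the OEDUc GitHub repo):

```python
# Minimal sketch of the batch-download logic; BASE_URL is a hypothetical
# placeholder for the EDH image address scheme.
import time
import urllib.error
import urllib.request

BASE_URL = "https://edh-www.adw.uni-heidelberg.de/fotos/{}.jpg"  # hypothetical

def image_id(n):
    """EDH photo identifiers are sequential and zero-padded, e.g. F009848."""
    return "F{:06d}".format(n)

def download_range(first, last, pause_every=200, pause_secs=30):
    """Fetch images first..last, skipping gaps and pausing to avoid timeouts."""
    fetched = 0
    for n in range(first, last + 1):
        name = image_id(n)
        try:
            urllib.request.urlretrieve(BASE_URL.format(name), name + ".jpg")
        except urllib.error.HTTPError:
            continue  # missing image: skip this entry and move on
        fetched += 1
        if fetched % pause_every == 0:
            time.sleep(pause_secs)  # be polite to the server
    return fetched
```

Because every photograph has its own identifier and record, a run over the full identifier range gives complete coverage without needing an index of which images exist.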

The EDH images included in the EAGLE MediaWiki are all under an open licence and link back to the EDH databank. A task for the future will be to compare the two lists to get a sense of the EAGLE coverage of EDH images and feed back their licensing information to the EDH image records. One issue is the lack of file-naming conventions in EAGLE, where some photographs carry a publication citation (CIL_III_14216,_8.JPG, AE_1957,_266_1.JPG), others a random name (DR_11.jpg), and others a descriptive filename which may contain an EDH reference (Roman_Inscription_in_Aleppo,_Museum,_Syria_(EDH_-_F009848).jpeg). Matching these to the EDH databank will have to be done by cross-referencing the publication citations either in the filename or in the image record.
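A first pass at this cross-referencing might look like the following sketch; the patterns are assumptions based only on the example filenames above, and real EAGLE filenames will need messier handling:

```python
import re

# Illustrative sketch for matching EAGLE MediaWiki filenames to EDH records.

def edh_photo_id(filename):
    """Return an embedded EDH photo identifier (e.g. 'F009848'), if present."""
    m = re.search(r"F\d{6}", filename)
    return m.group(0) if m else None

def citation_key(filename):
    """Normalise a citation-style filename into a rough string that can be
    cross-referenced against publication citations in the EDH records."""
    stem = re.sub(r"\.jpe?g$", "", filename, flags=re.IGNORECASE)
    return stem.replace("_", " ").strip()
```

Filenames with an embedded EDH reference can be matched directly; the citation-style and random names fall back to fuzzy matching against the bibliography in each image record.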

A further future task could be to embed the image metadata into the image itself. The EAGLE MediaWiki images already have Exif data (added automatically by the camera), but it might be useful to add descriptive and copyright information internally, following the IPTC data set standard (e.g. title, subject, photographer, rights). This will help bring the inscription file, image record and image itself back together in the event of data scattering after the end of the project. Currently, links exist between the inscription files and the image records. Embedding at least the HD number of the inscription directly into the image metadata would allow us to gradually bring the resources back together, following changes in copyright and licensing.
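As a rough sketch, the embedding could be driven from Python via the exiftool command-line utility; the tag-to-field mapping below is an illustrative choice, not a settled convention:

```python
# Rough sketch: build an exiftool invocation that embeds IPTC descriptive and
# rights fields, plus the EDH HD number, into a JPG. Assumes the exiftool
# utility is installed; the chosen tags are illustrative.
def iptc_args(path, title, photographer, rights, hd_number):
    return [
        "exiftool",
        "-IPTC:ObjectName={}".format(title),        # title
        "-IPTC:By-line={}".format(photographer),    # photographer
        "-IPTC:CopyrightNotice={}".format(rights),  # rights statement
        "-IPTC:Keywords={}".format(hd_number),      # HD number, for re-linking
        path,
    ]
```

The argument list can then be run with subprocess.run(); keeping the HD number in a keyword field means an image could be re-linked to its inscription record even after the files are separated from the databank.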

Of the three tasks we set out to discuss, one turned out to be unfeasible, one we accomplished (and published the code), and one remains to be worked on in the future. Ascertaining the copyright status of all images is practically impossible, so all future experiments will be done on the EDH images in EAGLE MediaWiki. The script for extracting JPGs from the HTML is available on the OEDUc GitHub repo. We have drafted a plan for embedding metadata into the images, following the IPTC standard.

OAPEN-UK focus groups, first report

Friday, February 3rd, 2012

The JISC-funded OAPEN-UK (Open Access Publishing in European Networks) project has published a report on the first round of focus groups, held in the British Library late last year. Various groups of stakeholders (in this case academics who author research material) were brought together to discuss issues surrounding open access monograph publication. The conclusions and recommendations are perhaps less radical (or more practical?) than some discussions of open publication in this venue, but the report still raises some valuable issues. (Full disclosure: I participated in this session.)

The report can be found at:

The Digital Archimedes Palimpsest Released

Wednesday, October 29th, 2008

Very exciting news – the complete dataset of the Archimedes Palimpsest project (ten years in the making) has been released today. The official announcement is copied below, but I’d like to point out what I think it is that makes this project so special. It isn’t the object – the manuscript – or the content – although I’m sure the previously unknown texts are quite exciting for scholars. It isn’t even the technology, which includes multispectral imaging used to separate out the palimpsest from the overlying text and the XML transcriptions mapped to those images (although that’s a subject close to my heart).

What’s special about this project is its total dedication to open access principles, and an implied trust in the way it is being released that open access will work. There is no user interface. Instead, all project data is being released under a Creative Commons 3.0 attribution license. Under this license, anyone can take this data and do whatever they want to with it (even sell it), as long as they attribute it to the Archimedes Palimpsest project. The thinking behind this is that, by making the complete project data available, others will step up and build interfaces… create searches… make visualizations… do all kinds of cool stuff with the data that the developers might not even consider.

To be fair, this isn’t the only project I know of that is operating like this; the complete high-resolution photographs and accompanying metadata for manuscripts digitized through the Homer Multitext project are available freely, as the other project data will be when it’s completed, although the HMT as far as I know will also have its own user interface. There may be others as well. But I’m impressed that the project developers are releasing just the data, and trusting that scholars and others will create user environments of their own.

The Stoa was founded on principles of open access. It’s validating to see a high-visibility project such as the Archimedes Palimpsest take those principles seriously.

Ten years ago today, a private American collector purchased the Archimedes Palimpsest. Since that time he has guided and funded the project to conserve, image, and study the manuscript. After ten years of work, involving the expertise and goodwill of an extraordinary number of people working around the world, the Archimedes Palimpsest Project has released its data. It is a historic dataset, revealing new texts from the ancient world. It is an integrated product, weaving registered images in many wavebands of light with XML transcriptions of the Archimedes and Hyperides texts that are spatially mapped to those images. It has pushed boundaries for the imaging of documents, and relied almost exclusively on current international standards. We hope that this dataset will be a persistent digital resource for the decades to come. We also hope it will be helpful as an example for others who are conducting similar work. It is published under a Creative Commons 3.0 attribution license, to ensure ease of access and the potential for widespread use. A complete facsimile of the revealed palimpsested texts is available on Google Books as “The Archimedes Palimpsest”. It is hoped that this is the first of many uses to which the data will be put.

For information on the Archimedes Palimpsest Project, please visit:

For the dataset, please visit:

We have set up a discussion forum on the Archimedes Palimpsest Project. Any member can invite anybody else to join. If you want to become a member, please email:

I would be grateful if you would circulate this to your friends and colleagues.

Thank you very much

Will Noel
The Walters Art Museum
October 29th, 2008.

Article on PSWPC in LLC June 2008

Monday, May 26th, 2008

David Pritchard, “Working Papers, Open Access, and Cyber-infrastructure in Classical Studies” Literary and Linguistic Computing 2008 23: 149-162; doi:10.1093/llc/fqn005.

Princeton—Stanford Working Papers in Classics (PSWPC) is a web-based series of work-in-progress scripts by members of two leading departments of classics. It introduces the humanities to a new form of scholarly communication and represents a major advance in the free availability of classical-studies scholarship in cyberspace. This article reviews both the initial performance of this open-access experiment and the benefits and challenges of working papers more generally for classical studies. After 2 years of operation PSWPC has proven to be a clear success. This series has built up a large international readership and a sizeable body of pre-prints and performs important scholarly and community-outreach functions. As this performance is largely due to its congruency with the working arrangements of ancient historians and classicists and the global demand for open-access scholarship, the series confirms the viability of this means of scholarly communication, and the likelihood of its expansion in our discipline. But modifications are required to increase the benefits this series brings and the amount of scholarship it makes freely available online. Finally, departments wishing to replicate its success will have to consider other important developments, such as the increasing availability of post-prints, the linking of research funding to open access, and the emergence of new cyber-infrastructure.

Scholarly legacy: an argument for open licensing now?

Monday, March 31st, 2008

Back in November, Gabriel Bodard posted about the importance of attaching explicit licenses (or public domain declarations) to on-line works so as to clarify for users how they can, and can’t, use these works. A new post by Cathy Davidson (“Permission Denied” in Cat in the Stack, 31 March 2008), highlights the case of an academic author who has been unable to include in his book various images of artworks created by the subject of that book because the artists’ heirs have refused permission.

Which all makes me wonder: is explicit release, in one’s own lifetime, of a work into the public domain or under license terms that permit redistribution and remixing, sufficient to prevent post-mortem claw-back by one’s institutional or personal heirs?

Cultural Heritage and Open Access Licenses

Saturday, November 17th, 2007

The Eduserv Foundation has released a report on the use of Creative Commons, Creative Archive, and other open access licenses in the area of British heritage, ‘Snapshot study on the use of open content licences in the UK cultural heritage sector’. This report (itself made available under a CC-BY license), which collected data from over 100 institutions, seems to indicate that most institutions make data available online, usually for free, but that many have not considered the implications of using an explicit license for this material.

My own experience backs this up: several times in the last year people have approached me either at the Digital Classicist or the Current Epigraphy weblog asking if we could host a ‘free’ publication for them (some even used the words “public domain” to describe their work). I can’t remember a single case of someone who even knew what I meant when I asked if they had considered using a Creative Commons license, or some other way to make explicit what people could or couldn’t do with their material.

I think it is important to make clear to people why this sort of licensing matters. To select only one argument, making it clear that all users are free to recirculate an online text increases the chance that this text will be picked up and archived, not only by individuals and projects, but by large institutions such as Google, the Internet Archive, and the national and international repositories and libraries that are going to be the custodians of all our publications that do not have print manifestations to help them survive the next server meltdown.

The Eduserv report both rings a note of optimism, as a significant number of good licenses are in use, and reminds us that there is still work to be done raising awareness of the licensing issue. This survey and the ongoing work that will arise from it have their part to play in helping to raise the profile of these issues.

(Seen in Creative Commons blog.)

Academic publishers prepare for dirty war against Open Access

Saturday, September 22nd, 2007

According to an article published in this week’s New Scientist (full article requires sub):

An unexpected package arrived on my desk earlier this year. The sender did not give a name, and the return address was false. Inside were copies of emails between senior staff at major scientific publishing houses. They were discussing a surprising topic: plans to hire Eric Dezenhall, a public relations guru who has organised attacks on environmental groups, represented an Enron chief, and authored the book Nail ‘Em! […]

Leaked emails and controversial characters like Dezenhall are not normally associated with the staid world of academic journals, but the big publishers are getting a little spooked. Over the past decade, researchers have started to demand that scientific results be set free. […] This is not a message that all publishers want to hear.

This is, I suppose, not terribly surprising to hear when there is money to be made and lost; those benefiting from the status quo will always fight against any revolution or paradigm shift, but this doesn’t mean that change should or can be stopped. Some academic publishing houses have apparently already protested at the dirty arguments that the AAP are circulating in the name of their membership. In the end, as this article argues, I don’t see how this campaign can actually stop Open Access publishing from becoming huge, but it can, of course, affect US executive decisions.

If you don’t have access to the full New Scientist article, see the following NS blog post, which has links to some of the leaked material as well as other references.

Collaborative article against perpetual copyright

Sunday, June 3rd, 2007

Back in the middle of May, Lawrence Lessig posted a note on his blog pointing to a particularly idiotic op-ed in the NY Times that argued for perpetual copyright. He invited readers to write a response, on his Wiki. 25,000 visits and nearly 300 edits later, the resulting article ‘Against Perpetual Copyright’ is an impressive piece of work, summarising arguments that include the essential difference between physical and intellectual property and the stifling effect of strict, long-term copyright on creativity, among others. The piece is a testament to the power of collaborative authorship as well as a strong refutation of the op-ed in question.

See now Lessig’s observations on this article in a recent post, pointing out how he would have focussed the arguments differently (principally by comparing the ability of different copyright models to promote and reward creativity).

Creative Commons and research

Tuesday, May 8th, 2007

A post on the Creative Commons blog draws together four articles on the value of Creative Commons licensing for newspapers, scientists, film students, and Wikipedia “SEOers” respectively. All are worth reading, but it is the article on scientists that is of most interest here. This article, posted at ScienceBlogs on 1st May by Rob Knop, makes the case that:

Scientists do not need, and indeed should not have, exclusive (or any) control over who can copy their papers, and who can make derivative works of their papers.

The very progress of science is based on derivative works! It is absolutely essential that somebody else who attempts to reproduce your experiment be able to publish results that you don’t like if those are the results they have. Standard copyright, however, gives the copyright holders of a paper at least a plausible legal basis on which to challenge the publication of a paper that attempts to reproduce the results— clearly a derivative work!

I would extend this argument (and indeed have done so repeatedly and vocally) to assert that it applies equally to all academic research, including the Humanities. This is a key part of the philosophy behind the Open Source Critical Editions network that I helped convene last year. All published research includes the requirement to publish the “source code” (by way of citations, arguments, primary and secondary references, retraceable argumentation), and the expectation that others will use this “source” to verify, reproduce, modify, or refute your work. Copyright, and especially digital copyright and crippleware, should not be allowed to get in the way of this process, because without this freedom a publication cannot be considered research.

Fair use and blogs

Wednesday, May 2nd, 2007

This post seen in the Creative Commons blog is relevant to the discussion we had here a few weeks ago about copyright and blogs. What is the status of all these copy-n-pasted paragraphs below?

Copyright and fair use in the blogosphere

April 30th, 2007 by Kaitlin Thaney

A recent incident in the blogosphere has sparked a discussion on the role of copyright and fair use laws in the digital world.

Last week, Shelley Batts – a PhD student – was accused of a fair use violation for pulling a figure and a chart from a scientific paper to post on her blog. Soon after Batts posted the data on her site, she received a cease-and-desist letter via e-mail from lawyers from the Journal of the Science of Food and Agriculture, a journal owned by John Wiley. The representative who contacted her accused her of violating fair use by reproducing the material from the journal on her blog. Batts soon took down the figures, reproduced the data in an Excel format, and avoided legal penalty.

Her experience raises a larger question, though. In the world of blogging where cutting and pasting is common practice, how do copyright and fair use laws apply? Katherine Sharpe addressed this very question on ScienceBlogs, calling on Springer Publishing’s Johannes Velterop and Science Commons’ John Wilbanks to comment.

The full article at the (cc) Science Commons blog.