Archive for the ‘Projects’ Category

Office of Digital Humanities: Search for funded projects

Wednesday, August 20th, 2008

Playing around on the website of the National Endowment for the Humanities’ Office of Digital Humanities this afternoon, I came across the Library of Funded Projects, a database of projects funded through the ODH. Visitors can search by Categories (the technical focus of the projects, not their subject), Grant Programs, or keyword. Project records include most of the information one would want, including the PI, award dates, funding, abstract, a link to the project website (when one exists), and a space to link project white papers (which are required at the conclusion of all ODH-funded projects).

The LFP is not yet up to date; searches for several of the grant programs come up empty (including programs that do currently have funded projects). Even so, this could be an immensely valuable resource to help scholars keep abreast of new work being done in the field, especially the smaller projects supported through the Start-Up program.

(The keyword search, like most keyword searches, takes some working with: “Classics” turns up nothing, while “classical” and “ancient” pull up two different but slightly overlapping lists.)

UPDATE: Are there similar libraries/databases for other national funding agencies (DFG, JISC, etc.)? If so, please cite them in the comments. Thanks!

Problems and outcomes in digital philology (session 3: methodologies)

Thursday, March 27th, 2008

The Marriage of Mercury and Philology: Problems and outcomes in digital philology

e-Science Institute, Edinburgh, March 25-27 2008.

(Event website; programme wiki; original call)

I was asked to summarize the third session of papers in the round table discussion this afternoon. My notes (which I hope do not misrepresent anybody’s presentation too brutally) are transcribed below.

Session 3: Methodologies

1. Federico Meschini (De Montfort University) ‘Mercury ain’t what he used to be, but was he ever? Or, do electronic scholarly editions have a mercurial attitude?’ (Tuesday, 1400)

Meschini gave a very useful summary of the issues facing editors or designers of digital critical editions. The issues he raised included:

  • the need for good metadata standards to address the problems of (inevitable and to some extent desirable) incompatibility between different digital editions;
  • the need for a modularized approach that can include many very specialist tools (the “lego bricks” model);
  • the desirability of planning a flexible structure in advance so that the model can grow organically, along with the recognition that no markup language is complete, so all models need to be extensible.

After a brief discussion of the reference models available to the digital library world, he explained that digital critical editions are different from digital libraries, and therefore need different models. A digital edition is not merely a delivery of information but an environment with which a “reader” or “user” interacts. We need, therefore, to engage with the question: what are the functional requirements for text editions?

A final summary of some exciting recent movements, technologies, and discussions in online editions served as a useful reminder that, far from taking it for granted that we know what a digital critical edition should look like, we need to think very carefully about the issues Meschini raises, and about other discussions of this question.

2. Edward Vanhoutte (Royal Academy of Dutch Language and Literature, Belgium) ‘Electronic editions of two cultures –with apologies to C.P. Snow’ (Tuesday, 1500)

Vanhoutte began with the rhetorical observation that our approach to textual editions is inadequate because the editions are not as intuitive to users, as flexible in what they can contain, or as extensible in use and function as a household amenity such as the refrigerator. If the edition is an act of communication, an object that mediates between a text and an audience, then it fails if we do not address the “problem of two audiences” (citing Lavagnino). We serve the audience of our peers fairly well (although we should be aware that even this is a more heterogeneous and varied group than we sometimes recognise), but the “common audience”, the readership who are not text editors themselves, is poorly served by current practice.

After some comments on different types of editions (a maximal edition containing all possible information would be too rich and complex for any one reader, so minimal editions of different kinds can be abstracted from this master, for example), and a summary of Robinson’s “fluid, cooperative, and distributed editions”, Vanhoutte made his own recommendation. We need, in summary, to teach our audience, preferably by example, how to use our editions and tools; how to replicate our work, the textual scholarship and the processes performed on it; how to interact with our editions; and how to contribute to them.

Lively discussion after this paper revolved around the question of what it means to educate your audience: writing a “how to” manual is not the best way to encourage engagement with one’s work, but providing multiple interfaces, entry points, and cross-references that illustrate the richness of the content might make it more accessible.

3. Peter Robinson (ITSEE, Birmingham) ‘What we have been doing wrong in making digital editions, and how we could do better?’ (Tuesday, 1630)

Robinson began his provocative and speculative paper by considering a few projects that typify things we do and do not do well: we do not always distribute project output successfully; we do not always achieve the right level of scholarly research value. Most importantly, it is still near-impossible for a good critical scholar to create an online critical edition without technical support, funding for the costs of digitization, and a dedicated centre for the maintenance of a website. All of this means that grant funding is still needed for all digital critical work.

Robinson has a series of recommendations that, he hopes, will help to empower the individual scholar to work without the collaboration of a humanities computing centre to act as advisor, creator, librarian, and publisher:

  1. Make available high-quality images of all our manuscripts (this may need to be funded by a combination of government money, grant funding, and individual users paying for access to the results).
  2. Funding bodies should require the base data for all projects they fund to be released under a Creative Commons Attribution-ShareAlike license.
  3. Libraries and not specialist centres should hold the data of published projects.
  4. Commercial projects should be involved in the production of digital editions, bringing their experience of marketing and money-making to help make projects sustainable and self-funding.
  5. Most importantly, he proposes the adoption of common infrastructure, a set of agreed descriptors and protocols for labelling, pointing to, and sharing digital texts. An existing protocol such as the Canonical Text Services might do the job nicely (a sketch of such a reference follows this list).
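To make that last recommendation concrete: a CTS-style reference packs the namespace, the work, and the passage into a single citable string. The following is a minimal sketch in Python of how such a URN might be pulled apart; the URN shown follows the familiar Perseus-style pattern (Iliad 1.1), but the helper itself is hypothetical and not part of any CTS implementation.

    # Minimal, hypothetical sketch: splitting a CTS-style URN into its parts.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class CtsUrn:
        namespace: str           # e.g. "greekLit"
        work: str                # textgroup.work[.version], e.g. "tlg0012.tlg001"
        passage: Optional[str]   # e.g. "1.1" for Iliad book 1, line 1

    def parse_cts_urn(urn: str) -> CtsUrn:
        parts = urn.split(":")
        if parts[:2] != ["urn", "cts"] or len(parts) < 4:
            raise ValueError(f"not a CTS URN: {urn}")
        return CtsUrn(namespace=parts[2],
                      work=parts[3],
                      passage=parts[4] if len(parts) > 4 else None)

    print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1"))

The point of such identifiers is that any project, library, or tool can cite the same passage without sharing a database or a file format.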

4. Manfred Thaller (Cologne) ‘Is it more blessed to give than to receive? On the relationship between Digital Philology, Information Technology and Computer Science’ (Wednesday, 0950)

Thaller gave the last paper, on the morning of the third day of this event, in which he asked (and answered) the over-arching question: Do computer science professionals already provide everything that we need? And underlying this: Do humanists still need to engage with computer science at all? He pointed out two classes of answer to this question:

  • The intellectual response: there are things that we as humanists need and that computer science is not providing. Therefore we need to engage with the specialists to help develop these tools for ourselves.
  • The political response: maybe we are getting what we need already, but we will experience profitable side effects from collaborating with computer scientists, so we should do it anyway.

Thaller demonstrated via several examples that we do not in fact get everything we need from computer scientists. He pointed out that two big questions were identified in his own work twelve years ago: the need for software for dynamic editions, and the need for mass digitization. Since 1996 mass digitization has come a long way in Germany, and many projects are now underway to image millions of pages of manuscripts and incunabula in that country. Dynamic editions, on the other hand, despite some valuable work on tools and publications, seem very little closer than they were twelve years ago.

Most importantly, we as humanists need to recognize that any collaboration with computer scientists is a reciprocal arrangement: we offer skills as well as receive services. One of the most difficult challenges facing computer scientists today, we hear, is to engage with, organise, and add semantic value to the mass of imprecise, ambiguous, incomplete, unstructured, and out-of-control data that is the Web. Humanists have spent the last two hundred years studying imprecise, ambiguous, incomplete, unstructured, and out-of-control materials. If we do not lend our experience and expertise to help the computer scientists solve this problem, then we cannot expect free help from them to solve our problems.

Services and Infrastructure for a Million Books (round table)

Monday, March 17th, 2008

Million Books Workshop, Friday, March 14, 2008, Imperial College London.

The second of two round tables in the afternoon of the Million Books Workshop, chaired by Brian Fuchs (Imperial College London), asked a panel of experts what services and infrastructure they would like to see in order to make a Million Book corpus useful.

  1. Stuart Dunn (Arts and Humanities e-Science Support Centre): the kinds of questions that will be asked of the Million Books mean that the structure of this collection needs to be more sophisticated than just a library catalogue
  2. Alistair Dunning (Archaeological Data Service & JISC): powerful services are urgently needed to enable humanists both to find and to use the resources in this new collection
  3. Michael Popham (OULS but formerly director of e-Science Centre): large scale digitization is a way to break down the accidental constraints of time and place that limit access to resources in traditional libraries
  4. David Shotton (Image Bioinformatics Research Group): emphasis is on accessibility and the semantic web. It is clear that manual building of ontologies does not scale to millions of items; therefore data mining and topic modelling are required, possibly assisted by crowdsourcing. It is essential to be able to integrate heterogeneous sources in a single, semantic infrastructure
    1. Dunning: citability and replicability of research become a concern with open publication on this scale
    2. Dunn: the archaeology world has similar concerns, cf. the recent LEAP project
  5. Paul Walk (UK Office for Library and Information Networking): concerned with what happens to the all-important role of domain expertise in this world of repurposable services: where is the librarian?
    1. Charlotte Roueché (KCL): learned societies need to play a role in assuring quality and trust in open publications
    2. Dunning: institutional repositories also need to play a role in long-term archiving. Licensing is an essential component of preservation—open licenses are required for maximum distribution of archival copies
    3. Thomas Breuel (DFKI): versioning tools and infrastructure for decentralised repositories exist (e.g. Mercurial)
    4. Fuchs: we also need mechanisms for finding, searching, identifying, and enabling data in these massive collections
    5. Walk: we need to be able to inform scholars when new data in their field of interest appears via feeds of some kind

(Disclaimer: this is only one blogger’s partial summary. The workshop organisers will publish an official report on this event.)

What would you do with a million books? (round table)

Sunday, March 16th, 2008

Million Books Workshop, Friday, March 14, 2008, Imperial College London.

In the afternoon, the first of two round table discussions concerned the uses to which massive text digitisation could be put by the curators of various collections.

The panellists were:

  • Dirk Obbink, Oxyrhynchus Papyri project, Oxford
  • Peter Robinson, Institute for Textual Scholarship and Electronic Editing, Birmingham
  • Michael Popham, Oxford University Library Services
  • Charlotte Roueché, EpiDoc and Prosopography of the Byzantine World, King’s College London
  • Keith May, English Heritage

Chaired by Gregory Crane (Perseus Digital Library), who kicked off by asking the question:

If you had all of the texts relevant to your field—scanned as page images and OCRed, but nothing more—what would you want to do with them?

  1. Roueché: analyse the texts in order to compile references toward a history of citation (and therefore a history of education) in later Greek and Latin sources.
  2. Obbink: generate a queriable corpus
  3. Robinson: compare editions and manuscripts for errors, variants, etc.
    1. Crane: machine annotation might achieve results not possible with human annotation (especially at this scale), particularly if learning from a human-edited example
    2. Obbink: identification of text from lost manuscripts and witnesses toward generation of stemmata. Important question: do we also need to preserve apparatus criticus?
  4. May: perform detailed place and time investigations into a site preparatory to performing any new excavations
    1. Crane: data mining and topic modelling could lead to the machine-generation of an automatically annotated gazetteer, prosopography, dictionary, etc.
  5. Popham: metadata on digital texts scanned by Google not always accurate or complete; not to academic standards: the scanning project is for accessibility, not preservation
    1. Roueché: Are we talking about purely academic exploitation, or our duty as public servants to make our research accessible to the wider public?
    2. May: this is where topic analysis can make texts more accessible to the non-specialist audience
    3. Brian Fuchs (ICL): insurance and price comparison sites, Amazon, etc., have sophisticated algorithms for targeting web materials at particular audiences
    4. Obbink: we will also therefore need translations of all of these texts if we are reaching out to non-specialists; will machine translation be able to help with this?
    5. Roueché: and not just translations into English, we need to make these resources available to the whole world.

(Disclaimer: this summary is partial and partisan, reflecting those elements of the discussion that seemed most interesting and relevant to this blogger. The workshop organisers will publish an official report on this event presently.)

Million Books Workshop (brief report)

Saturday, March 15th, 2008

Imperial College London.
Friday, March 14, 2008.

David Smith gave the first paper of the morning on “From Text to Information: Machine Translation”. The discussion included a survey of machine translation techniques (including the automatic discovery of existing translations by language comparison), and some of the value of cross-language searching.

[Please would somebody who did not miss the beginning of the session provide a more complete summary of Smith's paper?]

Thomas Breuel then spoke on “From Image to Text: OCR and Mass Digitisation” (this would have been the first paper of the day, kicking off the developing thread from image to text to information to meaning, but transport problems caused the sequence of presentations to be altered). Breuel discussed the status of professional OCR packages, which are usually not very trainable and have their accuracy constrained by speed requirements, and explained how the Google-sponsored but Open Source OCRopus package intends to improve on this situation. OCRopus is highly extensible and trainable, but currently geared to the needs of the Google Print project (and so, while effective at scanning book pages, may be less so for more generic documents). Now in alpha release and incorporating the Tesseract OCR engine, the tool already has a lower error rate than other Open Source OCR tools (though not yet lower than the professional packages, which often contain ad hoc code to deal with special cases). A beta release is set for April 2008, which will demo English, German, and Russian language versions, and release 1.0 is scheduled for Fall 2008. Breuel also briefly discussed the hOCR microformat for describing page layouts in a combination of HTML and CSS3.
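For readers unfamiliar with hOCR: it embeds OCR results (pages, lines, words and their pixel coordinates) in the title attributes of otherwise ordinary HTML elements. The snippet below is a rough Python sketch, using only the standard library, of how word bounding boxes might be pulled out of hOCR-style markup; the class names and bbox/title convention follow hOCR, but the sample element is simplified and hypothetical rather than actual OCRopus output.

    # A rough sketch: extracting word bounding boxes from hOCR-style markup.
    from html.parser import HTMLParser

    class HocrWords(HTMLParser):
        """Collect (word, [x0, y0, x1, y1]) pairs from hOCR title attributes."""
        def __init__(self):
            super().__init__()
            self.pending_title = None   # title of the ocrx_word span just opened
            self.words = []

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if "ocrx_word" in a.get("class", ""):
                self.pending_title = a.get("title", "")

        def handle_data(self, data):
            if self.pending_title and data.strip():
                # a title typically looks like: "bbox 12 34 56 78; x_wconf 92"
                bbox = self.pending_title.split(";")[0].replace("bbox", "").split()
                self.words.append((data.strip(), [int(n) for n in bbox]))
                self.pending_title = None

    sample = '<span class="ocrx_word" title="bbox 12 34 56 78; x_wconf 92">Roma</span>'
    parser = HocrWords()
    parser.feed(sample)
    print(parser.words)   # [('Roma', [12, 34, 56, 78])]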

David Bamman gave the second in the “From Text to Information” sequence of papers, in which he discussed building a dynamic lexicon using automated syntax recognition, identifying the grammatical contexts of words in a digital text. With a training set of some thousands of words of Greek and Latin tree-banked by hand, automatic syntactic parsing currently achieves an accuracy rate somewhat above 50%. While the error rate is still too high for the automated process to be useful as an end in itself (delivering syntactic tagging to language students, for example), it is good enough for testing against a human-edited lexicon, which provides a degree of control. Usage statistics and comparisons of related words and meanings give a good sense of the likely sense of a word or form in a given context.

David Mimno completed the thread with a presentation on “From Information to Meaning: Machine Learning and Classification Techniques”. He discussed automated classification based on typical and statistical features (usually binary indicators: is this email spam or not? Is this play a tragedy or a comedy?). Sequences of objects allow for a different kind of processing (for example spell-checking), including named entity recognition. Names need to be identified not only by their form but by their context, and machines do a surprisingly good job of identifying coreference and thus disambiguating between homonyms. A more flexible form of automatic classification is provided by topic modelling, which allows mixed classifications and does not require the definition of labels: topics, keywords, components, and relationships are grouped automatically by the frequency of clusters of words and references. This modelling mechanism is an effective means of organising a library collection by automated topic clusters, for example, rather than by a one-dimensional and rather arbitrary classmark system. Generating multiple connections between publications might be a more effective and more useful way to organise a citation index for Classical Studies than the outdated project that is l’Année Philologique.
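As a rough illustration of what topic modelling does (the toy documents and the choice of scikit-learn are illustrative assumptions, not anything shown at the workshop), the following Python sketch fits a two-topic LDA model to a handful of miniature “documents” and prints the most characteristic words of each topic.

    # Illustrative sketch of topic modelling with LDA (scikit-learn assumed installed).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "senate consul legion war gaul",
        "temple priest sacrifice altar festival",
        "legion camp war siege consul",
        "festival procession temple god altar",
    ]

    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)          # document-term matrix
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(counts)

    terms = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top_terms = [terms[j] for j in topic.argsort()[-4:][::-1]]
        print(f"topic {i}: {', '.join(top_terms)}")

Scaled up to a real collection, clusters like these, rather than a single classmark, are what would group related publications together.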

Simon Overell gave a short presentation on his doctoral research into the distribution of location references within different language versions of Wikipedia. Using the tagged location links as disambiguators, and using the language cross-reference tags to compare across the collections, he uses the statistics compiled to analyse bias (in a supposedly Neutral Point-Of-View publication) and provide support for placename disambiguation. Overell’s work is in progress, and he is actively seeking collaborators who might have projects that could use his data.

In the afternoon there were two round-table discussions on the subjects of “Collections” and “Systems and Infrastructure” that I may report on later if my notes turn out to be usable.

Information Behaviour of the Researcher of the Future (report)

Sunday, January 20th, 2008

The British Library and JISC commissioned the Centre for Information Behaviour and the Evaluation of Research (CIBER) at UCL to produce a report on the Information Behaviour of the Researcher of the Future. It’s well worth reading the full report in PDF (which I haven’t finished yet), but among the conclusions listed by the BL press release on this are:

  • All age groups revealed to share so-called ‘Google Generation’ traits
  • New study argues that libraries will have to adapt to the digital mindset
  • Young people seemingly lacking in information skills; strong message to the government and society at large

A new study overturns the common assumption that the ‘Google Generation’ – youngsters born or brought up in the Internet age – is the most web-literate. The first ever virtual longitudinal study carried out by the CIBER research team at University College London claims that, although young people demonstrate an apparent ease and familiarity with computers, they rely heavily on search engines, view rather than read and do not possess the critical and analytical skills to assess the information that they find on the web.

This is a very interesting combination of conclusions, although many of us have been observing for years that while our youngest students may think they know everything about computers, they often don’t actually know the first thing about using the Internet for research (nor, needless to say, about opening up a computer, either physically or metaphorically, and playing with its innards). That the GoogleGen traits, such as a short attention span, impatience with anything not on the first page of search results, and a readiness to flit from topic to topic in the wikiblogoatmosphere, are not restricted to teenagers is not news to us “gray ADDers” either.

The suggestion, then, is that libraries, the ultimate custodians of both raw data and interpreted information (and, I would argue, especially schools and universities), need to function in the spirit of this new digital world and serve the needs of our plugged-in and distracted community. Not by making information available in bite-sized, easily identified and digested pieces (that would be pandering, not serving), but by providing educational resources alongside the traditional preserved texts/media. And microformatting it (because our target audience don’t necessarily know they’re our audience). And future-proofing it.

Web-based Research Tools for Mediterranean Archaeology

Friday, January 4th, 2008

Workshop at the 2008 annual meeting of the Archaeological Institute of America in Chicago

Sunday, 6 January 2008, 9:00 a.m. – noon, Water Tower, Bronze Level, West Tower, Hyatt Regency Hotel

Moderators: Rebecca K. Schindler and Pedar Foss, DePauw University

In recent years several powerful web-based research tools for Mediterranean archaeology have emerged; this workshop brings together researchers who are building and/or maintaining them. Having examined each other’s projects beforehand, presenters demonstrate their own projects, assess their functionality and usefulness, and discuss future needs and possibilities.

The projects range from macro-scale (country- or Mediterranean-wide metadata) to micro-scale (specific sites and artifact types). Two initiatives are on-line databases for archaeological fieldwork: Foss and Schindler demonstrate MAGIS, an inventory of survey projects across Europe and the Mediterranean; Fentress demonstrates the Fasti OnLine, which records excavations in Italy and several neighboring countries. Both projects employ web-based GIS to allow spatial and database searches. With the release of Google Earth and Google Maps, GIS functionality for tracking landscapes has become widely available to mainstream, not just specialist, users. Savage offers the Jordan Archaeological Database and Information System (JADIS) as a case study of how Google-GIS functionality may be employed in archaeological research.

Numerous archaeological projects use the web to present and collect data (to varying degrees of detail). Watkinson and Hartzler demonstrate the Agora Excavations on-line, an example of how the web can clearly present a complex, long-excavated site through its organization of artifacts, documentary materials, and visual interfaces. Heath then gives a close-up look at the on-line study collection of ceramics from Ilion; what is the potential for Web-based reference collections to enhance the study of ceramic production and distribution?

ArchAtlas, presented by Harlan and Wilkinson, and the Pleiades Project, presented by Elliott, both seek to link geo-spatial and archaeological data through on-line collaborations. These projects raise issues of interoperability and shared datasets. ArchAtlas aims to be a hub for interpretive cartographic visualization of archaeological problems and data; Pleiades is developing an atlas of ancient sites. Finally, Chavez from the Perseus Project considers the challenges of accessibility, sustainability, and viability in the ever-changing world of technology — how do we ensure that these projects are still usable 20 years from now, and what new resources can we imagine developing?

These projects are representative of the types of on-line initiatives for Mediterranean archaeology in current development. Their tools enable the compilation and dissemination of large amounts of information that can lead to interesting new questions about the Mediterranean world. This is a critical time to step back, assess the resources, and consider future needs and desires.

Panelists:

  • Pedar Foss (DePauw University)
  • Elizabeth Fentress (International Association for Classical Archaeology)
  • Stephen Savage (Arizona State University)
  • Bruce Hartzler and Charles Watkinson (American School of Classical Studies at Athens)
  • Sebastian Heath (American Numismatic Society)
  • Tom Elliott (University of North Carolina at Chapel Hill)
  • Debi Harlan (Oxford University)
  • Toby Wilkinson (British Institute at Ankara)
  • Robert Chavez (Tufts University)

Technology Collaboration Awards

Saturday, December 15th, 2007

An announcement from Mellon (via the CHE):

Five universities were among the 10 winners of the Mellon Awards for Technology Collaboration, announced this week. They will share $650,000 in prize money for “leadership in the collaborative development of open-source software tools with application to scholarship in the arts and humanities.” The university winners were:

  • Duke University for the OpenCroquet open-source 3-D virtual worlds environment
  • Open Polytechnic of New Zealand for several projects, including the New Zealand Open Source Virtual Learning Environment
  • Middlebury College for the Segue interactive-learning management system
  • University of Illinois at Champaign-Urbana for two projects: the Firefox Accessibility Extension and the OpenEAI enterprise application integration project
  • University of Toronto for the ATutor learning content-management system.

Other winners included the American Museum of the Moving Image for a collections-management system, and the Participatory Culture Foundation for the Miro media player. The winners were announced at the fall task-force meeting of the Coalition for Networked Information, and awards were presented by the World Wide Web pioneer Tim Berners-Lee. –Josh Fischman

Perseus code goes Open Source!

Tuesday, November 13th, 2007

From Greg Crane comes the much-anticipated word that all of the hopper code and much of the content in Perseus is now officially open sourced:

November 9, 2007: *Install Perseus 4.0 on your computer*:

All of the source code for the Perseus Java Hopper and much of the content in Perseus is now available under an open source license. You can download the code, compile it, and run it on your own system. This requires more labor and a certain level of expertise for which we can only provide minimal support. However, since it will be running on your own machine, it can be much faster than our website, especially during peak usage times. You also have the option to install only certain collections or texts on your version, making it as specialized as you wish. Also, if you want to use a different system to make the content available, you can do so within the terms of the Creative Commons http://creativecommons.org/licences/by-nc-sa/3.0/us license. This is the first step in open sourcing the code: you can modify the code as much as you want, but at this time, we cannot integrate your changes back into our system. That is our ultimate goal, so keep a look out for that!

Download source code here
http://sourceforge.net/projects/perseus-hopper

Download text data here
http://www.perseus.tufts.edu/%7Ersingh04/

Open Library

Saturday, October 27th, 2007

Adding this grandiose Open Library system to the Internet Archive strikes me as simply brilliant. In this case “fully open” is defined as “a product of the people: letting them create and curate its catalog, contribute to its content, participate in its governance, and have full, free access to its data. In an era where library data and Internet databases are being run by money-seeking companies behind closed doors, it’s more important than ever to be open.”

But simply building a new database wasn’t enough. We needed to build a new wiki to take advantage of it. So we built Infogami. Infogami is a cleaner, simpler wiki. But unlike other wikis, it has the flexibility to handle different classes of data. Most wikis only let you store unstructured pages — big blocks of text. Infogami lets you store semistructured data…

Each infogami page (i.e. something with a URL) has an associated type. Each type contains a schema that states what fields can be used with it and what format those fields are in. Those are used to generate view and edit templates which can then be further customized as a particular type requires.

The result, as you can see on the Open Library site, is that one wiki contains pages that represent books, pages that represent authors, and pages that are simply wiki pages, each with their own distinct look and edit templates and set of data.
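A minimal sketch, assuming nothing about Infogami’s real internals, of the “typed page” idea described above: every page carries a type, and every type declares which fields a page of that kind may hold. The names and fields here are invented for illustration.

    # Hypothetical illustration of typed wiki pages (not Infogami's actual code).
    from dataclasses import dataclass, field

    @dataclass
    class PageType:
        name: str
        fields: dict            # field name -> expected Python type

    @dataclass
    class Page:
        url: str
        type: PageType
        data: dict = field(default_factory=dict)

        def validate(self):
            """Check this page's data against its type's schema."""
            for key, value in self.data.items():
                expected = self.type.fields.get(key)
                if expected is None:
                    raise ValueError(f"field '{key}' not allowed on type '{self.type.name}'")
                if not isinstance(value, expected):
                    raise TypeError(f"field '{key}' should be {expected.__name__}")

    book = PageType("book", {"title": str, "authors": list, "publish_year": int})
    page = Page("/b/an-example-book", book,
                {"title": "An Example", "authors": ["A. Author"], "publish_year": 1923})
    page.validate()   # raises if the page does not match its type's schema

As the quotation notes, it is the type’s schema that then drives the generated view and edit templates for books, authors, and plain wiki pages.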

English-Latin-English dictionaries

Monday, October 1st, 2007

from the mailbag:

My name is Silvio and I’ve recently concluded a set of English-Latin-English dictionaries which I thought you could be interested in sharing with your site’s visitors. The dictionaries provide clear and precise translations and are absolutely free of charge.

Latin Dictionary: http://www.babylon.com/define/112/Latin-Dictionary.html

If you have any feedback on them, I’d be happy to hear.

Regards,

Silvio Branco

(Note: I cannot vouch for these dictionaries but simply pass along the announcement.) 

Two new blogs

Thursday, September 27th, 2007
  • Tom Elliott, Horothesia: thoughts and comments across the boundaries of computing, ancient history, epigraphy and geography.
  • Shawn Graham, Electric Archaeology: Digital Media for Learning and Research.  Agent based modeling, games, virtual worlds, and online education for archaeology and history.

Cuneiform Digital Library Initiative and Digital Library Program of UCLA

Wednesday, September 26th, 2007

The Cuneiform Digital Library Initiative and the Digital Library Program of the University of California, Los Angeles, are pleased to announce their successful proposal to the Institute for Museum and Library Services program “National Leadership Grants: Building Digital Resources” for funding of a two-year project dedicated to improving data management and archiving tools in Humanities research.

Project Title: “Cuneiform Digital Library Initiative: Second Generation”

The UCLA University Library and UCLA’s Department of Near Eastern Languages and Cultures will create the Cuneiform Digital Library Initiative: Second Generation (CDLI 2). The project will migrate 450,000 legacy archival and access images and metadata from CDLI to UCLA’s Digital Library Content System, standardizing and upgrading the metadata to improve discovery and enable content archiving within the California Digital Library’s Digital Preservation Repository. The project will add 7,000 digital artifacts with cuneiform inscriptions, including collections housed at the University of Chicago’s Oriental Institute and in Syrian national museums. This project will ensure the long-term preservation of text inscribed on endangered ancient cuneiform tablets. (see the IMLS notice of grants in this cycle)

Principal Investigators:

Stephen Davison
Robert K. Englund

Virtual London shelved as OS refuse to license data to Google

Wednesday, August 29th, 2007

Seen in last week’s New Scientist:

A 3D software model of London containing over 3 million buildings in photorealistic detail is now unlikely to reach the public because of a dispute between a UK government agency and Google.

The full article requires a subscription, but the long and short of it is that Google wanted to incorporate the Ordnance Survey-derived data from the Centre for Advanced Spatial Analysis (at UCL) into Google Earth, and was negotiating a one-off license fee to cover the rights. However, Ordnance Survey refused to license the data on anything but terms requiring payments based on the number of users. Some mapping and visualisation experts fear that this is more significant than a simple failure of the two parties to reach an agreement.

Timothy Foresman, director-general of the fifth International Symposium on Digital Earth in San Francisco in June, fears that OS’s decision could set a precedent: “The OS model is a dinosaur,” he says. “If the UK community doesn’t band together and make this a cause célèbre, then they will find the road is blocked as further uses [of the OS data] become known.”

E-Science, Imaging Technology and Ancient Documents

Wednesday, August 22nd, 2007

Seen on, and forwarded from, the Classicists mailing list:

————————————————–

UNIVERSITY OF OXFORD

FACULTY OF CLASSICS

Sub-Faculty of Ancient History

E-Science, Imaging Technology and
Ancient Documents

Applications are invited for two posts for which funding has been secured through the AHRC-EPSRC-JISC Arts and Humanities E-Science initiative to support research on the application of Information Technology to ancient documents. Both posts are attached to a project which will develop a networked software system that can support the imaging, documentation, and interpretation of damaged texts from the ancient world, principally Greek and Latin papyri, inscriptions and writing tablets. The work will be conducted under the supervision of Professors Alan Bowman FBA and Sir Michael Brady FRS FREng (University of Oxford) and Dr. Melissa Terras (University College London).

1. A Doctoral Studentship for a period of 4 years from 1 January, 2008. The studentship will be held in the Faculty of Classics (Sub-Faculty of Ancient History) and supported at the Centre for the Study of Ancient Documents and the Oxford E-Research Centre. The Studentship award covers both the cost of tuition fees at Home/EU rates and a maintenance grant. To be eligible for a full award, the student must have been ordinarily resident in the UK for a period of 3 years before the start of the award.

2. A postdoctoral Research Assistantship for a period of 3 years from 1 January, 2008. The post will be held in the Faculty of Classics (Sub-Faculty of Ancient History) and supported at the Centre for the Study of Ancient Documents and the Oxford E-Research Centre. The salary will be in the range of £26,666 – £31,840 p.a. Applicants must have expertise in programming and Informatics and an interest in the application of imaging technology and signal-processing to manuscripts and documents.

The deadline for receipt of applications is 21 September 2007. Further details about both posts, the project, the qualifications required and the method of application are available from Ms Ghislaine Rowe, Graduate Studies Administrator, Ioannou Centre for Classical and Byzantine Studies, 66 St Giles’, Oxford OX1 3LU (01865 288397, ghislaine.rowe@classics.ox.ac.uk). It is hoped that interviews will be held and the appointments made on 11 October.

Professor Alan Bowman
Camden Professor of Ancient History
Brasenose College,
Oxford OX1 4AJ
+44 (0)1865 277874

Director, Centre for the Study of Ancient Documents
The Stelios Ioannou School for Research in Classical and Byzantine Studies
66 St Giles’
Oxford OX1 3LU
+44 (0)1865 610227

The Common Information Environment and Creative Commons

Sunday, August 5th, 2007

Seen on the Creative Commons blog:

A study titled “The Common Information Environment and Creative Commons” was funded by Becta, the British Library, DfES, JISC and the MLA on behalf of the Common Information Environment. The work was carried out by Intrallect and the AHRC Research Centre for studies in Intellectual Property and Technology Law and a report was produced in the Autumn of 2005. During the Common Information Environment study it was noted that there was considerable enthusiasm for the use of Creative Commons licences from both cultural heritage organisations and the educational and research community. In this study we aim to investigate if this enthusiasm is still strong and whether a significant number of cultural heritage organisations are publishing digital resources under open content licences.

(Full report.)

This is an interesting study worth watching, and hopefully the conclusions and recommendations will include advice on coherent legal positions with regard to Open Content licensing. (See the controversy surrounding yesterday’s post.)

UK JISC Digitisation Conference 2007

Wednesday, August 1st, 2007

Joint Information Systems Committee

Copied from JISC Digitisation Blog

“In July 2007 JISC held a two-day digitisation conference in Cardiff and the event was live blogged and podcasted. Here you can find links to all the resources from the conference, from Powerpoint presentations and audio to the live reports and conference wiki.”

The link to this blog which has audio, Powerpoints and PDFs from the wide range of speakers:

http://involve.jisc.ac.uk/wpmu/digitisation/digitisation-conference-2007/


There is much there about building digital content and e-resources.

More can be found about the JISC digitisation programme at: 

http://www.jisc.ac.uk/digitisation_home.html

Electronic corpora of ancient languages

Wednesday, July 25th, 2007

Posted to the Digital Classicist list (from ancientpglinguistics) by Kalle Korhonen:

Electronic corpora of ancient languages

International Conference
Prague (Czech Republic), November 16-17th, 2007
http://enlil.ff.cuni.cz/ecal/
Call for papers

Aims of conference

Electronic corpora of ancient languages offer important information about the culture and history of ancient civilizations, but at the same time they constitute a valuable source of linguistic information. The scholarly community is increasingly aware of the importance of computer-aided analysis of these corpora, and of the rewards it can bring. The construction of electronic corpora for ancient languages is a complex task. Many more pieces of information have to be taken into account than for living languages, e.g. the artefact bearing the text, lacunae, level of restoration, etc. The electronic corpora can be enriched with links to images, annotations, and other secondary sources. The annotations should deal with matters such as textual damage, possible variant readings, etc., as well as with many features specific to ancient languages.

Chiron pool at Flickr

Wednesday, June 27th, 2007

Alun Salt notes

Recently the 5000th photo was uploaded to the Chiron pool at Flickr. That’s over 5000 photos connected to antiquity which you can pick up and use in presentations or blogs for free. It’s due in no small part to the submissions by Ovando and MHarrsch, but there are 130 other members. It’s a simple interface and an excellent example of what you can do with Flickr.

You can see the latest additions to Chiron in the photobar at the top of the page and you can visit the website of the people who had such a good idea at Chironweb.

Forthcoming lectures on arts and humanities e-science

Wednesday, June 27th, 2007

Forwarded from AHESC Arts and Humanites e-Science Support Centre
(http://www.ahessc.ac.uk/)

The next lectures in the e-Science in the Arts and Humanities Theme (see http://www.ahessc.ac.uk/theme) begin next week. The Theme, organized by the Arts and Humanities e-Science Support Centre (AHeSSC) and hosted by the e-Science Institute in Edinburgh, aims to explore the new challenges for research in the Arts and Humanities and to define the new research agenda that is made possible by e-Science technology.

The lectures are:

Monday 2 July: Grid Enabling Humanities Datasets

Friday 6 July: e-Science and Performance

Monday 23 July: Aspects of Space and Time in Humanities e-Science

In all cases it will be possible to view the lecture on webcast, and to ask questions or contribute to the debate in real time via the arts-humanities.net blog feature. Please visit http://wiki.esi.ac.uk/E-Science_in_the_Arts_and_Humanities, and follow the ‘Ask questions during the lecture’ link for more information about the blog, and the ‘More details’ link for more information about the events themselves and the webcasts.

AHeSSC forms a critical part of the AHRC-JISC initiative on e-Science in Arts and Humanities research. The Centre is hosted by King’s College London and located at the Arts and Humanities Data Service (AHDS) and the AHRC Methods Network. AHeSSC exists to support, co-ordinate and promote e-Science in all arts and humanities disciplines, and to liaise with the e-Science and e-Social Science communities, computing, and information sciences.

Please contact Stuart Dunn (stuart.dunn[at]kcl.ac.uk) or Tobias Blanke (tobias.blanke[at]kcl.ac.uk) at AHeSSC for more information.

100+ million word corpus of American English (1920s-2000s)

Monday, June 25th, 2007

Saw this on Humanist. Anything out there and also freely available for UK English?

A new 100+ million word corpus of American English (1920s-2000s) is now freely available at:

http://corpus.byu.edu/time/

The corpus is based on more than 275,000 articles in TIME magazine from 1923 to 2006, and it contains articles on a wide range of topics – domestic and international, sports, financial, cultural, entertainment, personal interest, etc.

The architecture and interface is similar to the one that we have created for our version of the British National Corpus (see http://corpus.byu.edu/bnc), and it allows users to:

— Find the frequency of particular words, phrases, substrings (prefixes, suffixes, roots) in each decade from the 1920s-2000s. Users can also limit the results by frequency in any set of years or decades. They can also see charts that show the totals for all matching strings in each decade (1920s-2000s), as well as each year within a given decade.

— Study changes in syntax since the 1920s. The corpus has been tagged for part of speech with CLAWS (the same tagger used for the BNC), and users can easily carry out searches like the following (from among endless possibilities): changes in the overall frequency of “going + to + V”, or “end up V-ing”, or preposition stranding (e.g. “[VV*] with .”), or phrasal verbs (1920s-1940s vs 1980s-2000s).

— Look at changes in collocates to investigate semantic shifts during the past 80 years. Users can find collocates up to 10 words to left or right of node word, and sort and limit by frequency in any set of years or decades.

— As mentioned, the interface is designed to easily permit comparisons between different sets of decades or years. For example, with one simple query users could find words ending in -dom that are much more frequent 1920s-40s than 1980s-1990s, nouns occurring with “hard” in 1940s-50s but not in the 1960s, adjectives that are more common 2003-06 than 2000-02, or phrasal verbs whose usage increases markedly after the 1950s, etc.

— Users can easily create customized lists (semantically-related words, specialized part of speech category, morphologically-related words, etc), and then use these lists directly as part of the query syntax.

———-

For more information, please contact Mark Davies (http://davies-linguistics.byu.edu), or visit:

http://corpus.byu.edu/

for information and links to related corpora, including the upcoming BYU American National Corpus [BANC] (350+ million words, 1990-2007+).

————————————————————————
—– Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
Web: davies-linguistics.byu.edu

Promise and challenge: augmenting places with sources

Tuesday, June 19th, 2007

Bill Turkel has some very interesting things to say about “the widespread digitization of historical sources” and — near and dear to my heart — “augmenting places with sources”:

The last paragraph in “Seeing There” resonated especially, given what we’re trying to do with Pleiades:

The widespread digitization of historical sources raises the question of what kinds of top-level views we can have into the past. Obviously it’s possible to visit an archive in real life or in Second Life, and easy to imagine locating the archive in Google Earth. It is also possible to geocode sources, link each to the places to which it relates or refers. Some of this will be done manually and accurately, some automatically with a lower degree of accuracy. Augmenting places with sources, however, raises new questions about selectivity. Without some way of filtering or making sense of these place-based records, what we’ll end up with at best will be an overview, and not topsight.

There’s an ecosystem of digital scholarship building. And I’m not talking about SOAP, RDF or OGC. I’m talking about generic function and effect …  Is your digital publication epigraphic? Papyrological? Literary? Archaeological? Numismatic? Encyclopedic? A lumbering giant library book hoover? Your/my data is our/your metadata (if we/you eschew walls and fences). When we all cite each other and remix each other’s data in ways that software agents can exploit, what new visualizations/abstractions/interpretations will arise to empower the next generation of scholarly inquiry? Stay tuned (and plug in)!

Rome Reborn 1.0

Monday, June 11th, 2007

from the Chronicle for Higher Education:

Ancient Rome Restored — Virtually

A group of Virginians and Californians has rebuilt ancient Rome. And today they received the grateful thanks of the modern city’s mayor. The rebuilding marked by this ceremony has been digital. Researchers from the University of Virginia and the University of California at Los Angeles led an international team of archaeologists, architects, and computer scientists in assembling a huge recreation of the city. Rome Reborn 1.0 shows Rome circa 320 AD as it appeared within the 13 miles of Aurelian Walls that encircled it. In the 3D model, users can navigate through and around all the buildings and streets, including the Roman Senate House, the Colosseum, and the Temple of Venus and Rome. And of course, since the city is virtual, it can be updated as new scientific discoveries are made about the real remains. –Josh Fischman

The RR website repays browsing. The still image of the interior of the Curia Julia is unusually attractive to my eyes, for a digital reconstruction. Of greater interest is what’s said under “Future of the Project,” namely that “The leaders of the project agree that they should shift their emphasis from creating digital models of specific monuments to vetting and publishing the models of other scholars.” I hope that process gets underway.

Update: Troels Myrup Kristensen has his doubts:

Notice the absence of signs of life – no people, no animals, no junk, no noises, no smells, no decay. The scene is utterly stripped of all the clutter that is what really fascinates us about the past. The burning question is whether this kind of (expensive and technology-heavy) representation really gives us fundamentally new insights into the past? From what I’ve seen so far of this project, I’m not convinced that this is the case.

Robot Scans Ancient Manuscript in 3-D

Tuesday, June 5th, 2007

Amy Hackney Blackwell has a new piece in Wired on the just-concluded month-long effort to digitize Venetus A at the Biblioteca Marciana in Venice.  (There’s a nice gallery of images too.)
I was fortunate to be part of this CHS-sponsored team for one week. Ultimately, we managed to acquire 3-D data as well as very high resolution images for three different annotated manuscripts of the Iliad. All of this material will be made available on-line on an Open Access basis.

OA in Classics…

Monday, June 4th, 2007

Josiah Ober, Walter Scheidel, Brent D. Shaw and Donna Sanclemente, “Toward Open Access in Ancient Studies: The Princeton-Stanford Working Papers in Classics,” Hesperia 76.1 (Jan-Mar 2007).