Archive for December, 2004
Publish or Be Damned is a recent BBC discussion of academic publishing issues, including the Open Access controversy.
Odyssey, a magazine about research at the University of Kentucky, has just published an article about the efforts of CS professor Brent Seales and his team to develop non-destructive methods for extracting information from delicate cultural artefacts. The project is called EDUCE, which stands for Enhanced Digital Unwrapping for Conservation and Exploration. More here.
From one of the Wordherders, extremely unwelcome news:
The Education Department has canceled its annual grant competition for the Fund for the Improvement of Postsecondary Education because Congress has earmarked the bulk of the program’s $163.6-million budget for pork-barrel projects.
The program’s budget, set in the spending bill for the 2005 fiscal year that Congress approved last month, contains more than 400 pork-barrel projects ranging in size from $25,000 to $5-million and costing a total of $146.2-million. That leaves only $17.4-million to continue support for existing grants, which means that Fipse program managers will not be able to finance any of the 1,530 preliminary proposals that have already been peer-reviewed.
Lots of sources today on Google’s plan to digitize hundreds of terabytes of scholarship in several of the world’s largest research libraries over the next decade:
- Search Engine Watch
- Boston Globe, New York Times, BBC
- Chronicle of Higher Education
- Harvard University Library FAQ
Update: Kevin Drum, blogger-in-residence for the Washington Monthly, offers a pretty good little rant against the anti-Google op-ed in the LA Times by Michael Gorman, president-elect of the American Library Association. Juicy bit: “How can a scholar possibly have such a narrow mind — and a scholar of books, no less? Suggesting that Google should limit itself to reference books and leave everything else alone bespeaks a paucity of both spirit and vision that’s staggering.”
From TEI-L, a mention of the new Wikipedia entry on the TEI.
Classic end-of-term displacement activity: I’ve created a bunch of Firefox search plugins for open content sites (Perseus, the Stoa, the Suda On Line, and the Latin Library), to accompany one for Perseus that I found ready-made at the Firefox site. Just unzip the set that you’ll find here, and put the resulting files in your Firefox installation’s searchplugins directory.
I’m happy to archive more of these if anyone feels like contributing. They aren’t hard to create.
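For anyone tempted to contribute: a Firefox search plugin of this era is just a short text file in the Sherlock format, dropped into the searchplugins directory. Here is a hypothetical sketch — the action URL and parameter name are placeholders, not a real site’s search interface:

```
# example.src -- a hypothetical Sherlock-format search plugin
<search
   name="Example Open Content Site"
   description="Search an open content site"
   method="GET"
   action="http://www.example.org/search"
>
<input name="query" user>
</search>
```

The `<input name="query" user>` line tells the browser which query-string parameter receives whatever the user types into the search box.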
An earlier post to this blog summarizes new NEH-funded work on the problems of digitizing Latin incunabula. The project will disseminate its results very broadly, through publication of data on freely accessible sites like Perseus and in other university digital libraries, application of extremely liberal Creative Commons licenses to program code, and so forth. In taking this approach, Rydberg-Cox and his colleagues have lots of company: a strong consensus has long since formed among classicists with the greatest relevant expertise that Open Access methods represent “best practice” in our field. Experiences from a full decade of scholarly electronic publication online have demonstrated that we can now reach a huge international audience that’s eager to use the resources (texts, images, tools, analyses) we can make available concerning the ancient world.
Against that background, the recent APA decision to create a members-only portion of its web site strikes me as an obvious mistake, to the extent that the APA pushes this as a repository for additional members-only scholarly content. I believe that what the APA has done represents an unimaginative and inadequate response to the opportunities afforded us in our networked world. In the first place, the claim that limiting access to the database of members “gives current members a strong incentive to remain members” is hard to take seriously. Also, in restricting TAPA to this closed portion of the site, the members-only ploy creates a needless dichotomy between a tiny group of insiders with privileged access to information and outsiders (= the entire world) without it. Could the APA come up with any better way than this to perpetuate the ideal of Classics as a 19th century gentlemen’s club? Then too, by putting TAPA (wait, members already get that, right?) and discounts on books from OUP behind the firewall, the APA privileges the most conventional and traditional forms of scholarly communication in Classics but relegates to a lesser status the wonderful variety of innovative work being done online by many of its own members. Finally, the APA looks to be running against an accelerating trend in other disciplines, especially the natural sciences (read about the proposed new requirement for Public Access from the NIH here).
[Update: The New York Times article (14 December 2004) on new digitization projects notes that “The Google effort and others like it that are already under way, including projects by the Library of Congress to put selections of its best holdings online, are part of a trend to potentially democratize access to information that has long been available to only small, select groups of students and scholars.” Compare!]
A second update: Please don’t miss the thoughtful remarks of Alun Salt at The Undoctored Past regarding this post.
A third update: Interesting and important continuations of this discussion at Blogographos, by David Meadows and Alun Salt — not to be missed. I hope to have more to say about it all when I am done with current traveling.
PS: You’ll notice that the comments form below is turned off, simply because it’s hard to prevent blog comments sections from filling up with spam, and I don’t want to waste time clearing it out. But I’m always glad to hear people’s thoughts on this important topic. You can reach me at scaife–AT–gmail.com.
The Center for Hellenic Studies invites applications for two upcoming summer seminars on Greek scholarship and electronic publication. The first seminar (led by professors Casey Dué and Mary Ebbott) will be on “Homer: Research on Homeric Poetry, Emphasizing Textual Criticism” and the second (led by professors Kent Rigsby and Joshua Sosin) will be on “Epigraphy: Greek Inscriptions, Introductions, Methods and Research.” In between the two there will be a common session on “Online Publishing Technologies and Sharing Results.” For this session,
Christopher Blackwell and several guest lecturers will work with participants on state-of-the-art technologies and methods for publishing scholarship in electronic media. Topics will include eXtensible Markup Language (XML), the Text Encoding Initiative’s standards for marking up humanistic texts in XML, the Unicode standard for representing characters and symbols, the Classical Text Services protocol for distributing texts and fragments of them, and transforming XML documents for print and electronic publication using eXtensible Stylesheet Language Transformations (XSLT).
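To make the markup side of this concrete, here is a toy sketch of my own (not seminar material): a simplified, unnamespaced TEI-style fragment, and a minimal pass that pulls structured text out of it with Python’s standard library. Real TEI documents are considerably richer and validate against the TEI schema.

```python
# A toy TEI-style fragment and a minimal extraction pass using only
# the Python standard library. Element names follow TEI conventions,
# but the sample is simplified and unnamespaced.
import xml.etree.ElementTree as ET

tei_sample = """<TEI>
  <teiHeader>
    <fileDesc><titleStmt><title>Iliad (excerpt)</title></titleStmt></fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="edition">
        <l n="1">Sing, goddess, the wrath of Achilles,</l>
        <l n="2">son of Peleus, the ruinous wrath.</l>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(tei_sample)
title = root.findtext(".//title")                       # metadata from the header
lines = [(l.get("n"), l.text) for l in root.iter("l")]  # numbered verse lines

print(title)
for n, text in lines:
    print(f"{n}: {text}")
```

An XSLT stylesheet does the same kind of traversal declaratively, mapping each TEI element to HTML or print formatting instead of tuples.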
The Future of Digital Media is “a two-month series, sponsored by Orb, that explores how the empowerment of the consumer over his or her media experience, coupled with technological innovation that’s broadly democratizing media creation, is leading to a revolution in the way people access, consume, share and remake content.” Now the series offers an interesting interview with Tim Wu, an associate professor at the University of Virginia Law School, who teaches intellectual property and international trade.
My vision of copyright will sound conventional: I think copyright law should serve authors and consumers. But that turns out to be a radical view. Because if we took those ideals of copyright seriously, as opposed to paying them lip service, the law would look a lot different than it does today.
Congratulations to Jeff Rydberg-Cox and others at the University of Missouri-Kansas City for winning a substantial new grant from NEH Preservation and Access. Here is the summary of the proposal; be sure to note the smart plans for open dissemination of tools and results at the end:
Addressing the Problems of Digitizing Latin Incunables
Early books printed in Latin are a major component of our early modern cultural heritage. Before 1600, considerably more than half of the books printed in England alone were printed in Latin, as were the majority of books traded at international book fairs and marketed internationally. The ability to create digital editions of these texts is, therefore, essential for preserving the greater portion of the intellectual heritage of the early modern period. Digitization of these books, however, poses unique and difficult problems: characters and ligatures are printed using graphs not represented in ASCII or Unicode, figures and pictures that are essential for understanding a passage are embedded within the texts, words that carry from one line to the next may not be hyphenated, and common words and letter combinations are abbreviated with a system of brevigraphs based on medieval handwritten manuscripts. In recent years, projects such as the Making of America have developed techniques for rapid and cost-effective digitization of large corpora of printed works from the nineteenth and twentieth centuries, while Early English Books Online and the Text Creation Partnership have addressed problems of early books printed in English. At the same time, efforts such as the Newton Project and the Digital Scriptorium have developed extensive knowledge about best practices for transcribing and cataloging manuscript material. The unique problems of Latin incunables, however, still remain to be addressed. Our project will have the following specific deliverables:
- Digital facsimile editions of a collection of incunables containing texts by Al-Qabisi, Bernard of Gordon, Sebastian Brant, Isidore of Seville, Petrarch, Pliny the Elder, Suetonius, and Jacobus de Voragine. These facsimile editions represent a wide variety of scientific and literary Latin from many time periods and geographic locations, all published in the first fifty years of printing. They will serve both as testbeds for developing our tools and as demonstrations of the results that we will be able to achieve.
- Tools that can automatically or semi-automatically address the typographical difficulties posed by early printed works, including:
- Tools for the automatic identification of abbreviations and broken words;
- Integration of these tools with a text editor allowing for interactive editing and disambiguation of uncertain abbreviations;
- A digitized edition of a dictionary essential for reading fifteenth-century Latin, based on Du Cange’s standard medieval Latin dictionary;
- Extremely flexible look-up tools for dealing with the wide variety of orthographic variation in early printed Latin texts.
- Guidelines for data entry and encoding brevigraphs in early printed Latin texts that can be shared with others digitizing similar material.
Results of our work will be disseminated in several ways.
- First, we will publish high resolution images of our early printed books alongside digital transcriptions on the web using the software infrastructure developed by the Perseus project (http://www.perseus.tufts.edu/).
- Second, we will return our TEI-conformant XML transcriptions of the texts to the libraries that provide us the images of the books so that they can be disseminated via their own web sites as well as through our e-publication infrastructure.
- Third, we will release all tools for public use both via the internet and as stand-alone applications so scholars in rare book rooms without easy internet access will be able to use them.
- Fourth, the source code for our tools will be made available to any interested researcher under a Creative Commons license, allowing other scholars to adapt and extend them for their own work.
- Finally, we will conduct a detailed analysis of both our workflow and the ways that users interact with our digital texts so that we can provide a clear understanding of the technical requirements for building large digital collections of rare and complicated books. Our ultimate goal will be to produce documentation that details data entry methods and encoding standards for early printed Latin works so that librarians and scholars can digitize their own early printed holdings.
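The abbreviation-expansion and lookup tools described among the deliverables can be pictured with a toy sketch. This is my own illustration under simplified assumptions: the sign table and the spelling rules below are illustrative examples, not the project’s actual software or data.

```python
# A minimal sketch of two proposed tool types: brevigraph expansion and
# orthography-tolerant dictionary lookup. The sign inventory and the
# normalization rules are toy examples, not the project's real tables.

BREVIGRAPHS = {
    "q;": "que",   # e.g. "atq;" -> "atque"
    "ū": "un",     # a macron often marks an omitted nasal
}

def expand(word):
    """Expand known brevigraph signs to their full spellings."""
    for sign, full in BREVIGRAPHS.items():
        word = word.replace(sign, full)
    return word

def normalize(word):
    """Collapse common early-printed spelling variants before lookup."""
    return (word.lower()
                .replace("v", "u")    # u and v were not distinguished
                .replace("j", "i"))   # nor were i and j

print(expand("atq;"))                          # atque
print(normalize("Vrbe") == normalize("urbe"))  # True
```

An interactive editor, as proposed in the deliverables, would present expansions like these as suggestions for a human editor to confirm or correct rather than applying them blindly.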
A new issue of the SPARC Open Access Newsletter is out. In addition to the usual round-up of news from the past month, it takes a close look at the Congressional approval of the NIH public access plan and the UK government response to the open-access recommendations from the House of Commons Science and Technology Committee. Among the news stories given shorter takes are a series of national OA initiatives launched in November, the Kaufman-Wills study of open-access journals, and Google Scholar.
Wired has a short article about topic maps; here’s an excerpt:
Databases and search engines provide instantaneous access to endless information about anyone or anything, but the search results often include as many misses as hits. To generate more-relevant answers, organizations including the federal government are using topic maps to index their data.
Topic maps are smart indices that improve search capabilities by categorizing terms based on their relationships with other things. For example, William Shakespeare is a topic that would be mapped to essays about him, his plays and his famous quotes.
Organizing content with topic maps provides context for words that can have multiple meanings, according to Patrick Durusau, chairman of a topic maps technical committee at OASIS, the Organization for the Advancement of Structured Information Standards.
For example, searching Google for “Franz Ferdinand” mixes results for the alternative rock group and the doomed Austrian archduke for whom the group is named. If topic maps were used to organize the data, the musical and historical links would be separated, Durusau said. “The payoff (of topic maps) from the user standpoint is that you are no longer confronted with everything in the world that is known about the subject,” Durusau said.
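The disambiguation Durusau describes can be sketched in a few lines. This is my own toy data structure, not a real topic-map engine or the XTM standard’s actual syntax: the same name attaches to two distinct topics, each with its own type and occurrences, so a typed search never mixes them.

```python
# A toy topic map: two topics share the name "Franz Ferdinand" but have
# distinct types and occurrences, so typed queries keep them separate.
# All identifiers and filenames here are illustrative.

topic_map = {
    "franz-ferdinand-band": {
        "name": "Franz Ferdinand",
        "type": "musical group",
        "occurrences": ["discography.html", "tour-dates.html"],
    },
    "franz-ferdinand-archduke": {
        "name": "Franz Ferdinand",
        "type": "historical person",
        "occurrences": ["sarajevo-1914.html"],
    },
}

# Associations link topics with typed roles -- here, the naming relation.
associations = [("franz-ferdinand-band", "named after", "franz-ferdinand-archduke")]

def search(name, topic_type):
    """Return occurrences only for topics matching both name and type."""
    return [occ for t in topic_map.values()
            if t["name"] == name and t["type"] == topic_type
            for occ in t["occurrences"]]

print(search("Franz Ferdinand", "historical person"))  # ['sarajevo-1914.html']
```

A plain keyword index would return all three documents for either query; the topic type is what separates the band’s pages from the archduke’s.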
A good deal more on this in Steve Pepper, “The Tao of Topic Maps.”