Requirements for mass digitization projects

December 28th, 2006 by Ross Scaife

Joseph J. Esposito has a post on the liblicense-l list at Yale wherein he makes some points worth bearing in mind:

My concern is a practical one: Some projects are incomplete in their design, which will likely result in their having to be redone in the near future, an expense that the world of scholarly communications can ill afford. There are at least four essential characteristics of any such project, and there may very well be more.

As many have noted, the first requirement of such a project is that it adopt an archival approach. Some scanning is now being done with little regard for preserving the entire informational context of the original…

Archives of digital facsimiles are important, but we also need readers’ editions, the second requirement of mass digitization projects. This goes beyond scanning and involves the editorial process that is usually associated with the publishing industry. The point is not simply to preserve the cultural legacy but to make it more available to scholars, students, and interested laypeople…

As devotees of “Web 2.0” insist with increasing frequency, all documents are in some sense community documents. Thus scanned and edited material must be placed into a technical environment that enables ongoing annotation and commentary. The supplemental commentary may in time be of greater importance than the initial or “founding” document itself, and some comments may themselves become seminal…

The fourth requirement is that mass digitization projects should yield file structures and tools that allow for machine process to work with the content. Whether this is called “pattern recognition” or “data mining” or something else is not important. What is important is to recognize that the world of research increasingly will be populated by robots, a term that no longer can or should carry a negative connotation. Some people call this “Web 3.0”, but I prefer to think of it as “the post-human Internet,” which may not even be a World Wide Web application.

To my knowledge, none of the current mass digitization projects fully incorporate all four of these requirements.
