Digital Papyrology

October 26th, 2010 by Joshua Sosin

The following is a lightly edited version of a talk that I delivered at the 26th Congress of the International Association of Papyrologists, 19 August 2010, in Geneva (program), posted here at the nudging of G. Bodard.

Colleagues. It is a great honor and a privilege to be able to speak with you today. An honor and a privilege that, I hasten to add, I did not seek, but which a number of our colleagues insisted some months back the members of this research team must try to live up to. If I approach this distinguished body with some trepidation it is perhaps because my training as an epigraphist has conditioned me to a tone less attuned to collegiality than that which informs the papyrologists’ discipline. I should add also that I am here not to present my own work, but the fruits of a team whose members are in Heidelberg, London, New York, North Carolina, Alabama, and Kentucky, and who have been working heroically for more than three years now.

I shall aim to speak for no more than 40 minutes so that we may at least start discussions, which I know the rest of the team and I will be more than happy to carry on via email, Skype, phone, and separate face to face meetings. I will add also that, since the matters arising from this talk are highly technical in nature, we shall be more than happy to field questions as a team (I and my colleagues Rodney Ast, James Cowey, Tom Elliott, and Paul Heilporn) and in any of the languages within our competence.

First some background. I don’t need to tell you very much about the history of the Duke Data Bank of Documentary Papyri. It was founded in 1983, as a collaboration between William Willis and John Oates of Duke University, and the Packard Humanities Institute. A decade and a half later, around the time, as it happens, that APIS was also starting, the DDbDP decided to migrate from the old CD platform to the web. John in particular was committed to making the data available for free, to anyone who wanted access. The Perseus Project, from Tufts University, very kindly agreed to host the new online DDbDP, to develop a search interface, to convert the data from old Beta code to a markup language called SGML–all at no cost to us. The DDbDP added a few thousand texts after switching from the Packard CD ROM to Perseus. But the landscape changed dramatically from this point onward, and the DDbDP began to fall behind. The end of the CD ROM meant the end of regular revenues to support data entry and proofreading. And of course, ongoing development of the search interface was not without cost to Perseus, whose generous efforts on our behalf were, as I mentioned, unremunerated. Within a few years the DDbDP was behind in data entry and the search interface was not able to grow and mature in the ways that papyrologists wanted.

So, when I returned to Duke in 2003/4, I began a process meant to fix this problem. In 2005 The Andrew W. Mellon Foundation awarded us a modest grant to discuss possible solutions with small groups of papyrologists and technologists. Shortly thereafter James Cowey and I began mapping the HGV and DDbDP to each other, with a view to creating the possibility of technical integration of the two databases. In 2006 we began working on a proposal to Mellon to implement the results of those earlier discussions and the new collaboration with HGV. Now, in the meantime, a separate initiative, under the leadership of Roger Bagnall, had begun to develop the Papyrological Navigator, as a tool for searching and browsing the DDbDP, HGV, and APIS. Mellon very generously supported our proposal and we spent 2007/8 (1) converting the DDbDP from the by-then outdated and crude SGML to an open and transparent markup standard known as EpiDoc (devised for use with inscriptions, but easily extensible to cover papyrologists’ needs); (2) creating a technical framework for assembling HGV records with the corresponding Duke texts and, where they exist, APIS records; and (3) finishing the work started on the Papyrological Navigator. I want to say here that the second, absolutely critical task was made much easier by the very generous collaboration of Mark Depauw and the entire Trismegistos team, whose TM numbers are in an important sense the glue that holds our software together, and whose collaborative spirit is, I think, a model to be emulated.

We had begun to solve the puzzle. The DDbDP had a powerful search interface. HGV and DDbDP were on a path to technical integration; APIS records could now be displayed alongside Greek texts; Greek texts could be displayed alongside images. So there was progress. But we had not solved all of the puzzle. The DDbDP was slipping farther and farther behind on data entry, and local conditions made it increasingly difficult for me to hire graduate students to enter texts. So, how were we to solve this problem? We proposed to build an online environment that would allow the worldwide community of scholars to enter texts into the DDbDP. The system would allow them to, in effect, paste in Word files, make some alterations, and then submit the texts to a board of editors who would proofread them and then push the texts into the database.

This began as an economic problem: we could not afford to pay for data entry. But once we started thinking about a new group-based platform a whole new vista of ideas and questions opened up to us. Why should Duke be the sole authority of what goes into the databank? Couldn’t the community do this? Why should the DDbDP only reflect scholarship that had already appeared in print publications? Could it become a forum in which, for example, emendations are proposed, discussed, approved by Editors just as, say, journal articles are? Wikis had already exploded onto the scene, the sciences had already begun to implement group-based strategies for management of scholarly data, and it was very clear to me that the status quo could only end in ruin.

And so, we proposed to the Mellon Foundation to, among other things, build a web-based platform that allows users to add texts to the DDbDP, correct typos, add or change translations, propose additions or emendations to HGV records, add emendations found in the BL or in other publications, even propose emendations directly to the databank, so that control of this central scholarly dataset would grow to reside with the community, rather than with me, so that oversight was more democratic, and less hegemonic. Mellon generously funded the project and we spent 2008 to 2010 building it. We have tested it with small groups of colleagues and today, here, we will unveil it officially.

Before I show you the software I want to tell you about its name. When we first started talking about the new editor we had much fruitful conversation with a dear colleague, Ross Scaife. Ross had pioneered back in 1998 an online translation of the Suda, the tenth century encyclopedia. He figured that a translation of the entire Suda was highly desirable but that current trends in academia meant that it was very unlikely any one person would sit down and translate the whole thing—almost certainly not in North America anyway. So, he led a group in the creation of the Suda On Line (SOL), which allowed users to propose translations of Suda entries, submit the proposed translations to editorial review, and ultimately publish them online. This was a groundbreaking project and exactly the sort of thing we wanted to do with the larger and vastly more complicated data in the DDbDP. As we began to plan the next generation software based on the idea of the Suda On Line, our beloved colleague died tragically. And so, partly as tribute to Ross’s brilliant idea for group-translation of the Suda, and partly because the acronym was nearly unique, we named our editing platform the “Son of Suda On Line” or SoSOL. This is the age of acronyms and we are as guilty as anyone else.

So, first I am going to walk you through a few of the software’s capabilities, and then I’ll tell you about some of the things that we hope to build in the next phase of development.

Reader beware: Here, I ran a live demo of the software; demo cues are not translated as actual static links

OK, so first I will log in to my account. This opens my dashboard, where you can see new texts that I have created, with temporary ID numbers assigned by the system [mouse over New Publications], some texts that I am working on at the moment [mouse over Editing], texts that I have submitted to the editorial board for review already [mouse over Submitted], and texts that the editorial board has already approved for inclusion in the DDbDP [mouse over Committed]. I mention here the Editorial Board. This is a group of 6 papyrologists (we hope to raise that number to around ten) who have volunteered to review texts as they are entered into the system, proofread them against the print edition, and finalize them for inclusion in the DDbDP. We hope that this will become a rotating duty, like editorship of a journal, for which colleagues will want to volunteer. For now, the Editorial Board is made up of Rodney Ast, James Cowey, Paul Heilporn, Todd Hickey, Cisca Hoogendijk, and myself.

There will also be a board of Senior Editors, to whom proposed emendations and especially puzzling problems in texts will be referred, again, on the model of journal referees.

Since I am on the Editorial Board you can see the list of new texts that I and the other editors need to vote on before they can be added to the DDbDP [mouse over Voting]. A few weeks ago we invited a small group of scholars to test out the software by entering new texts into the DDbDP—in 2 days they entered more than 250 texts! Let’s look at one of them. So, here is O.Krok. 1.93, a short text entered by our colleague George Bevan. You can see that a link here takes us to the HGV metadata record [mouse over HGV icon]; and a link here takes us to the DDbDP text, which has been assigned a temporary ID number until the editors approve it for inclusion in the DDbDP [mouse over DDbDP icon]; in this case SoSOL 2010-550, the 550th record created in 2010. When I click on that link you see we are brought to a view of the text [Click on DDbDP link].

You see here that I have already voted to accept the text and if you want to see my comments, you can click Overview [click Overview], and see that I noted some typos. You see also the message that George left when he submitted the text. But let’s look at the Greek [click to Leiden+]. A number of things leap out at you. Some things look normal: at the end of line 7 the square brackets and underdot in ἐ̣[νθα] are exactly what you are used to. But in other places you see that the new system requires a few slightly different conventions. The hyphen in enthade appears in the following line (line 8), rather than the end of line 7; at the start of line 9 we see “.3” instead of three underdots; in line 2 χα(ίρειν) has extra parentheses around it: (χα(ίρειν)). There are other strange bits as well: note in line 7 the string <:φοβοῦμαι|orth|φωβοῦμε:>; which indicates that φοβοῦμαι is the orthographically ‘normalized’ reading and φωβοῦμε is the accented version of what the scribe wrote. Once you understand the logic there, it is pretty straightforward. We use the same pattern for indicating BL corrections, alternative readings, etc.
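The `<:…|kind|…:>` pattern described above can be picked apart mechanically. What follows is a minimal sketch, not SoSOL's own parser: the regular expression and the names `APPARATUS`, `Apparatus`, and `find_apparatus` are hypothetical, and the sketch assumes only the three-part shape quoted in the talk (first the accepted or normalized reading, then a short keyword, then what the scribe wrote).

```python
import re
from typing import List, NamedTuple

# Hypothetical pattern for the <:accepted|kind|rejected:> shape quoted in the talk.
# It is an illustration only and does not cover nested or multi-part entries.
APPARATUS = re.compile(
    r"<:(?P<accepted>[^|<>]+)\|(?P<kind>[a-z]+)\|(?P<rejected>[^|<>]+):>"
)


class Apparatus(NamedTuple):
    accepted: str  # normalized or corrected reading
    kind: str      # keyword such as "orth" or "subst"
    rejected: str  # what stands (or first stood) on the papyrus


def find_apparatus(line: str) -> List[Apparatus]:
    """Return every apparatus entry found in one line of Leiden+ text."""
    return [
        Apparatus(m["accepted"], m["kind"], m["rejected"])
        for m in APPARATUS.finditer(line)
    ]


if __name__ == "__main__":
    for entry in find_apparatus("<:φοβοῦμαι|orth|φωβοῦμε:>"):
        print(entry.kind, entry.accepted, entry.rejected)
```

Because the keyword sits in a fixed position, one parser of this kind could serve orthographic regularizations and the other cases that share the pattern.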

We’ll look at these new conventions in a minute. But first: Why do we have to enter texts in this funny way? Well, we need to be able to generate the following HTML representation of the text in the Papyrological Navigator [click Preview]. But in order to display that nice Greek, we need to enter it in a markup language called EpiDoc XML, which looks like this [click XML]. Now, unless you are a robot, you do not want to enter all of that XML encoding; it is essential but people do not want to type it. So, we have invented a kind of shorthand, called Leiden+ [click back to Leiden+], since it is basically the Leiden conventions, plus some extras; square brackets, double square brackets, angle brackets, braces (or curly brackets), are basically the same; parentheses for abbreviations are basically the same. SoSOL lets you enter your text in Leiden+ and translates it into XML for you. So, there is a little bit of extra work for us to do, but it isn’t too hard.
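To make the idea of a shorthand that expands into XML concrete, here is a small sketch that rewrites four Leiden+ constructs seen in this talk into EpiDoc-style elements. It is an assumption-laden illustration rather than SoSOL's converter: the rule table `RULES` and the function `leiden_plus_to_epidoc` are hypothetical, the element choices (`expan`/`ex`, `supplied`, `gap`) follow general EpiDoc practice, and anything beyond these four constructs (for example `[ca.12]`, `[.?]`, underdots, or several abbreviations inside one restoration) is left alone or only partly handled.

```python
import re

# Each rule is (pattern, replacement). Order matters: brackets that hold a gap
# are handled before plain restorations, and bare ".N" gaps come last.
RULES = [
    # (χα(ίρειν)) -> abbreviation together with its expansion
    (re.compile(r"\(([^()\s]+)\(([^()]+)\)\)"),
     r"<expan>\1<ex>\2</ex></expan>"),
    # [.1] -> a known number of characters lost
    (re.compile(r"\[\.(\d+)\]"),
     r'<gap reason="lost" quantity="\1" unit="character"/>'),
    # [νθα] -> text restored by the editor (brackets without digits or dots)
    (re.compile(r"\[([^\[\]\d.]+)\]"),
     r'<supplied reason="lost">\1</supplied>'),
    # .3 -> a known number of illegible characters (but not the ".12" in "ca.12")
    (re.compile(r"(?<!ca)\.(\d+)"),
     r'<gap reason="illegible" quantity="\1" unit="character"/>'),
]


def leiden_plus_to_epidoc(text: str) -> str:
    """Apply the illustrative rules in order and return the rewritten text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    for sample in ["(χα(ίρειν))", "[νθα]", ".3", ".2[.1].2του"]:
        print(sample, "->", leiden_plus_to_epidoc(sample))
```

A chain of regular expressions is enough for a demonstration, but a production converter would more plausibly use a proper grammar so that the translation can also run in reverse, from XML back to Leiden+.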

To prove it, we’ll enter a new text for you right here. Let’s take P.Sijp. 41a. We’ll go back to my home page and create the text [click Home]. Now, this text is already in HGV, but not in the DDbDP. So, I want to Emend an Existing Publication. So, I select HGV then P.Sijp. then no volume, then document 41_a [Enter info for Emend Existing]. This brings me to an overview page that tells me that there is an HGV record, but no text and no translation. Note that it also alerts me that Paul Heilporn is also editing this document; I can email him to make sure that we are not stepping on each other’s feet. So, I create a new text, which opens an empty template for me [click Create]. Then I paste in my text and start to make the changes from the print conventions to Leiden+.

Here is the version used to generate the print edition:

[ἔτους α (?) Αὐτοκράτορος]   ̣ ̣[  ̣]  ̣  ̣του
[          ± 12          ] Σεβαστοῦ
[εἴργ(ασται) ὑ(πὲρ) χω(ματικῶν) ἔ]ργ(ων) τοῦ αὐτοῦ̣ πρώτου (ἔτους)
4[  (month)  ] κ  κς ἐ[ν] τῇ Ἐπα-
[γαθιαν]ῇ διώ(ρυγι) Βακχιά(δος)
[  NN ] Πατκ(όννεως) τοῦ Θεαγένους
[      ± 6       ] μη(τρὸς) Ταύρεως
8[ NN ] (m. 2) σεση(μείωμαι)

Now, since I already know what the differences between Leiden and Leiden+ are, I’ve gone ahead and made most of the simple changes in the Word file itself, before pasting it into SoSOL.

Here’s what the same text looks like in Leiden+

1. [ἔτους] [<#α=1#> (?)] [Αὐτοκράτορος] .2[.1].2του
2. [ca.12] Σεβαστοῦ
3. [(εἴργ(ασται)) (ὑ(πὲρ) χω(ματικῶν))] ([ἔ]ργ(ων)) τοῦ αὐτοῦ̣ πρώτου ((ἔτους))
4. [.?] <#κ=20#>  <#κϛ=26#> ἐ[ν] τῇ Ἐπα
5.- [γαθιαν]ῇ (διώ(ρυγι)) (Βακχιά(δος))
6. [.?] (Πατκ(όννεως)) τοῦ Θεαγένους
7. [ca.6] (μη(τρὸς)) Ταύρεως
8. [.?] $m2 (σεση(μείωμαι))

So, you see, Leiden+ lets me indicate in line 1 that the uncertainty in the restoration applied to the year number only; it allows me to enter the number in Greek and also encode its value; κϛ is 26. Lacunas are pretty straightforward. Abbreviations require extra parentheses. But all is pretty simple. But, imagine that at line 6 the scribe first wrote της and then corrected it to του; and let’s say I don’t know how to indicate that in Leiden+. I just click Leiden+ Help [click Leiden+ Help], and scroll to the entry for Apparatus, Scribal Correction; I see that the Leiden+ convention is <:τοῦ|subst|της:> [mouse over], which means that the scribe wrote της and then corrected it to του. I can find here a description of the conventions for BL corrections, or deletions, or numbers, or just about anything. And if I don’t like typing all those angle brackets and things, I can use the Helpers to insert them for me [click Helper]; so, if the scribe wrote της and WE correct it to τοῦ, I select “Orthographic correction” and insert τοῦ as the “Accepted” reading and τῆς as the “Rejected” reading [click Orthographic] / [insert τοῦ / τῆς] / [click Enter]. And the Helper inserts it for me.
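Since a number is entered both in Greek and with its value, the two can be checked against each other. The sketch below is an illustration under stated assumptions, not part of SoSOL: the names `GREEK_NUMERAL_VALUES`, `NUMBER`, `numeral_value`, and `check_numbers` are hypothetical, only the `<#glyphs=value#>` shape seen in the sample text is recognized, and the table covers the standard alphabetic numerals from 1 to 900 without thousands, fractions, or accented letters.

```python
import re
from typing import List, Tuple

# Standard Greek alphabetic numerals, units through hundreds.
GREEK_NUMERAL_VALUES = {
    "α": 1, "β": 2, "γ": 3, "δ": 4, "ε": 5, "ϛ": 6, "ζ": 7, "η": 8, "θ": 9,
    "ι": 10, "κ": 20, "λ": 30, "μ": 40, "ν": 50, "ξ": 60, "ο": 70, "π": 80, "ϙ": 90,
    "ρ": 100, "σ": 200, "τ": 300, "υ": 400, "φ": 500, "χ": 600, "ψ": 700, "ω": 800,
    "ϡ": 900,
}

# Hypothetical pattern for the <#glyphs=value#> shape used in the sample text.
NUMBER = re.compile(r"<#([^=#<>]+)=(\d+)#>")


def numeral_value(glyphs: str) -> int:
    """Sum the values of the individual numeral letters (KeyError if unknown)."""
    return sum(GREEK_NUMERAL_VALUES[g] for g in glyphs)


def check_numbers(line: str) -> List[Tuple[str, int, int]]:
    """Return (glyphs, declared value, computed value) for each tagged number."""
    return [
        (m[1], int(m[2]), numeral_value(m[1]))
        for m in NUMBER.finditer(line)
    ]


if __name__ == "__main__":
    line_4 = "[.?] <#κ=20#>  <#κϛ=26#> ἐ[ν] τῇ Ἐπα"
    for glyphs, declared, computed in check_numbers(line_4):
        status = "ok" if declared == computed else "MISMATCH"
        print(glyphs, declared, computed, status)
```

A check like this could flag a mistyped value before a text ever reaches the editors.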

Once I have entered my text, I go to the Overview page, enter a few words explaining that I have entered the text from a digital copy of the file and submit it to the Editorial Board for review [Paste “Entered text from digital copy of P.Sijp.; proofread against print edition”.]. The Board votes on the text, performs a few additional steps, which I don’t need to show you, and then sends it to the main database. It then takes a day or even a couple before the text is visible in the Papyrological Navigator, but it will appear. So, for example, you can see here in the test version of the new Papyrological Navigator, if you navigate to P.Oxy. 69, you will see nearly all of the texts from that volume, which our colleague Nick Gonis entered himself.

[navigate to PN development site]

The new Papyrological Navigator is still in testing; new texts do not appear in the PN that you know and use already.

Now, if I wanted to log into an existing text and correct a typo, enter an emendation that is in the BL but not yet in the DDbDP, or a correction that is not yet in the BL, or even propose and justify an emendation that has not appeared anywhere in print, I would simply follow the same steps that I showed you here. It is a little bit daunting at first, but once you use it a few times, it becomes very easy.

Most of our time at the moment is spent catching up on data entry. If you would like to contribute, any one of you can request access to a Google document, where you can claim papyri as you like. This helps us to avoid duplication of effort.

Now, I could spend another 3 hours walking you through all of the capabilities of SoSOL, but instead I want to mention one important guiding principle of the system, say a few words about what we hope to do next, and then open the floor to questions.

Permanent transparency is the guiding principle behind SoSOL. The system keeps track of everything. When you log in and submit a text, SoSOL records it; when you submit a text or propose an emendation SoSOL will not let you submit until you have written a message explaining what you propose. Similarly, SoSOL will not allow Editors to vote on a text without explaining why they vote the way they do. For every single text SoSOL keeps a permanent and comprehensive record of every single change. Users can see this, forever. The discipline of transparency and permanence has the virtue of requiring all of us to live up to the high standards of our field’s motto, and make that motto meaningful: amicitia papyrologorum. Collegiality is, in effect, a technical requirement of SoSOL. It also means that all proposals must be offered and scrutinized with utmost seriousness, since our comments are visible to all, forever. And, that under SoSOL accurate scholarly attribution is very easy to enforce. Moreover, we assume that even suggestions judged by the Editors to be incorrect might one day be judged right, in the light of new finds, or might, though wrong, nevertheless inspire someone else to solve even an unrelated puzzle. So, SoSOL does not throw away rejected ideas; it simply stores them in the Comments page for every text, accurately attributing and time-stamping every single comment, for posterity, and for purposes of rigorous scholarly attribution.
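The rules just described (no submission or vote without an explanation, nothing discarded, every comment attributed and time-stamped) amount to an append-only log. Here is a minimal sketch of that idea, assuming nothing about how SoSOL actually stores its history: the classes `Event` and `TextHistory` and the action labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One immutable, attributed, time-stamped entry in a text's history."""
    author: str
    action: str       # hypothetical labels, e.g. "submit", "vote-accept", "vote-reject"
    message: str
    timestamp: datetime


class TextHistory:
    """Append-only history: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, author: str, action: str, message: str) -> Event:
        # Refuse any action that arrives without an explanation.
        if not message.strip():
            raise ValueError("an explanatory message is required")
        event = Event(author, action, message.strip(), datetime.now(timezone.utc))
        self._events.append(event)
        return event

    @property
    def events(self) -> Tuple[Event, ...]:
        # A tuple copy keeps callers from altering the stored history.
        return tuple(self._events)


if __name__ == "__main__":
    history = TextHistory()
    history.record("contributor", "submit", "Entered text; proofread against print edition")
    history.record("editor", "vote-reject", "Proposed reading does not fit the traces")
    for event in history.events:
        print(event.timestamp.isoformat(), event.author, event.action, event.message)
```

Note that the rejected vote stays in the log alongside the submission, which is the point: a proposal judged wrong today remains visible and credited to its author.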

What do we plan to do next? We have just submitted a proposal for a third round of development in which we hope to extend the capabilities of the Papyrological Navigator considerably, including much more powerful searching of HGV and APIS data along with text searches, and a number of other improvements as well. As for SoSOL we will add features that will accommodate scholarly comments in both line-by-line commentary and introductory material; we plan also to revive the Checklist of Editions by building it into the SoSOL framework, so that it can be more easily kept up to date; we will also run a pilot project with our colleague Isabella Andorlini, who will enter 250 literary papyri (unattributed medical texts) via SoSOL, so that we may see how easily the Papyrological Navigator could be made to accommodate non-documentary texts; we will also be running multiple training sessions in Europe, where we will invite 2 to 3 dozen scholars to spend a few days learning how to use the new software; we will also significantly improve the instruction manual for SoSOL, a bit of which you saw today; we also will pursue collaborations with Trismegistos, Sammelbuch, Berichtigungsliste, and colleagues in Berlin. We have an ambitious agenda and if we are lucky enough to be funded, we will circulate a link to the proposal via PapyList, as we have done with the previous two grants.

I’ll stop now and simply conclude by saying that this is an exciting and somewhat scary new step for the field. We do not really know what the future of digital papyrology holds. But if we want to move ahead intelligently and carefully, I think there are a few measures that we can take. Especially in an age of flagging institutional support: We must collaborate. We must share the workload. We must use common technical standards. We must do our work in the full sunlight of the web, and not in the black box of anonymity. We must leverage the strength of our community’s distinguishing spirit of collegiality. And when I think of the papyrologists whom I know, of the support extended to this and other papyrological projects by multiple universities and funding agencies, of the variety of excellent technological projects on display here at the congress, I think there is cause not only for optimism, but for excitement too. I hope that you will join in that sentiment, and I thank you for your kindness and patience.

2 Responses to “Digital Papyrology”

  1. Gabriel Bodard Says:

    For anyone interested, I’ve recently been collating a bit of bibliography on the SoSOL (and the earlier SOL) tools:
    * Suda Online
    o Ross Scaife et al., 1998-2010, Suda On Line: Byzantine Lexicography. Available:
    o Anne Mahoney, 2009, ‘Tachypaedeia’. Digital Humanities Quarterly 3.1. Available:
    * Son of SOL:
    o John Oates, 1993, ‘The Duke Databank of Documentary Papyri’, in ed. Jon Solomon, Accessing Antiquity: The computerization of Classical Studies (University of Arizona Press), pp. 62-72.
    o Ross Scaife and Dot Porter, 2006, ‘Tools for Collaborative Editing’, Open Source Critical Editions seminar paper, available
    o Dot Porter, 2008, ‘The Son of Suda Online: A next-generation collaborative editing tool’, Digital Classicist seminar, June 20, 2008. Available:
    o Ryan Baumann, 2010, ‘SoSOL Overview’, Integrating Digital Papyrology wiki. Available:
    o Joshua Sosin, 2010, ‘Digital Papyrology’, Congress of the International Association of Papyrologists, 19 August 2010, Geneva. Available:

    (Further suggestions welcome.)

  2. Gabriel Bodard Says:

    Now see also:

    * Ryan F. Baumann, ‘The Son of Suda On-Line’ forthcoming 2013 in (edd. S. Mahony & S. Dunn), Digital Classicist, Bulletin of the Institute of Classical Studies Supplement. (Prepress draft:
