
Etext Center vs. Google Books

This post will be much briefer than my previous reflections on the state of digital editing. In that post, I made a passing reference to the Google Books initiative that clearly expressed some dissatisfaction, and I now feel that the comment needs further elaboration. While others have offered more substantial critiques of this (admittedly useful) service, my perspective on the situation is perhaps worth a few lines.

From 2003 through 2007, I was a graduate research assistant at the University of Virginia’s Electronic Text Center (Etext Center), where I worked extensively on digitizing our library’s out-of-copyright volumes using TEI SGML (and, eventually, XML). Almost all of my coworkers, most of whom were graduate students in the English department, had learned the art of textual markup on the job, and this was by design rather than by accident. David Seaman, the original director of the Etext Center and himself a one-time graduate student in English, strongly believed that it was easier to teach someone with a strong humanities background the technical skills these tasks required than to teach someone from a computer science background how to understand humanities research practices.

My experience since then has largely validated David’s theory, although there are doubtless many computer science experts who would be more than capable of understanding what humanities scholars want from digital resources. Whether an equal number can be bothered to acquire that understanding, though, is an entirely different matter. More often than not, programmers have trouble viewing what humanities scholars do as ‘real’ research because it resists the easy quantification found throughout most of the hard sciences. Personally, I cannot help but wonder why that resistance isn’t more often seen as a challenge to the logical underpinnings of computer science and an opportunity for innovation, rather than as an indictment of the validity of humanities research.

In the end, this same lack of respect for and understanding of the concerns of humanities research is the flaw behind many of the shortcomings of Google Books. Because Google’s digitization and metadata management decisions have been made primarily by people with a computer science background, they have often failed to anticipate obvious problems in their methodology. No doubt the company’s defenders will point out that librarians and people with similar backgrounds were consulted extensively during the development of Google Books, but there is a difference between a project managed by humanists and one that merely solicits the advice of humanists.

Consider the issue of older works existing in multiple editions (or, at the risk of causing some engineer’s head to explode, multiple printings within a single edition produced before electronic typesetting, each with minor corrections that were never systematically recorded). Faced with these circumstances, even the most inexperienced student of textual criticism would realize the importance of carefully recording detailed bibliographic information for each digitized text, and the desirability of having access to alternative versions of the work in question. Indeed, such a student would recognize the distinction between “text” and “work” at play in my last sentence. Those in charge of Google’s project, however, clearly don’t appreciate such issues: either they are completely unaware of them, or they have calculated that the cost of such diligence would outweigh the benefit for their intended audience.

I promised to be brief, so let me finish by admitting again that Google is creating a very useful resource; however, in its current incarnation it is not a scholarly resource (just as the recent interest in digital distribution among publishers, mentioned in my last post, has nothing to do with digital editing). Without an appreciation of the issues that matter to humanities researchers, and especially to textual critics given the kind of digital content in question, it can only be useful for the most basic queries, those for which “any old copy” of a work will do. This situation was no doubt inevitable given the scale of Google’s ambitions, but that doesn’t make it any less unfortunate. Google Books, in my opinion, proves on a grand scale David Seaman’s theory about who should be the driving force in the digital humanities.

N.B. A final note about Google Books, only tangentially related to my main point here: while browsing texts in the collection that reference Sir Gawain and the Green Knight, I came across A. C. Spearing’s Criticism and Medieval Poetry. Google claims the book was published in 1873, although Prof. Spearing seemed rather too spry when I sat in his class a decade ago to have published anything in the nineteenth century. What caused this obvious error? Barnes & Noble’s declaration of its founding date on the title page of Spearing’s book (check the image here, assuming it has not yet been removed). It makes you wonder how many other “1873” B&N publications are knocking about on Google Books.
