Google and the historian

Dan Cohen gave an interesting talk at the American Historical Association meeting recently, where he discussed the benefits Google brings to historical research, as well as some pointed criticisms.

Compare Google to other companies, like ProQuest or Elsevier. These two (among other companies) charge “exorbitant” fees to libraries for access to research materials. I think anyone who has ever worked in a library would agree that the costs of access are frustrating and increasingly impossible to bear, and that they take a larger and larger chunk of library resources even as library budgets shrink. Negotiating with these companies is an ongoing challenge, and the tools they provide, while powerful, are nowhere near the level modern technologies should allow. Contrast this with Google, which “has given us Google Scholar, Google Books, newspaper archives, and more, often besting commercial offerings while being freely accessible.”

Google Books has revolutionized the way many students and professors approach historical research. The size of one’s local library is no longer a limitation on the kind of research work one can do. I am no longer dependent exclusively on interlibrary loan to get access to books my university lacks. Even if I eventually want the actual, physical book, with Google Books I can see whether it will be useful before I waste the time (or the very limited funds I currently have to buy it myself).

Cohen also points out, however, that for all the utility of the service, Google “remains strangely closed when it comes to Google Books.” Cohen writes, “The real problem — especially for those in the digital humanities but increasingly for many others — is that Google Books is only open in the read-a-book-in-my-pajamas way.” Google has chosen not to maximize access to public-domain or abandoned books. To do so would potentially revolutionize the entire sphere of intellectual property and the publishing industry, the kind of revolution Google is famous for in other spheres but has chosen not to push here. The current settlement may indeed be problematic, but it is not revolutionary. Cohen notes:

We should remember that the reason we are in a settlement now is that Google didn’t have enough chutzpah to take the higher, tougher road — a direct challenge in the courts, the court of public opinion, or the Congress to the intellectual property régime that governs many books and makes them difficult to bring online, even though their authors and publishers are long gone. While Google regularly uses its power to alter markets radically, it has been uncharacteristically meek in attacking head-on this intellectual property tower and its powerful corporate defenders. Had Google taken a stronger stance, historians would have likely been fully behind their efforts, since we too face the annoyances that unbalanced copyright law places on our pedagogical and scholarly use of textual, visual, audio, and video evidence.

via Dan Cohen’s Digital Humanities Blog: Is Google Good for History?

Much as I would have liked to see the IP régime change and to see Google leading the effort, perhaps such an attempt is unrealistic. Google understands Web data. Its engineers understand electronic sources, hyperlinks, software, and PDFs. Their approaches and algorithms have revolutionized Web searching. But the people at Google have less of an understanding of the kind of research and writing done in the humanities, the books historians write, and the articles and research we produce. Cohen writes:

Because Google Books is the product of engineers, with tremendous talent in computer science but less sense of the history of the book or the book as an object rather than bits, it founders in many respects. Google still has no decent sense of how to rank search results in humanities corpora. Bibliometrics and text mining work poorly on these sources (as opposed to, say, the highly structured scientific papers Google Scholar specializes in). Studying how professional historians rank and sort primary and secondary sources might tell Google a lot, which it could use in turn to help scholars.

Google has managed to move into new areas before, for example from search to building hardware and software (the Nexus One). Why couldn’t its developers learn from the humanities and not just from other engineers? Advertising, after all, is already a combination of engineering, humanities, and business, so why couldn’t Google developers learn from history scholars to improve their search algorithms for Google Scholar and Google Books?

Image credit: "Box of type" from the Edinburgh City of Print on Flickr, used under a Creative Commons Attribution 2.0 license.