Alex Wright


Search inside

October 23, 2003

Nadav points out Amazon's new "Search Inside" feature, whereby you can now do full-text searching within selected books in their catalog. Nifty.

Nadav also asks the entirely reasonable question: why isn't the Library of Congress already doing this? As I understand it (from my library school days), there are two reasons:

  • With a collection numbering over 100 million volumes, it would be staggeringly expensive. Amazon's full-text collection, at c. 100,000 volumes, is about 0.1% the size of the Library of Congress's collection, and it consists of books that presumably all originated in softcopy. Scanning and OCR-ing 100 million physical volumes would cost at least tens of billions of dollars and could take decades.

  • Just as importantly, scanning and digitizing a book quite often destroys the physical artifact, especially in the case of older books. Rare-book librarians would be horrified at the prospect, and many hard-core librarians believe that the physical artifact can often tell us as much about a book's cultural context as the text inside.

Which is not to say that digitizing books is a bad idea; just a more complicated proposition than it might seem.

It's worth noting that the Library of Congress has made a few limited strides toward digitizing its collection in recent years. Worth a look: American Memory.


File under: User Experience
