HathiTrust Digital Library Adds New Content Accessible to All Users, Nearly 100,000 Volumes Digitized by Internet Archive
July 15, 2010 | 11:04 am
By Paul Biba
From Resource Shelf:
Cool! Visualize the Collection by Call Number, Languages, and Date
The HathiTrust Digital Library Update for June 2010, released today, has info on a couple of new features. We’ve bolded the items that are available to all users.
Shibboleth was released for partner authentication in June.
Authenticated users can now download full PDFs of all public domain volumes in HathiTrust, and access the Collection Builder feature through local sign-on. Shibboleth also lays the groundwork for future augmented services to partner institutions, potentially including the ability to make uses of digital volumes allowed by Section 108 of U.S. copyright law, and to allow full access to in-copyright volumes for users with print disabilities.
The release of Shibboleth was made in conjunction with improvements to PageTurner that enabled delivery of high-resolution PDF files with embedded OCR for entire volumes. While only individuals at member institutions have access to this service across the repository, all public domain volumes that were not digitized by Google are available for full-PDF download to members and non-members alike. Right now these include nearly 100,000 Internet Archive-digitized volumes that have been contributed by the University of California, and thousands of volumes digitized locally by the University of Michigan. The partners are poised to significantly increase the amount of non-Google-digitized content preserved in HathiTrust in the near future, making many more public domain volumes freely available for download and distribution.
While we’re talking user tools, let’s step away from the HTDL Update for a moment and check a recent post on Hathi’s Large Scale Search Blog by Tom Burton-West. He writes:
When you do a search, you will see check boxes next to each search result. You can select items you want from the search results and create a personal collection. This should make it much easier to do repeated searches and explore a targeted subset of the HathiTrust volumes. If you are not logged in, the collection will be temporary. If you log in you can save the collection permanently. This enables users to do focused searching within a selected subset of search results.
If you’re not from a partner institution, follow the links on this page to create a “friend” account from the University of Michigan. All you need is an email address and access to your email. It takes no more than 3-5 minutes.
That’s it. Now, back to highlights from the newsletter.
SEASR
HathiTrust is in the process of investigating SEASR, the Software Environment for the Advancement of Scholarly Research, as a means to provide computational access to materials stored in the repository. Staff at the University of Michigan began installation of SEASR in the HathiTrust development environment in June, and expect to gain more knowledge about SEASR and what would be involved in applying it to HathiTrust over the next several weeks.
Next, Highlights from Hathi Working Groups
Discovery
As of the end of June, there are nearly 3.1 million HathiTrust records in WorldCat. Record loading is now continuing at a quicker pace, and is nearly complete.
OCLC is also making several alterations to the catalog’s functionality to fully meet HathiTrust’s requirements. This work is expected to extend into early August, after which time the interface will be reviewed for public beta release.
Collaborative Development Environment
University of Michigan staff continued the migration of HathiTrust applications into the new development environment in June, performing testing and configuration of the GlusterFS distributed file system that will be used as the storage back-end for the environment as well…When configuration is complete, the environment will support HathiTrust development efforts broadly across the partnership.
Quality, Ingest, and Error Rate
The quality working group is still working through a set of scenarios for gating volumes of poor quality from entering HathiTrust, and developing a justification and recommendation for the best approach to follow.
Development Updates come next.
Large-scale Search
The full text search index in Indiana was put into production by Michigan staff in early June, making the infrastructure for full text search fully redundant. Two new index build servers were also put into production in Michigan. All of the new systems have been functioning well, and the new build servers have substantially improved the performance of index building and maintenance…Michigan staff also developed a Lucene utility in June (Solr uses Lucene) to read an index and print out the total number of occurrences of a term.
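To make the last point concrete, here is a minimal Python sketch of what "total number of occurrences of a term" means for an index. It is not the Michigan utility and does not use Lucene; the function names, the toy documents, and the in-memory index are all assumptions made for illustration. It shows the difference between the total occurrence count of a term and the number of documents that contain it.

```python
from collections import Counter

def build_index(docs):
    """Build a toy inverted index: term -> Counter of {doc_id: occurrences}."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(token, Counter())[doc_id] += 1
    return index

def total_occurrences(index, term):
    """Total number of times the term appears across all documents."""
    return sum(index.get(term.lower(), {}).values())

def document_frequency(index, term):
    """Number of documents that contain the term at least once."""
    return len(index.get(term.lower(), {}))

# Hypothetical sample data, standing in for indexed volumes.
docs = {
    "vol1": "the whale and the sea",
    "vol2": "the sea",
    "vol3": "a quiet harbor",
}
index = build_index(docs)
print(total_occurrences(index, "the"), document_frequency(index, "the"))
```

In this toy data, "the" occurs three times in total but in only two documents, which is why a utility that reports total occurrences gives different information than a plain document count.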
Collection Builder
Integration of Collection Builder functionality with large-scale search is in the final stages of testing and will be deployed in July.
Storage Upgrade
Michigan staff have ordered and received additional storage for the Indiana and Michigan sites and will be putting it into service during July and August. The upgrade requires the installation of a new, larger storage network switch, so staff will be using the opportunity to introduce a new cabling layout for the entire system.
Outages
HathiTrust services were unavailable on Monday, June 7 from 7:10-10:00am and on Tuesday, June 8 from 5:00-5:30pm due to a connectivity problem with one of the web servers; and on Saturday, June 25 from 8:30-10:00am due to a database server disk space shortage.
Database Growth
Indiana University: 236 volumes added in June; 177,333 total volumes in collection
Penn State University: 328 volumes added in June; 22,824 total volumes in collection
University of California: 616 volumes added in June; 1,509,169 total volumes in collection
University of Michigan: 34,605 volumes added in June; 4,056,835 total volumes in collection
University of Minnesota: 173 volumes added in June; 73,856 total volumes in collection
University of Wisconsin: 10,073 volumes added in June; 353,639 total volumes in collection
Totals: 46,031 volumes added in June; 6,193,386 total volumes in collection
Public Domain Volumes: ~20% of total; 1,208,351 total volumes in collection
Statistics
6,197,125 total volumes
3,627,903 book titles
146,794 serial titles
2,168,993,750 pages
230 terabytes
73 miles
5,035 tons
1,208,634 volumes (~20% of total) in the public domain
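As a quick check on the figures above, the short Python sketch below recomputes the public domain share and the average page count per volume from the listed totals. The variable names are ours, and the reading that the page total is an estimate based on a fixed per-volume figure is an inference from the arithmetic, not something the newsletter states.

```python
# Figures copied from the Statistics list above.
total_volumes = 6_197_125
public_domain = 1_208_634
pages = 2_168_993_750

# Share of volumes in the public domain.
pd_share = public_domain / total_volumes

# Average pages per volume implied by the totals.
pages_per_volume = pages / total_volumes

print(f"Public domain share: {pd_share:.1%}")
print(f"Pages per volume: {pages_per_volume}")
```

The share works out to about 19.5%, in line with the "~20%" quoted, and the page total divides evenly into 350 pages per volume.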
The HathiTrust Update is Also Available as a PDF.
Access HathiTrust Digital Library Search and Collection Tools


