UIUC Library, I-CHASS and XSEDE Collaborate on Supercomputing Award

Posted on Apr 17, 2012

The University of Illinois at Urbana-Champaign Library, in collaboration with the Institute for Computing in Humanities, Arts and Social Science (I-CHASS) at the University of Illinois at Urbana-Champaign, was awarded a start-up allocation of 30,000 Service Units (SUs) on the Extreme Science and Engineering Discovery Environment (XSEDE) Blacklight shared memory system at the Pittsburgh Supercomputing Center (PSC). In addition to the supercomputing time, the project was allocated 12 months of XSEDE technical support to help the team optimize its code and visualize the results. The project team, consisting of principal investigator Harriett Green, English and Digital Humanities Librarian and assistant professor of library administration; Kirk Hess, Digital Humanities Specialist; and Economics Ph.D. candidate Richard Hislop, will also be supported by XSEDE database experts and other consultants.

The project, entitled “Bandits and Browsing: Data Mining and Network Analysis for Library Collections,” will build a scalable system for library collection analysis and recommender system development. Based on the resulting data analyses, the team will begin developing an enhanced recommender system for library catalogs and digital libraries. The system is intended to retrieve richer results from a library collection search by drawing on network analysis of subject relevancy, circulation data for items, and usage data for items that share interrelated subjects. To build this test bed for the recommender system’s algorithms and functionality, the project will use the advanced computing resources of XSEDE to develop self-optimizing search algorithms and network analyses that run against the bibliographic and catalog data in library catalogs and digital library indexes.

The project team created initial prototypes of search algorithms, topic analyses, and network analyses using a 40,000-item sample set from the English literature collection. A core algorithm was initially developed to identify items that are infrequently used yet highly relevant topically to heavily used works in a collection. Building on these and other analyses of the sample set, the team will scale the search algorithms and network analyses to a full 22 million-item subset of the University of Illinois Library catalog data using the advanced computing resources of XSEDE. The team will run search and indexing algorithms against the entire subset of Library catalog records, build network graphs of subject correlations, and perform full analyses of item relevancy.
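
To make the idea behind the core algorithm concrete, the following is a minimal illustrative sketch and not the project team's implementation. It assumes each catalog item carries a set of subject headings and a circulation count, and it uses Jaccard overlap of subject headings as a stand-in for topical relevance. The names `Item` and `hidden_gems`, the thresholds, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    subjects: frozenset  # subject headings assigned to the item (assumed field)
    checkouts: int       # circulation count (assumed field)


def jaccard(a, b):
    """Fraction of subject headings shared by two items (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hidden_gems(items, low_use=2, high_use=50, min_similarity=0.5):
    """Return (item_id, score) pairs for rarely used items that are
    topically close to at least one heavily used item.

    All thresholds are illustrative assumptions, not values from the project.
    """
    popular = [i for i in items if i.checkouts >= high_use]
    results = []
    for item in items:
        if item.checkouts > low_use:
            continue  # only consider infrequently used items
        # Score is the best subject overlap with any heavily used item.
        score = max(
            (jaccard(item.subjects, p.subjects) for p in popular),
            default=0.0,
        )
        if score >= min_similarity:
            results.append((item.item_id, score))
    # Highest-scoring candidates first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample records for illustration only.
catalog = [
    Item("A", frozenset({"Victorian fiction", "Serial publication"}), 120),
    Item("B", frozenset({"Victorian fiction", "Serial publication", "Periodicals"}), 1),
    Item("C", frozenset({"Botany"}), 0),
]
print(hidden_gems(catalog))
```

In this toy catalog, item B is rarely borrowed but shares most of its subject headings with the heavily used item A, so it is the only candidate surfaced. A pairwise comparison like this becomes expensive at the scale of millions of records, which is one reason a shared memory system is a natural fit for the full analysis.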

“We anticipate that these analyses will enable the initial development of a recommender system for library catalogs and digital libraries that will present the fullest possible breadth of relevant items and content in users’ search results,” Green said. “And ultimately, we hope that this will lead to a platform that will enable librarians, information scientists, and researchers to launch in-depth studies of collection use statistics, cataloging schemas, and content access, and also share their methodologies and analytical tools inter-institutionally.”

Michael Simeone, Associate Director, Interdisciplinary Studies at I-CHASS and project co-PI, said, “This will be an important project to show how high-end computation can help us understand the individual decisions that contribute to making broader-scale knowledge.”


The University of Illinois at Urbana-Champaign is the proud home of one of the largest and richest public research library collections in North America with more than 12 million volumes. For more information on the Library at Illinois, visit http://www.library.illinois.edu.


Scientists, engineers, social scientists, and humanists around the world – many of them at colleges and universities – use advanced digital resources and services every day. Things like supercomputers, collections of data, and new tools are critical to the success of those researchers, who use them to make us all healthier, safer, and better informed. XSEDE integrates these resources and services, makes them easier to use, and helps more people use them. The five-year National Science Foundation-funded XSEDE project supports 16 supercomputers and high-end visualization and data analysis resources across the country through a collaborative partnership of 17 institutions. For more information on XSEDE, visit: https://xsede.org.
