Plan for New Libraries Digital Program
This paper reviews the Columbia Libraries' achievements over the past several years in digital library development, and outlines our plans to create a new Libraries Digital Program that will focus resources and attention on further development in this critical area. A new Digital Program Division, initially comprising eight programmers, librarians and specialists, will provide core staffing for this new initiative. Staff throughout the libraries are invited and encouraged to work with this group to guide their work and to ensure that we develop tools and resources that meet the research and teaching needs of Columbia’s faculty and students.
1) Early Columbia Digital Library Initiatives
Columbia University Libraries has an imposing record of important digital library research and development projects dating back to the early 1990s. These include:
These projects were carried out in collaboration with Academic Information Systems and other parts of the University. They were selected chiefly to explore the specific kinds of questions and problems that arose as electronic tools and resources first began to appear on the scene and as opportunities to collaborate with partner institutions and funding agencies presented themselves.
2) Current State of the Libraries Digital Program
These early tests positioned the Libraries to take on more ambitious, content-oriented projects, again in partnership with AcIS and often in collaboration with colleague institutions outside Columbia. Currently these collection-oriented projects include:
The Advanced Papyrological Information System (APIS) is a searchable database of metadata records and images for papyri and ostraka from six major US universities. Columbia designed this system, developed the search interface, and maintains the database. APIS is recognized by scholars in the field as an important new tool for scholarship. The current phase of NEH funding will make it possible to continue to add records and images to the database. A follow-on grant proposal for 2003-2005 is currently in preparation. (1995-2002)
The Digital Scriptorium, funded by Mellon and NEH, is another collaborative effort to build a scholarly resource that brings together medieval manuscript pages from US institutions. Currently the collection includes manuscripts from twelve institutions, and Columbia is in a planning phase with Harvard, Yale and Berkeley to add more institutions and formalize the governance of the resource. A planning grant proposal for 2003-2004 is currently in preparation. (1996-2002)
The Greene & Greene Virtual Library is a joint project with the Gamble House/USC, Berkeley and Columbia, funded by the Getty Foundation to create a scholarly web site of the architectural drawings and photographs of Charles and Henry Greene, American Arts and Crafts architects. (2000-2002)
John Jay Papers. Funded primarily by NEH, this will provide an online index to all known documents written to or by John Jay, one of America’s founding fathers and a graduate of Columbia, then King’s College. Some 100,000 page images will be scanned and linked to metadata records. (2001-2002)
These projects have made solid and continuing contributions to the world of scholarship and research.
In addition, we now have underway an innovative research and development collaboration with CRIA:
Computational Linguistics for Metadata Building (CliMB) is a two-year Mellon-funded research project developed by the Libraries and CRIA to create innovative uses of computational linguistics techniques for the identification and extraction of descriptive metadata from scholarly monographs and articles, in order to improve access to image and other digital collections. (2001-2002)
A key strategy in designing and carrying out these projects has been to lay incrementally the foundation for a coherent and scalable system architecture and metadata repository that can be reused and built upon for future projects. Components of this architecture have now successfully been put in place to provide rudimentary data management and access capabilities both for historical, content-based digital library projects and for our burgeoning commercial electronic collections. This strategy has allowed us to maximize the time and efforts of our limited existing technology staff.
Some of the key components of this infrastructure that have been implemented so far are:
The Master Metadata File (MMF), a robust, extensible SQL application and database schema optimized for the storage and management of information about digital library collections and resources.
Electronic Resource Name Resolver, a system for creating and managing “persistent URLs” that provide stable and ‘bookmarkable’ access to individual commercial resources, regardless of changes in physical location, publisher URL, etc.
Electronic Resource Proxy Server (EZ-Proxy), a commercially-provided software system for extending access to licensed electronic resources to members of the Columbia community connecting off-campus via commercial ISPs using DSL, cable modems, etc.
Using these technology tools we have made significant progress in providing Columbia students, faculty and researchers with effective access to commercial electronic publications and resources – which now represent some 14% of the total library materials budget. The Library’s Web interface for electronic journals is now served dynamically from the MMF database, which itself is fed by CLIO and external vendor data sources. All licensed titles are now systematically proxied for remote access and linked to “permanent URLs” that can be reliably bookmarked and referred to over time. These initiatives have provided critical functionality not available in vendor-supplied automated library systems and given us an application environment where new content-based projects can be built in a more consistent and efficient manner. It is, however, only the beginning of what will need to be a larger effort to provide consistent and reliable access to the information sources that are already critical to Columbia’s research community and which will only continue to grow in number, complexity and importance.
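To make the persistent-URL idea concrete, the following is a minimal illustrative sketch, not a description of the Libraries' actual implementation. It assumes a single hypothetical table (`resource`) mapping a stable identifier to a publisher's current URL, and a hypothetical proxy prefix (`PROXY_PREFIX`) for off-campus users. The table, column names, function names and URLs are all invented for illustration.

```python
import sqlite3
from urllib.parse import quote

# Hypothetical proxy prefix; the real proxy address and URL form are not given in this paper.
PROXY_PREFIX = "https://proxy.example.edu/login?url="


def build_store():
    """Create an in-memory table mapping stable identifiers to current URLs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource ("
        " res_id TEXT PRIMARY KEY,"      # stable identifier used in bookmarks
        " title TEXT NOT NULL,"
        " current_url TEXT NOT NULL)"    # publisher URL, free to change over time
    )
    return conn


def register(conn, res_id, title, url):
    """Add a resource, or update its target URL if the identifier already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO resource (res_id, title, current_url) VALUES (?, ?, ?)",
        (res_id, title, url),
    )


def resolve(conn, res_id, off_campus=False):
    """Return the URL a stable identifier currently points to, or None if unknown."""
    row = conn.execute(
        "SELECT current_url FROM resource WHERE res_id = ?", (res_id,)
    ).fetchone()
    if row is None:
        return None
    url = row[0]
    # Off-campus requests are routed through the proxy so licensed access still works.
    return PROXY_PREFIX + quote(url, safe="") if off_campus else url
```

In this sketch a bookmark stores only the identifier. When a publisher moves a title, a single call to `register` with the new URL updates every existing bookmark, which is the property that makes such links "persistent".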
Areas in which we have not yet been able to accomplish important and achievable digital library-related tasks include:
While the Libraries have been generally successful in these projects and initiatives, we have nonetheless found ourselves somewhat hampered by the lack of dedicated, digital library-oriented technology staff and an organizational structure that can make best use of existing staff working in this area.
We have come to recognize over time that national-level digital library initiatives require not only “a programmer” but also staff with specialized training and knowledge in areas such as information architecture; national and international metadata standards; information indexing, searching & retrieval; graphical interface design; SGML and XML markup and encoding standards; text, image and multimedia digitization; electronic archiving; outsourcing for technology support and text conversion; and project management. A successful University Libraries digital program also requires staff for outreach, consulting, assessment and local and national leadership.
To address these needs, the Libraries plans to create a new Libraries Digital Program Division. This team will initially comprise eight positions, drawn from existing staff, reallocation of positions and newly funded positions. This division will have its own identity but will continue to collaborate closely with the Library Systems Division, AcIS, CCNMTL and others. Our expectation is that this new division will provide the critical mass of staff necessary for effective digital library planning & implementation. It will also help ensure that we can sustain and scale up our infrastructure to provide reliable and effective access to the Library’s increasingly critical body of licensed electronic resources. Further, the new Libraries Digital Program will increase our capacity to create new digital content, will address the preservation of electronic collections and will advance the creation of a repository for the University’s unique digital assets.
The Libraries Digital Division will be complementary to AcIS and focused on priorities integral to the mission of the Libraries. The Libraries will necessarily and increasingly depend on AcIS for the vitally-important core services it offers, including web-hosting, application servers, authentication/authorization, expert technology assistance, etc. AcIS, CCNMTL and CRIA will remain key collaborators and consultants in library digital initiatives.
The Library Systems Office will also collaborate closely with both AcIS and the new Library Digital Division. LSO will continue to develop and support CLIO (the online catalog and integrated library system), staff and public workstations, and many applications and services running on the Libraries’ internal network and servers (CD-ROM networking, Reserves processing, ILL Manager, etc.). Initially, there will be considerable overlap in responsibilities with the Library Digital Division, with LSO retaining responsibility for components of the Digital Program such as the Name Resolver, the Proxy Server, and the Link Server (SFX).
As we move forward, the division of responsibilities among AcIS, LSO, and the new LDD will continue to evolve, adjusting to changes in technologies and programmatic needs, and making the best use of individual skill sets wherever they reside in the organization.
Positions to be included in the new division are listed below.
a) Existing digital library staff
The following staff positions have been working either part or full time on digital library projects; they will be moved into the new Library Digital Division:
b) New positions
Beyond staffing this core group, the Libraries Digital Program will include other components, including:
A substantive and effective Library Digital Program is critical for a major research library. Although Columbia Libraries has in the past been a key player nationally, it will be necessary to make a new investment in our staff and technology infrastructure in order to achieve our goals locally and at the same time participate persuasively as equal partners with colleague institutions such as Cornell, Chicago, Penn, Michigan, Berkeley and Harvard.
Expected outcomes from a fully staffed digital program division include:
a) Moving From Project to Program: An infrastructure designed to support a production environment, rather than occasional special projects, is now required. Only in this way can we develop a planning and implementation process that will allow us to develop staff expertise, gain efficiencies through systematization, prioritize effectively, meet deadlines and at the same time continue to have a current, robust and well-maintained digital library technology infrastructure and delivery and support environment for existing digital tools & content.
b) Improving Access to and Organization of Licensed Commercial and other Electronic Resources: As described more fully above, an excellent start has been made in this area, but more attention will be needed in order to ‘operationalize’ the gains we have made and ensure these systems can be scaled up and sustained.
c) Continuing and Expanding Existing Digital Content Initiatives: Renewed NEH funding is currently being sought for both the APIS and Digital Scriptorium projects. A new grant proposal is being prepared for Web access to Avery Architectural Drawings. The Joseph Urban project as funded includes a digital library component. A viable Libraries Digital Division will be needed to fulfill current project requirements and respond to new proposals.
d) Creating a Columbia Digital Repository: The Library’s Digital Program needs to implement a flexible repository for the storage, description and retrieval of electronic text, images and other media created as part of individual special projects. This service will be important to help manage the unique digital content that the University has already created and will continue to create in the future. A repository will allow digital content to be repurposed and reused for different types of needs, while at the same time providing a necessary level of administrative control and rights management.
e) Developing Digital Preservation & Archiving Capabilities: Columbia also needs to have a complementary strategy and infrastructure in place for preserving, indexing and archiving over time key digital assets of the Library, EPIC, the Center for New Media, and other parts of the University. We and our peer institutions face the enormous and historically critical task of developing systems to determine what digital content will survive into the future. Initial studies are being conducted by our colleague institutions that will have important ramifications for us in this area and we will need to be able to respond effectively.
f) Participating in New Types of Columbia Partnerships: As faculty and departments in the University begin to develop digital collections of research findings, teaching resources, and other types of electronic collections and archives, there is and will be a large unmet need for expert consultation and advice – the Libraries Digital Program can play a key role in this area. Without this type of communication and collaboration, departmental projects and knowledge bases will be more expensive to create, often impossible to sustain over the long term and may lack interoperability with knowledge systems and resources elsewhere.
The Libraries also need the ability to respond effectively to faculty requests for specialized tools and services, e.g., in the area of citation management, database creation and hosting, metadata, imaging standards, etc. This type of consulting will be an important part of the Digital Program Division.
g) Responding to New Collections & Campus Initiatives: As an example, a recent gift to the University of a notable archival collection (e.g., of a former NYC mayor) came with a significant digital component, e.g., for the creation of a select electronic archive and web presence. Without experienced staff available to work with curators and external service organizations and to implement new and evolving standards and technologies for electronic archives, the Libraries will not be able to respond effectively to these opportunities.
h) Improving Support for Grant Writing: Senior digital library staff will provide badly needed assistance to curators and other non-technical staff in defining and describing the necessary technology components for grant proposals in the area of digital libraries.
i) Participating in National and International Planning & Partnerships: Columbia must collaborate substantively with other University and Research Libraries nationally and abroad to develop and promote digital library standards, to ensure future availability and interoperability of electronic publications and other types of digital resources. We must also make sure that we are in a position to learn about, contribute to and take advantage of new technologies and best practices developed by other institutions.