Library Initiatives, Fall 2000
- I. Collections, Services, and Systems
- II. Projects and Programs
- III. Publications
- IV. Specific Digital Library Challenges
1. Primarily Text
- Columbia International Affairs Online (CIAO)
Launched in Fall 1997 with seed funding from the Andrew W. Mellon Foundation,
Columbia International Affairs Online (CIAO) offers a mix of
scholarly materials in the field of international affairs: working
papers, conference proceedings, journal abstracts, books, maps, and
links to a wide variety of related sites. A sophisticated indexing
and search system allows scholars and students to use the
publication in a number of different ways.
The project is now self-sustaining through subscription sales, after
three years of development. Our experience with CIAO suggests that
scholars are interested in exploring the potential of online
publication and research, and that enough librarians and end-users
find this material sufficiently valuable to purchase it at a price
that permits cost-recovery over the long term.
CIAO is part of the Electronic Publishing Initiative at Columbia
(EPIC), described below.
- The American Historical Association's Gutenberg-e Dissertation Prize
This innovative new project explores the impact of awarding a
prize backed by the prestige of the American Historical Association
for electronic publication of monographs in several areas of history
that are considered to be "endangered" in the publishing arena. The
project will examine the reaction of the scholarly community (authors,
academic review committees, administrators, and end users) to the
electronic publication of history dissertations, as well as the
cost-recovery model that would make such an enterprise feasible over
the long term.
Columbia, through its EPIC center, has been selected as the
publication site for this project and has received funding from the
Andrew W. Mellon Foundation for three years to cover the costs of
publishing these materials online. EPIC is holding a series of
seminars for the winners of the prize designed to guide the authors in
how to create online publications from their print dissertations. The
first set of online versions will become available in early 2001.
- Online Books Evaluation Project (1995-99)
With support from the Andrew W. Mellon Foundation, the Libraries and
Academic Information Systems (AcIS) collaborated with publishers to
assess costs, user preferences, and potential delivery
packaging for monographs in online electronic
form. Participating publishers include Columbia University Press,
Oxford University Press, Garland, Simon and Schuster Higher
- Humanities Texts
The Libraries and AcIS provide classic literary and philosophical
texts in ASCII, HTML, and SGML formats for study, searching and analysis.
- Virtual Reading Room
A joint project of Columbia College, AcIS, the Libraries and the
Columbia Center for New Media Teaching and Learning (CCNMTL),
described below, to create annotated versions of works studied in Core
Curriculum courses. These versions will be integrated with
instructional support software facilitating student annotation and
- Commercial Collections
Columbia is also actively building a collection from commercial
sources of information. In 2000-01 it expects to spend more than 1.3
million dollars on networked reference tools. CD-ROMs and e-journals
are largely paid for with other funds. Commercial digital
collections include suites of databases from ABC-CLIO, Bowker,
Cambridge Scientific, Chadwyck-Healey, Dialog, Gale, OCLC FirstSearch,
Ovid, RLG, SilverPlatter and others. Columbia also has comprehensive
title access agreements with most major e-journal publishers, e.g.,
Blackwell Science, Elsevier, and Springer. Columbia's users also
have access to a variety of integrated full-text services such as
Academic Universe, IDEAL, JSTOR, Project Muse, ProQuest, and
ScienceDirect. Columbia, together with Cornell, Dartmouth, and
Middlebury College, provides access to nearly 20,000 netLibrary
e-books. The four institutions take part in an unusual arrangement in
which users' own selections determine what is purchased. Columbia's
Business, Law, and Health Sciences libraries also provide access to
sizable specialized reference, data, and e-journal collections.
2. Primarily Images
- Ling Lung Women's Magazine
Ling Lung Women's Magazine was published in Shanghai, China, in the
1930s, at a time when women's role in society, at least in that
sophisticated and foreign-influenced metropolis, was in rapid
transition. This pocket-sized, slender, and inexpensive weekly
boldly ventured to meet these new needs by encouraging
women to advance toward the good life through socially high-minded
entertainment. It was filled with articles on fashion, interior
decoration, pop psychology, and new careers; advice columns on love,
sex, and marriage; and lavish illustrations of local and Hollywood
celebrities. The wide array of advertisements for women's products is
often just as revealing of life and aspirations as the words of the text.
The first issue came out on March 18, 1931 and the magazine ceased
publication in 1937. As far as we know, Columbia University's Starr
East Asian Library is the only library outside China to hold a nearly
- Art Humanities Reserve Collection (1995)
The Libraries and AcIS have created an image library of several
thousand images. The fully-cataloged collection focuses on material
from a core course, required of all undergraduates, and is used both
in electronic classrooms and for study.
- Advanced Papyrological Information System (APIS)
With support from the National Endowment for the Humanities, APIS is a
multi-institutional project to create a digital library of papyri,
transcriptions and related bibliographical information. The Libraries
and AcIS are collaborating with faculty both in implementing the
project at Columbia and in coordinating the overall effort.
- Digital Scriptorium
With funding from the Andrew W. Mellon Foundation,
the Libraries and AcIS are collaborating with the
University of California, Berkeley Libraries to create a digital
library of dated and datable medieval manuscripts.
- Judging a Book by Its Cover: Gold-Stamped Publishers' Bindings of the 19th Century
The advent of gold-stamped decoration, circa 1832, was the most
important factor in the acceptance of publishers' bindings. Gold
stamping brought to the mass-produced book some of the prestige
associated with gold-tooled leather bindings of the pre-industrial
era. In fact, stamping often imitated the decorative styles and
motifs of the hand-finished book. However, gold stamping also
developed its own styles and imagery that reflected the period's
taste and culture.
- Museum Educational Site Licensing Project (1994-97)
Columbia participated in this J. Paul Getty Trust-coordinated project
to explore the use and economics of digital images in university
research and teaching. The project developed mechanisms, both
technical and procedural, to deliver 10,000 images from eight
U.S. museums to seven research universities. A project to evaluate
the process and economics of this distribution was supported by the
Andrew W. Mellon Foundation.
- Oversized Color Images (1994-96)
This joint project, from the Libraries and AcIS, has evaluated options
and recommended best practices for preserving and accessing brittle
textual materials containing large sized color images. (Funded by
3. Mixed Media
- Columbia Earthscape: An Online Resource on the Global Environment
This publication, launched in December 1999, has been funded by the
Scholarly Publishing and Academic Resource Coalition (SPARC) and the
National Science Foundation for three years for development and
launch. Columbia Earthscape seeks to explore how the collaborative
model of content development in the digital environment can transform
both scholarship and education in the rapidly-developing field of
earth systems science.
As part of the Electronic Publishing Initiative at Columbia (EPIC),
and based on the design and development experience of our
other EPIC projects, we have created a fully-integrated,
interdisciplinary, interactive online publication for both research
and teaching resources in this field and will evaluate the ongoing
educational value and economic viability of providing these services
on a cost-recovery model.
- Online Music Collections
Each semester, instructors of the Music Humanities course, required
of all undergraduates, make audio selections of reserve material
available to their students over the network. A "Sonic Glossary," an
online audio dictionary of musical terms, is also available. Students
may access this material from residence hall rooms or from
several campus laboratory facilities.
- Asian Topics
Asian Topics is a joint project of the East Asian Curriculum Project
(EACP), for teachers and students at the pre-collegiate level, and the
Project on Asia in the Core Curriculum, for teachers and students at
the undergraduate level. It is designed to be a digital library of
multi-media presentations that bring leading scholar-teachers in Asian
studies, from Columbia and other institutions, into classrooms,
libraries, and homes, speaking on topics in Asian literature, history,
religion, and contemporary society. The Topics are interdisciplinary
and draw on Columbia's long experience in the intellectual design and
production of content materials on Asia for teachers.
Each Topic features a cameo, audio-visual presentation by a leading
scholar-teacher designed to engage the teacher or student new to the
study of Asia with the Topic he or she has chosen to explore. Each entry also
provides bibliography, background essays, other web links, and
curricular links for the viewer to pursue the topic in more depth. The
site is designed to provide a "first look" on the Topic for teachers
who come to the web to gain background on a new subject they plan to
introduce in class and for students seeking initial direction for a
- South Asia Resources
With support from the Dharam Hinduja Indic Research Center and the
Department of Education, Columbia is collaborating with the University
of Chicago and the North Carolina Triangle South Asia Consortium to
create networked versions of the twenty-six modern literary languages
of South Asia.
The Hinduja Center is supporting preservation and networked access to
Indic manuscripts: 325 important manuscripts from Columbia's Rare
Books and Manuscripts Library, in various languages of India. The
project includes detailed metadata creation, online publication of an
in-depth study of these particular manuscripts (by the world's
greatest Indic codicologist, Prof. David Pingree, Brown University),
microfilming and scanning from microfilm to create full page images of
the entirety of this collection, hyperlinked to the metadata.
Columbia's South Asia Resource Access on the Internet (SARAI) is
designated by the WWW-Virtual Library as the official World Wide Web
Virtual Library for South Asia.
- Electronic Data Service
Jointly operated by the Libraries and AcIS, the EDS is the
University's numerical data archive. The index for much of this
collection is being provided over the network through the
EDS Datagate, and has become a recognized national resource.
Gigabytes of raw data are also available to scholars locally.
- Direct Borrowing
Borrow Direct, developed by RLG, Columbia, Penn and Yale, went live in
November 1999. This direct consortial borrowing pilot project offers
faculty, staff and students at the three institutions the ability to
request and borrow circulating items from the combined stack
Borrow Direct provides a combined Z39.50 search of Penn's Endeavor
catalog and Columbia and Yale's NOTIS catalogs, then determines
availability, handles request management and connects to the local
circulation systems. Requesters receive e-mail notification at each
stage of the request process. A commitment to priority handling with a
maximum of four days from request to receipt is central to the design of
"CLIONotify" is a user-profile-driven new book notification service
currently in pre-release testing for a likely fall 2000
implementation. This service provides weekly email notification to users
about newly-cataloged books, electronic resources and other media that
match patrons' study and research interests. User profile editing and
viewing are secured via the campus Kerberos-based authentication
system. User interests are specified using CU's prototype "Hierarchical
Interface to LC Classification" (HILCC), enabling MARC catalog records to
be matched on LC Classification numbers. Keywords may also be used to
qualify the broader HILCC categories.
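The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the CLIONotify implementation: the `Interest` and `Record` classes, the `matches` and `weekly_digest` functions, and the sample data are all hypothetical, and it assumes each new catalog record has already been assigned a HILCC category path from its LC Classification number.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Interest:
    """One line of a user profile (hypothetical structure)."""
    category: Tuple[str, ...]        # HILCC category path, e.g. ("Sciences", "Chemistry")
    keywords: Tuple[str, ...] = ()   # optional keywords that narrow the category


@dataclass
class Record:
    """A newly cataloged item, already mapped to a HILCC category path."""
    title: str
    category: Tuple[str, ...]


def matches(interest: Interest, record: Record) -> bool:
    """True if the record falls under the interest's category and keywords."""
    # The record must sit at or below the chosen category in the hierarchy.
    if record.category[:len(interest.category)] != interest.category:
        return False
    # With no keywords, the category alone is enough.
    if not interest.keywords:
        return True
    # Otherwise at least one keyword must occur in the title (assumption).
    title = record.title.lower()
    return any(k.lower() in title for k in interest.keywords)


def weekly_digest(profiles: Dict[str, List[Interest]],
                  new_records: List[Record]) -> Dict[str, List[str]]:
    """Map each user to the titles of new records matching any interest."""
    digest = {}
    for user, interests in profiles.items():
        hits = [r.title for r in new_records
                if any(matches(i, r) for i in interests)]
        if hits:
            digest[user] = hits
    return digest
```

Matching on a category prefix means that a profile entry for a broad category also picks up records classed in any of its narrower subcategories, while keywords let a user trim an otherwise large category.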
- Electronic Text Service (ETS)
ETS is a research and instructional
facility of the Columbia University Libraries designed to
help Columbia faculty and students incorporate computer-based textual
and bibliographic information into their research,
study, and teaching. ETS has machine-readable primary source texts,
software programs for textual analysis and critical
editing, hypermedia and database research tools in the humanities,
bibliographic database management programs, IBM and
Macintosh microcomputers, X terminals, and optical scanning equipment
for the creation of machine-readable text. The ETS
staff will provide demonstrations, workshops, and classes for students
and faculty, as well as individual consultations.
- Access Management
Several systems manage access to documents and services, whether
locally or remotely provided. AcIS has created systems to integrate
the campus identity infrastructure, based on the Kerberos technology,
and the campus authorization service, built from distributed databases
and delivered by an LDAP directory. These systems control access to
restricted resources locally, on secure web servers, and remotely,
routed through secure proxy servers.
Over the last two years, we have been active in the DLF pilot project
to use X.509 digital certificates and LDAP to more effectively provide
access to remote services without using proxies (see also under
Publications, below). In Fall 2000, we began using
certificate technology to create a secure "identity hand-off"
mechanism to third-party vendors in the context of individualized web
services, often known as "portals." Related to this work, we are
developing prototypes for identity hand-off among applications
("channels") within individualized services.
- Master Metadata File
This facility, developed by Library Systems and AcIS, is a
relational database application holding bibliographic and structural
information for collections held locally or remotely.
It is able to represent multiple versions, collections,
aggregations such as pages in a book, and hierarchies of digital
objects. While still under development, it is currently being used in
several projects. Information may be imported and exported in several
formats. The database may be used as an intermediate architectural
component and may be queried interactively.
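As a rough illustration of how a relational schema can represent hierarchies, aggregations, and multiple versions of digital objects, consider the sketch below. It is not the Master Metadata File's actual design: the table names, column names, and sample rows are assumptions invented for this example, and SQLite (via Python's standard `sqlite3` module) stands in for whatever database the facility uses.

```python
import sqlite3

# Hypothetical schema: every object points at its parent, and each object
# may have several versions (formats) stored locally or remotely.
SCHEMA = """
CREATE TABLE digital_object (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES digital_object(id),
    kind      TEXT NOT NULL,      -- e.g. collection, book, page
    label     TEXT NOT NULL,
    seq       INTEGER             -- order within the parent
);
CREATE TABLE version (
    object_id INTEGER NOT NULL REFERENCES digital_object(id),
    format    TEXT NOT NULL,
    location  TEXT NOT NULL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO digital_object VALUES (?, ?, ?, ?, ?)", [
        (1, None, "collection", "Demo collection", None),
        (2, 1, "book", "Demo book", 1),
        (3, 2, "page", "Page 1", 1),
        (4, 2, "page", "Page 2", 2),
    ])
    db.executemany("INSERT INTO version VALUES (?, ?, ?)", [
        (3, "tiff", "local:/demo/p1.tif"),
        (3, "jpeg", "remote:/demo/p1.jpg"),
        (4, "tiff", "local:/demo/p2.tif"),
    ])
    return db


def children(db, parent_id):
    """Labels of an object's direct children, in sequence order."""
    rows = db.execute(
        "SELECT label FROM digital_object WHERE parent_id = ? ORDER BY seq",
        (parent_id,))
    return [r[0] for r in rows]


def path_to_root(db, object_id):
    """Labels from the top-level object down to the given object."""
    rows = db.execute("""
        WITH RECURSIVE chain(id, parent_id, label, depth) AS (
            SELECT id, parent_id, label, 0
              FROM digital_object WHERE id = ?
            UNION ALL
            SELECT o.id, o.parent_id, o.label, chain.depth + 1
              FROM digital_object o JOIN chain ON o.id = chain.parent_id
        )
        SELECT label FROM chain ORDER BY depth DESC
    """, (object_id,))
    return [r[0] for r in rows]


def formats(db, object_id):
    """Formats available for one object, sorted alphabetically."""
    rows = db.execute(
        "SELECT format FROM version WHERE object_id = ? ORDER BY format",
        (object_id,))
    return [r[0] for r in rows]
```

A single self-referencing table keeps collections, books, and pages in one hierarchy of arbitrary depth, and the separate `version` table lets one object carry several formats and locations.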
- Hierarchical Interface to LCC
Columbia's "Hierarchical Interface to LC Classification" (HILCC) is a key
component of CU digital library metadata planning. It is based on a
detailed mapping of the LC Classification schedules to language-based
subject categories ("entry vocabulary") organized hierarchically, e.g.,
QD415.000 - QD436.999 = Sciences -- Chemistry -- Biochemistry
Preliminary versions of HILCC are already used in production to create
browsable Web subject listings for the "reference tools and indexes"
portion of CU digital library collections, and also for user profile
creation/MARC record matching in the CLIONotify application (see
above). Columbia plans to continue to develop and refine this tool for
campus digital library use and also circulate it to other research
libraries for discussion and possible development as a prototype standard.
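The range-to-category mapping can be illustrated with a short sketch. Only the QD415-QD436 row below comes from the example above; the other rows, the `HILCC_RANGES` table layout, and the function names are assumptions made for illustration and do not reflect the real HILCC data or code.

```python
import re
from typing import Optional, Tuple

# (class letters, low number, high number, category path).
# Only the first row is taken from the text; the others are hypothetical.
HILCC_RANGES = [
    ("QD", 415.0, 436.999, ("Sciences", "Chemistry", "Biochemistry")),
    ("QD", 1.0, 999.999, ("Sciences", "Chemistry")),
    ("Q", 1.0, 999.999, ("Sciences",)),
]

# Leading class letters followed by the class number, e.g. "QD415.5".
_CLASS_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def parse_lc_class(call_number: str) -> Optional[Tuple[str, float]]:
    """Split a call number into its class letters and class number."""
    m = _CLASS_PATTERN.match(call_number.upper())
    if m is None:
        return None
    return m.group(1), float(m.group(2))


def hilcc_category(call_number: str) -> Optional[Tuple[str, ...]]:
    """Return the most specific category path covering the call number."""
    parsed = parse_lc_class(call_number)
    if parsed is None:
        return None
    letters, number = parsed
    best = None
    for cls, low, high, path in HILCC_RANGES:
        if cls == letters and low <= number <= high:
            # Prefer the deepest (most specific) matching category.
            if best is None or len(path) > len(best):
                best = path
    return best
```

For instance, `hilcc_category("QD415.5 .B5 1999")` yields the three-level Biochemistry path, while a call number outside every listed range yields `None`. Choosing the deepest matching path lets broad and narrow ranges coexist in one table.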
- LOCKSS Alpha Test Site
Columbia is an alpha test site for the LOCKSS system prototype.
LOCKSS preserves access to scientific journals published on the web.
The system ensures that hyperlinks continue to resolve and appropriate
content is delivered, even when the content is no longer available
from the original source. Libraries running LOCKSS cooperate to detect
and repair preservation failures.
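The idea of cooperating copies that detect and repair damage can be illustrated with a deliberately simplified sketch. This is not the LOCKSS protocol: it swaps in a plain majority vote over SHA-256 digests, and the function names and the in-memory `copies` dictionary are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """SHA-256 digest of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: Dict[str, bytes]) -> List[str]:
    """Compare copies held by several peers and repair the outliers.

    `copies` maps a peer name to the bytes that peer holds for one
    document. Copies that disagree with a strict majority are overwritten
    in place with the majority version. Returns the repaired peer names.
    """
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    # Without a strict majority there is no trustworthy version to copy.
    if count <= len(copies) / 2:
        return []
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for peer, content in list(copies.items()):
        if digest(content) != winner:
            copies[peer] = good
            repaired.append(peer)
    return sorted(repaired)
```

The sketch refuses to repair anything when no version holds a strict majority, since copying from a minority version could spread damage rather than fix it.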
- Significant Topics
This research investigates the relationship between the occurrence of
significant topics in a document and the structure of the
document. The unique contribution of this research lies in the
innovative combination of statistical and rule-based methods to
identify a list of significant topics as a function of the
distribution of terms in documents. To the extent that our techniques
are based on linguistically-motivated patterns and not on
domain-dependent vocabularies, our patterns apply to general text.
- University-wide Bulletin
A comprehensive University-wide schedule of classes, updated daily,
has been available for a number of years. The current schedule draws on
data extracted from the Registrar's administrative database and on
contributions from individual departments and instructors, who can
supplement the central data with links to, for example, instructional
material ("course home pages") and information on instructors' research.
Beginning in Fall 2000, links to instructional material will be
enhanced and links to library reserve materials will be added.
A multi-document summarization system, MultiGen, automatically
generates a concise summary by identifying similarities and
differences across a set of related documents. Input to the system is
a set of related documents, such as those retrieved by a search engine
in response to a particular query. Our work to date has focused on
generating a summary including similarities across documents. Our
approach uses machine learning over linguistic features extracted from
the input documents to identify several groups of paragraph-sized text
units so that all units in each group convey approximately the same
information. Shallow linguistic analysis and comparison between
phrases of these units is used to select the phrases that can
adequately convey the similar information. This task is performed by
the content planner of the language generation component and results
in determination of summary content. Sentence planning and generation
are then used to combine the phrases together to form a coherent
- PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video And Language resources)
This project will provide personalized access to a
distributed patient care digital library through the development of a
system, PERSIVAL (PErsonalized Retrieval and Summarization of Image,
Video And Language resources). PERSIVAL will tailor search,
presentation, and summarization of online medical literature and
consumer health information to the end user, whether patient or
healthcare provider. PERSIVAL will utilize the online patient records
available at Columbia Presbyterian Medical Center (CPMC) as a
sophisticated, pre-existing user model that can aid in predicting
users' information needs and interests. For those patients with no
CPMC patient record, PERSIVAL will ask specific questions, depending
upon the user query and clinical context, to build and maintain a
skeletal user health information model.
Key features of the proposed work include personalized access to
distributed, multimedia resources available both locally and over the
Internet, fusion of repetitive information and identification of
conflicting information from multiple relevant sources, and
presentation of information in concise multimedia summaries that
cross-link images, video, and text. Given the widely varying nature of
online resources, research in retrieval and search methodology will
focus on automatically identifying source type (e.g., journal articles
vs. self-help groups), quality, and level of intended audience. In
addition to fusing information from multiple sources, summaries must
also express facts in terms the user can understand, regardless of
background. Video sources range from diagnostic test results to
educational video, each of which requires search based on image
characteristics and identification of significant events to aid in
finding appropriate clips.
- Content-Based Multimedia Search and Retrieval
This research focuses on development of next-generation image
technologies for efficient multimedia content creation,
manipulation, searching, and distribution. Multimedia content,
especially images and videos, is becoming increasingly
important thanks to the rapid advancement in multimedia computing and
communication. In particular, the ubiquitous impact
of the Internet has made it possible for general users to disseminate
and access multimedia content on-line easily and quickly.
- CARDGIS: Center for Applied Research in Digital Government Information Systems
CARDGIS brings together a strong team of researchers and developers
with interests and experience in databases, human-computer
interaction, knowledge representation, data mining, and other areas of
computer science and information systems. The center's mission is
research in the design and development of advanced information systems
with capabilities for generating, sharing and interacting with
knowledge in a networked environment. Participants are drawn from
Columbia University's Department of Computer Science and from the
University of Southern California's Information Sciences
Institute. Technical assistance is provided by experts from several
Federal government agencies.
- Electronic Publishing Initiative at Columbia (EPIC)
The Electronic Publishing Initiative at Columbia (EPIC) is a
groundbreaking new initiative in digital publishing at Columbia
University that involves Columbia University Press, the Libraries, and
Academic Information Systems. Its mission is to create new kinds of
scholarly and educational publications through the use of new media
technologies in an integrated research and production environment.
Working with the producers of intellectual property at Columbia
University and other leading academic institutions, it aims to make
these digital publications self-sustaining through subscription sales
to institutions and individual users.
EPIC is committed to pursuing the highest standards in the
development of content, use of technology, handling of issues of
intellectual property and copyright, development of business plans,
and evaluation of use. Its publications are designed to be innovative,
efficient and cost-effective.
Current constituent projects within EPIC are described above:
Columbia International Affairs Online (CIAO), Columbia Earthscape and
- Columbia Center for New Media Teaching and Learning
In partnership with the faculty as content experts, the Center is
committed to advancing the purposeful use of new media and digital
technologies in the educational programs of Columbia University. It
is committed to ongoing evaluation of the efficacy of its work within
The Center is committed to extending the population of involved
faculty by providing them with a broad range of points of access:
workshops, forums, individual consultations, as well as ongoing and
sustaining support in the development of projects. The Center begins with
anyone who is willing to bring a syllabus. In developing more
advanced projects, it is committed to building visible
heuristics, that is, projects in collaboration with faculty that
act as demonstrations and explorations of pedagogical and curricular
The Center is committed to building what is called the Columbia Educational
Operating System, a suite of integrated applications that extends the
capacity of students and faculty to capture, analyze, and integrate
data in new ways. CEOS will also provide equally powerful
communications tools to facilitate dialogue and the exchange of ideas.
The Center is about building partnerships and providing the motivation
and venue for the integration of disparate efforts in digital
development. In this capacity, the Center is presently working with
not only individual faculty members but also other entities of the
University committed to similar goals, such as CIESIN, CERC, AcIS, CME
and others. The Center is also active in contributing to the
strategic planning on the school, college and university level.
- Center for Research on Information Access
This center was established in early 1995 to act as a vehicle for
linking different projects on the Columbia campus involved in
developing and using digital technology. Initial funding came
from the Office of the Vice-Provost under the Virtual Information
Initiative. CRIA has also received funding from national agencies,
foundations, and industry.
CRIA is committed to facilitating connections between projects for
exploring new technologies, developing new electronically
available resources, and improving instruction, which are all
essential facets of information access. CRIA is housed within the
Columbia University Libraries with links to the Computer Science
- Columbia University Digital Library: Architecture and Services.
This paper describes Columbia's technology framework for digital
collections generally, including strategies for distributed
portions of infrastructure, independent development tracks, and following
- Policy for Preservation of Digital Resources
Digital resources are part of the Columbia Libraries collections and
subject to the same criteria for selection and retention decisions as
other media. As such, they are included under the central preservation
policy: ensuring that the collections remain available over the long
term, through prevention of damage and deterioration; reversing damage
where possible; and, when necessary, changing the format of materials
to preserve their intellectual content.
- Cross-Organizational Access Management
This November 1999 D-Lib Magazine article describes the DLF
authentication and authorization architecture pilot project. The
architecture uses X.509 digital certificates and LDAP directory
services to create a highly effective access management system.
- How can we best balance the integration into decentralized campus
organizations, such as libraries, computing, schools, departments, and
new media learning, teaching and publishing centers with the need for
central coordination, planning, architecture and standards?
- What will prove the most effective methods to evaluate the use and
impact of our digital library efforts?
- What practices will ensure long-term preservation of digital library
- What is the most appropriate critical path to effect more
interoperable architectures? What is the best mix between
standardization and development among the many dimensions of