Columbia Technology for Electronic Publishing
The Need
- Tools to automate the creation of educational content for DKV and EPIC, and ultimately for the university as a whole. Many of our resources have been built manually and require extensive human intervention to build, update, and maintain.
- A system for discovery, repository, and rights management services. Many of our assets are being tracked by hand, with all the inconsistency and difficulty this implies.
- A method for the output of educational materials with metadata that complies with the various standards, for seamless organization and retrieval from within and without. Our current use of metadata is not consistent, so searches within CU resources are not as meaningful as they could be. In addition, there is currently no way to provide metadata for incorporation in other websites.
Technologies for digital asset and rights management, workflow management, institutional repositories, and electronic publishing are in active development nationally and internationally. By developing our systems in coordination with projects at institutions such as the Library of Congress, MIT, the London School of Economics, and Internet2, Columbia University is furthering digital library and e-publishing research, ensuring the longer-term usability and interoperability of our products, and taking a strong leadership position in the development of digital scholarly communication technology.
The Requirements
The innovations and efficiencies created by our system will liberate staff from repetitive production tasks while improving internal communication and productivity. They will also enable more efficient creation of new types of educational and research resources, which will be increasingly flexible, sophisticated, and user-centered. These products, developed by production, editorial, and technical teams, can be easily maintained through the use of our content management software.
While our standardization is initially driven by the process needs of DKV, it is also coordinated with broader standards in the Internet community, the education community, and the library community. This will improve access to DKV content and integration of the content with other systems within Columbia and externally.
While we will purchase tools when it is more efficient and economical for us to do so, we will also develop many systems ourselves, using open-source methods. This orientation will allow us to participate in, and take advantage of, community development projects. For example, we are engaged in funding proposals with the National Science Foundation (in collaboration with the London School of Economics and the UK Joint Information Systems Committee) and the Andrew W. Mellon Foundation (in collaboration with MIT) to develop and deploy the technology above in an open-source context.
The specifics of this project can be divided into four entities:
Repository: There must be a place to store and manage assets, whether they are images, video, Flash, or any other file format. We are deploying and developing DSpace, an institutional repository software package, in collaboration with MIT and other institutions. We will use it as part of our system and provide it as a service to Columbia.
Metadata Catalog: We will develop a generalized metadata catalog and supporting applications as a central component of our system. It will manage our metadata vocabulary, allow for the creation and maintenance of metadata for all published content, and support the export of that metadata in any relevant standard schema (DC, METS, IMS, MARC, etc.). In addition, we are building an application to generate a topic map, which will be included in this module.
Workflow Module: We will develop applications to support specific workflows and data requirements as necessary. For example, we are building workflow systems for rights and research and for production.
Publishing Module: We are developing XSLT libraries to publish content from XML data. We will extend this work into XSL-FO to allow us to generate content for print media.
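To make the metadata export described under the Metadata Catalog more concrete, the following is a minimal sketch of how one catalog record might be serialized as simple Dublin Core (DC) XML. It is an illustration only, not the project's actual implementation: the function name to_dublin_core, the unnamespaced "metadata" wrapper element, the dictionary record layout, and the sample values are all assumptions made for this example. Only the Dublin Core namespace URI and the fifteen element names come from the Dublin Core Metadata Element Set itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dublin_core(record):
    """Serialize a record (dict of DC element -> string or list of strings).

    The record layout and the "metadata" wrapper element are hypothetical;
    a real catalog would define its own internal model and container.
    """
    root = ET.Element("metadata")
    for field, values in record.items():
        if field not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {field}")
        # Accept a single string or a list of repeated values.
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record; the values are placeholders, not real catalog data.
record = {
    "title": "Sample E-Seminar",
    "creator": "Columbia University",
    "subject": ["American history", "Education"],
    "rights": "Placeholder rights statement",
}
dc_xml = to_dublin_core(record)
print(dc_xml)
```

Keeping the internal record independent of any one schema, as sketched here, is one way a catalog could support several export formats: a separate serializer per schema (METS, IMS, MARC, and so on) would read the same record.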
The DKV technology for electronic publishing strikes a specific balance among the internal needs of DKV, our key University partners in the Libraries, EPIC, and Academic Information Systems, and the external customers, funding agencies, and peers who will continue to advise our strategy and contribute to its evolution.
Current Status
As our development effort continues, we are migrating entirely to XML-based communication tools. We have applied these technologies to Columbia American History Online (CAHO), which launched in November 2002, and to Columbia Educational Resources Online (CERO), which launched in January 2003. The technology will become generalized over the next several months. It may be applied to other DKV products, such as E-Seminars, clusters of related subject tools, and Columbia Interactive, as well as to the existing EPIC knowledge centers, CIAO, Earthscape, and Gutenberg-e. Our asset management plan can also offer discovery, repository, and rights management services to the wider University community.
Staff
Jack Donovan
Brian Hoffman
Richard Seymour
