Finding Aids -- Issues and Considerations

Columbia University Libraries Digital Program

Archival Finding Aids:
Technical Considerations

Current Columbia Approach to Finding Aid Creation and Publication

At present, Columbia has no consistent approach to creating and publishing finding aids across the units of the Libraries or university-wide. The Rare Books and Manuscripts Division has developed style guidelines for the presentation of its finding aids, although these have not yet been extended to other divisional units creating finding aids.

Individual archivists and curators within different organizational units have each developed their own approaches to creating electronic finding aids, chiefly using ProCite and FileMaker Pro, which are also used to generate HTML output for the web. We are experimenting with the use of XMetaL & XSLT in one division (Avery). Columbia has no single master archive of finding aids; we provide no indexing across finding aids; finding aids are not integrated with other non-archival web resources.

Our goal over the next year is to provide a more coherent technology approach to finding aid creation and management.

Finding Aids In Context

From Columbia's vantage point, the task of creating and providing access to archival finding aids presents a number of issues and challenges, not the least of them conceptual. In developing a coherent strategy in this area we feel it important to recognize that finding aids are not ends in themselves. They are, in the broader context of the digital library, a way of providing a certain, limited type of intellectual access to a single archival collection. The finding aid model was developed in the paper and print world and remains, even in its EAD / XML formulation, solidly based there. As a metadata structure, it diverges significantly from the data element / database model developed for other types of resources; as an end-user presentation model, it resembles nothing so much as the notebook with index tabs that it has replaced.

More significantly, the tendency to focus on the finding aid alone ignores equally important challenges facing archivists as well as those attempting to develop workable digital library strategies, namely:
1. Archival collection processing & management
2. Flexible & effective end-user presentation
3. Intellectual integration with other non-archival digital library resources
4. Functional integration with key digital library infrastructure components
The lack of available tools and solutions in these areas makes the already enormous workload of processing and providing access to new collections even more overwhelming. In that context it seems to some archivists that the task of just creating and publishing electronic finding aids for all of their collections is already more than they can aspire to.

The specific task areas listed above are described more fully in the following sections. It is clear, however, that these areas will not all be addressed collaboratively or probably even within a single institution in the near future. Still, it is important to begin developing requirements and creating persuasive data models for these areas so that we can plan effectively at the institutional level and, interinstitutionally, develop consistent and compatible solutions. Some types of solutions (e.g., integration with local infrastructures) will always need to be locally developed and managed; with others we may be able to work collaboratively and perhaps even interest automation vendors in providing the toolkits and software support that could substantially reduce archival control costs for institutions.

Archival Collection Processing & Management

Given the enormous size and number of new and unprocessed collections that libraries and archives have acquired and will continue to acquire, tools are urgently needed to support core tasks such as:
1. Intake & processing of new collections
2. Adding material to existing collections
3. Conserving materials
4. Digitizing materials
5. Managing permissions & intellectual property rights -- including those of digitized materials -- often at the item level
6. Tracking various kinds of collection use, including: on-site consultation, publications based on the collection, loan of collections for exhibitions
The work carried out in development of the EAD standard has already contributed a large chunk of the analysis that would be needed to create a collections management data model; non-archival digital library efforts have supplied some of the other important components. What is needed is for the whole model to be fleshed out so that the individual components, including finding aids, can be shaped to interoperate with other functional pieces of the electronic environment.

Flexible and Effective End-User Access

The traditional finding aid presentation is that of a continuous text that provides information at the collection, series, subseries, container, and sometimes item level, usually presented in an order dictated by the physical organization of the collection. In an electronic environment this model may not always be the best way to present such information to users. In some online archival projects the "continuous text" approach to finding aid presentation has in fact been replaced by a severely menu-driven, hierarchical display; in others, presentation is chiefly in the form of database-style query responses. Neither of these alternatives -- on its own -- is necessarily an improvement over the traditional 'tabbed notebook' display.

What is needed is the ability to easily produce different kinds of presentations for different types of collections, and sometimes for the same collection. Checklists, browsable lists of key facets of the collection (e.g., projects, publications, names, chronologies, geographic content) or special online displays of significant collection subsets (e.g., architectural drawings displayed in conjunction with photographs of the completed buildings), will often be useful as supplements to or even replacements for the "notebook" format. This is particularly true of archival projects having a digitization component, where locating digital content should not necessarily require users to 'drill down' into the finding aid, and where some of the archival content will need to be repurposed for other presentations, e.g., an online exhibition based on the same collection.
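
As a minimal sketch of this idea, assume the container-level entries of a finding aid are already held as structured records. The field names and sample entries below are hypothetical and are not drawn from any Columbia collection. The same records can then yield both the traditional physical-order listing and a browsable name index:

```python
from collections import defaultdict

# Hypothetical component records; every field name is an illustrative assumption.
components = [
    {"box": 1, "folder": 2, "title": "Library project drawings",
     "names": ["Doe, John", "Smith, Jane"], "year": 1919},
    {"box": 1, "folder": 1, "title": "Correspondence",
     "names": ["Smith, Jane"], "year": 1921},
    {"box": 2, "folder": 1, "title": "Photographs of completed library",
     "names": ["Doe, John"], "year": 1923},
]


def notebook_view(records):
    """Traditional presentation: entries in physical (box, folder) order."""
    ordered = sorted(records, key=lambda r: (r["box"], r["folder"]))
    return [
        f"Box {r['box']}, Folder {r['folder']}: {r['title']} ({r['year']})"
        for r in ordered
    ]


def name_index(records):
    """Facet presentation: each name mapped to the titles it is attached to."""
    index = defaultdict(list)
    for r in records:
        for name in r["names"]:
            index[name].append(r["title"])
    return {name: sorted(titles) for name, titles in sorted(index.items())}


for line in notebook_view(components):
    print(line)
for name, titles in name_index(components).items():
    print(name, "->", "; ".join(titles))
```

Neither view is privileged here: both are derived from one set of records, which is the property the "notebook" format on its own does not offer.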

Intellectual Integration with Other Digital Library Resources

The EAD finding aid format as usually implemented does not lend itself to functional integration with information about other archival collections or other types of library and museum materials. As an obvious example, a collection of architectural drawings may have excellent scholarly studies available in the libraries' online catalog, or closely related materials elsewhere in the local or national digital library -- but these facts may never be known to someone viewing the finding aid.

For these reasons, the metadata aspects of finding aids need to be able to be easily integrated, merged, indexed and presented along with other types of metadata. Moreover, metadata associated with the finding aid needs to be able to be managed in the same way and with the same tools as other non-archival metadata.
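
One way to picture this kind of integration is sketched below, under stated assumptions: a collection-level finding aid description and an online catalog record are mapped into one common record shape and indexed by normalized subject heading, so that a single lookup returns both. The record layouts, field names and normalization rule are all hypothetical; a real implementation would rely on whatever metadata standards and authority files the institution already uses.

```python
from collections import defaultdict

# Hypothetical source records; every field name here is an assumption.
finding_aid = {
    "collection_title": "Example Architect Papers",
    "repository": "Example Archive",
    "subjects": ["Architecture -- New York", "Libraries  -- Design"],
}
catalog_record = {
    "title": "A Study of Example Architect",
    "author": "Doe, John",
    "subject_headings": ["architecture -- new york"],
}


def normalize(heading):
    """Lower-case and collapse whitespace so variant forms of a heading match."""
    return " ".join(heading.lower().split())


def from_finding_aid(fa):
    """Map a finding aid description into the common record shape."""
    return {
        "kind": "archival collection",
        "title": fa["collection_title"],
        "subjects": [normalize(s) for s in fa["subjects"]],
    }


def from_catalog(rec):
    """Map a catalog record into the same common record shape."""
    return {
        "kind": "catalog record",
        "title": rec["title"],
        "subjects": [normalize(s) for s in rec["subject_headings"]],
    }


def build_subject_index(records):
    """Index common records by normalized subject heading."""
    index = defaultdict(list)
    for rec in records:
        for subject in rec["subjects"]:
            index[subject].append(rec)
    return dict(index)


records = [from_finding_aid(finding_aid), from_catalog(catalog_record)]
index = build_subject_index(records)
for hit in index["architecture -- new york"]:
    print(hit["kind"], "|", hit["title"])
```

Once both kinds of record share one shape, they can be managed, indexed and presented with the same tools, which is the requirement stated above.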

Functional Integration with Digital Library Infrastructure Components

Since many archival collections are candidates for digitization, at least selectively, the display and access management of digital content must also be accommodated in one way or another. The finding aid format (and the EAD / XML standard) does not lend itself to these functions.

Structural metadata, relationships to digital object repositories, relationships to access and intellectual property controls, association with administrative metadata, cross-searching of normalized headings, etc., must all be available to and interoperable with finding-aid type information in the digital library environment.

The fact that most archival collections may never have digital content or require the type of detailed (and expensive) integration, management and end-user access functionality described above doesn't change the fact that some will and already do. And collection-based digital library projects in the future will almost certainly need the infrastructure to support this greater degree of functionality. The expanded archival data model thus needs to flexibly support both simpler approaches as well as more complex & fully functional implementations.

Conclusions & Recommendations
1. The EAD movement has proven itself as a powerful force for improving the documentation of archival collections. However, the early decision to take an SGML "text-encoding" approach -- as opposed to a metadata / data element / database approach -- has shown itself to be limiting, particularly in the toolsets available for data collection & management and web publishing. The SGML choice has in some respects marginalized archival control, separating it from the mainstream of digital library development and support. It may be that XML will ultimately provide the tools needed to create and publish finding aids in the way the EAD developers originally envisioned; but the technologies for it have not yet matured, and it is still unclear how and whether other digital library infrastructures will migrate to XML. The most nearly suitable tools for finding aid creation, publishing, indexing, etc. in an EAD context (DTD-aware text editors, XMetaL, XSLT, DLXS/XPAT, Tamino) are not part of most institutions' existing digital library infrastructure and are, variously, not scalable, not affordable or not yet mature enough technologically for most institutions to justify new investment in them. For the foreseeable future it is urgent that we reconnect and integrate finding aid creation & management with the robust, existing tools that have proven themselves over time -- most specifically with database technologies.
  
  The largest finding aid publishers and aggregators (e.g., California Digital Library, RLG) have already made heavy use of databases to support both data collection and web publishing of information derived from finding aids. An obvious and useful direction to pursue would be for the archival / digital library community to develop a standard SQL schema that parallels the EAD DTD. This would allow data to be moved from EAD / XML finding aids into databases and back, while providing the greater stability & functionality of databases for managing and publishing digital library information. If this archival SQL schema were developed & documented via a CASE tool such as Popkin System Architect or Rational Rose, it could be exported as needed to different SQL database platforms.
  
  This single step would immediately allow more effective integration and use of finding aid information in the short term, and would provide a more flexible and fully-supported testbed for the development of new archival tools and strategies in the future.
2. The RLG archives database could play a much greater role in integrating archival resources nationally if it were more proactively managed, developed and publicized. Cross-collection finding aid searching is difficult at the institutional level and virtually impossible at the national level without a database such as this. We should ask RLG what its level of commitment to and support for this system is, and work with them on improvements and expansion. A fully-supported national archival database might make it unnecessary for smaller and mid-size institutions to make heavy local investments in cross-collection search and retrieval.
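
The round trip proposed in the first recommendation, from EAD / XML into a database and back, might look roughly like the following sketch. It is not a proposed standard schema: the single `component` table, its column names and the sample fragment are assumptions made for illustration, and only a small, un-namespaced subset of EAD (`dsc`, `c`, `did`, `unittitle`, `unitdate`) is handled, using SQLite and the Python standard library.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified, hypothetical EAD-style fragment (no namespaces, few elements).
EAD = """<dsc>
  <c level="series"><did><unittitle>Correspondence</unittitle><unitdate>1910-1925</unitdate></did>
    <c level="file"><did><unittitle>Letters to clients</unittitle><unitdate>1912</unitdate></did></c>
    <c level="file"><did><unittitle>Letters from builders</unittitle><unitdate>1915</unitdate></did></c>
  </c>
  <c level="series"><did><unittitle>Drawings</unittitle><unitdate>1908-1930</unitdate></did></c>
</dsc>"""

# Illustrative table: one row per <c> component, linked to its parent.
SCHEMA = """CREATE TABLE component (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES component(id),
    seq INTEGER NOT NULL,       -- position among siblings, preserves order
    level TEXT,
    unittitle TEXT,
    unitdate TEXT
)"""


def load(conn, parent_elem, parent_id=None):
    """Recursively store each child <c> of parent_elem as a table row."""
    for seq, c in enumerate(parent_elem.findall("c")):
        cur = conn.execute(
            "INSERT INTO component (parent_id, seq, level, unittitle, unitdate) "
            "VALUES (?, ?, ?, ?, ?)",
            (parent_id, seq, c.get("level"),
             c.findtext("did/unittitle"), c.findtext("did/unitdate")),
        )
        load(conn, c, cur.lastrowid)


def export(conn, parent_elem, parent_id=None):
    """Recursively rebuild <c> elements under parent_elem from the table."""
    # SQLite's IS operator matches NULL as well as ordinary values.
    rows = conn.execute(
        "SELECT id, level, unittitle, unitdate FROM component "
        "WHERE parent_id IS ? ORDER BY seq",
        (parent_id,),
    ).fetchall()
    for row_id, level, title, date in rows:
        c = ET.SubElement(parent_elem, "c", {"level": level} if level else {})
        did = ET.SubElement(c, "did")
        ET.SubElement(did, "unittitle").text = title
        ET.SubElement(did, "unitdate").text = date
        export(conn, c, row_id)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
load(conn, ET.fromstring(EAD))

# Once in the database, components can be queried like any other data.
matches = conn.execute(
    "SELECT unittitle FROM component WHERE unittitle LIKE ? ORDER BY id",
    ("%Letters%",),
).fetchall()
print(matches)

# And the hierarchy can be written back out as XML.
rebuilt = ET.Element("dsc")
export(conn, rebuilt)
print(ET.tostring(rebuilt, encoding="unicode"))
```

A parent-pointer table is only one of several ways to hold a hierarchy in SQL; it is used here because it maps directly onto the nesting of components. A community schema would also need to cover the many EAD elements this sketch ignores, including mixed content within descriptive text.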
