SWIFT

Finding Aids:
HTML Markup Cookbook

Table of Contents

  1. Introduction
    1. HTML, SGML and Finding Aids at Columbia
    2. Pointers to HTML documentation
    3. Sample results (documents obtained by following these guidelines)
    4. Support
  2. Parts of online Finding Aid at Columbia
    1. Frameset
    2. Title page
    3. Main page(s)
    4. Navigator
  3. Scenarios (how to go about creating an online Finding Aid at Columbia)
    1. Complexity Classification
  4. Converting to Finding Aid HTML
    1. What is Finding Aid HTML?
    2. Converting word-processor files to Finding Aid HTML
    3. Converting spreadsheets to Finding Aid HTML
  5. Other Issues
    1. publishing graphics
    2. Writing directly in HTML
  6. Deployment of an online Finding Aid at Columbia


  1. Introduction

    1. HTML, SGML and Finding Aids at Columbia

      HTML (Hypertext Markup Language) is the current method of choice at Columbia for encoding archival finding aids to make them accessible on the World Wide Web. Although a more sophisticated and flexible encoding tool like SGML is likely to replace HTML at Columbia eventually, the Libraries are at the moment supporting HTML encoding because it offers a relatively quick and simple method of making machine-readable files available for use without a major investment of time. (It is important to note that the instructions below will lead to the creation of a document which can in most cases be easily converted to a simple SGML document when the time comes. Thus effort put into the HTML product will not be wasted.)

      The subcommittee of the Libraries' Archives Committee charged with developing a uniform structure for Columbia finding aids on the Web began with the assumption that it was desirable to create finding aids that had a relatively consistent look and feel, despite the fact that they might emanate from different parts of the University and describe very different collections. However, because they must follow the structure of the archival files, finding aids differ considerably in complexity, length, and degree of detail. Because of the limited flexibility of HTML, it is not possible to produce a single template that is suitable for all finding aids. We have therefore devised a model which is designed to produce finding aids that may vary in the nature of their contents or descriptive lists but are uniform in their visual structure. People wishing to encode a finding aid in HTML should therefore examine the models carefully to see which is closest to their own in the structure of the contents list and then should adapt that model to their specific needs. (NOTE: It is recommended that anyone attempting to encode a finding aid have both a knowledge of the principles of archival arrangement and some familiarity with HTML).

      HTML encoded finding aids can be created from existing word-processing files. They can also be created directly as part of archival processing. Some people have found it satisfactory to scan typed documents into Microsoft Word files and then encode those documents using the appropriate tools. In all cases, some tweaking or editing will be necessary.

    2. Pointers to HTML documentation

      In order to successfully compose and deploy an online Finding Aid at Columbia, some knowledge of the HTML markup language is required. To start with, knowledge of the basic principles of HTML, web publishing and the standard (basic) set of HTML tags is absolutely necessary, and can be obtained by studying the Beginner's Guide to HTML (http://archive.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html).

      To handle tabular data, knowledge of HTML tables (http://home.netscape.com/assist/net_sites/tables.html) is desirable, but there are good conversion tools that let you convert existing spreadsheet tables into HTML without it.

      Finally, to create the desired online Finding Aid layout with Navigator, Title page, etc., knowledge of Frames (http://home.netscape.com/assist/net_sites/frames.html) is needed. Note that this part could be done for you as part of support from LSO.

    3. Sample results (documents obtained by following these guidelines)

      Check the Rare Books and Manuscripts Library's Guides page (http://www.columbia.edu/cu/libraries/indiv/rare/guides/) for examples of online Finding Aids created by following the guidelines in this document.

    4. Support

      Creators of online Finding Aids can obtain varying degrees of support from LSO (contact Nick Uglov, uglov@columbia.edu). It could include:
      • setting up HTML training sessions in groups (recommended) or individually,
      • requesting software installation (conversion software),
      • troubleshooting newly created online Finding Aids,
      • helping with finishing and deploying new online Finding Aids (setting up the frameset, working with images, etc.),
      • other?

  2. Parts of online Finding Aid at Columbia

    All finding aids should contain the following elements: a title page; a navigational panel on the left, allowing movement between the parts of the document; front matter, including a transcription of the AMC record, if one exists, along with scope and content notes, a biographical summary, provenance and/or restrictions; and a contents list. A picture of the subject or other visual image (gif) may be added to the front matter, if desired. In practice, each finding aid is stored in a directory with its own name (e.g., Robert Wilson, Hart Crane, Stein and Day). Each directory contains at least four files: main.html, titlepage.html, navigator.html, and index.html. A gif (graphic image file) may be included but is optional.

    A single online Finding Aid at Columbia consists of at least 4 parts (files): frameset [F], title page [T], main page(s) [M], navigator [N], all appearing in the same web browser window at different positions and times. All the files related to a single online Finding Aid reside in the same directory exclusively devoted to that Finding Aid.

    F
    +-----+---------------+
    |     |               |
    |  N  |      T/M      |
    |     |               |
    +-----+---------------+

    1. Frameset

      In each online Finding Aid directory, this file is called index.html. This file brings the rest of the files comprising the online Finding Aid together in one browser window using <FRAMESET> and <FRAME> tags, and it repeats the contents of the Navigator so that the frames-incapable browsers (such as Lynx) can still be used to view the Finding Aid. It also can contain preliminary AMC information.
      Here is an example of a frameset file (do a "View Source" to see how it is composed)
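
      A minimal sketch of such a frameset file is given below. The file names navigator.html and title.html follow the sections of this document; the frame names ("navigator", "main"), the column width, the collection title and the anchor names are illustrative assumptions, not taken from an actual Columbia Finding Aid.

      ```html
      <HTML>
      <HEAD>
      <TITLE>Title of the Collection: Finding Aid</TITLE>
      </HEAD>
      <!-- Two columns: a narrow Navigator frame on the left,
           and the main frame, which first shows the title page -->
      <FRAMESET COLS="25%,*">
        <FRAME SRC="navigator.html" NAME="navigator">
        <FRAME SRC="title.html" NAME="main">
        <!-- Shown only by frames-incapable browsers such as Lynx:
             repeats the Navigator outline as ordinary links -->
        <NOFRAMES>
        <BODY>
        <H1>Title of the Collection</H1>
        <UL>
        <LI><A HREF="title.html">Title page</A>
        <LI><A HREF="main.html#summary">Summary</A>
        <LI><A HREF="main.html#series1">Series I</A>
        </UL>
        </BODY>
        </NOFRAMES>
      </FRAMESET>
      </HTML>
      ```

      Giving the right-hand frame a name lets the Navigator direct its links there, so the Navigator itself stays visible while the user reads.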

    2. Title page

      In each online Finding Aid directory, this file is called title.html. This file is a document giving author & title information of the collection, the repository, and possibly the author's photo. This is the page users will see first in the main frame of the frameset when they get to the Finding Aid.
      Here is an example of a title file (do a "View Source" to see how it is composed)
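
      A title page can be very short. The sketch below is only an illustration: the collection title, the repository line and the image file name photo.gif are placeholders assumed for the example.

      ```html
      <HTML>
      <HEAD>
      <TITLE>Title of the Collection</TITLE>
      </HEAD>
      <BODY>
      <H1>Title of the Collection</H1>
      <!-- Optional photo of the author; photo.gif is a hypothetical file name -->
      <P><IMG SRC="photo.gif" ALT="Photograph of the author"></P>
      <P>Name of the repository<BR>
      Columbia University Libraries</P>
      </BODY>
      </HTML>
      ```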

    3. Main page(s)

      One or more pages containing the text of the Finding Aid itself. If there is only one page, call it main.html; if there are many, invent your own naming convention. When the user clicks on a link in the Navigator, the browser jumps to the corresponding place in the Main page(s).
      Here is an example of a main page (do a "View Source" to see how it is composed)
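
      For the Navigator to jump to "a particular place" in a Main page, that place has to be marked with a named anchor. A minimal sketch, assuming the hypothetical anchor names series1 and box1:

      ```html
      <!-- In main.html: mark each place the Navigator should jump to -->
      <H2><A NAME="series1">SERIES I</A></H2>

      <H3><A NAME="box1">Box 1</A></H3>
      <DL>
      <DD>Put description here
      </DL>
      ```

      A link of the form main.html#box1 then takes the reader straight to Box 1.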

    4. Navigator

      The file with the outline of the Finding Aid, used as a Table of Contents. Its entries are hyperlinked to the appropriate places in the Main page(s). It appears in the narrow vertical frame on the left.
      Here is an example of a navigator file (do a "View Source" to see how it is composed)
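
      A navigator file might be sketched as follows. It assumes that the frameset names its right-hand frame "main" and that the Main page contains anchors called series1 and box1; both names are illustrative assumptions.

      ```html
      <HTML>
      <HEAD>
      <TITLE>Navigator</TITLE>
      <!-- Send every link to the frame named "main" (assumed frame name) -->
      <BASE TARGET="main">
      </HEAD>
      <BODY>
      <UL>
      <LI><A HREF="title.html">Title page</A>
      <LI><A HREF="main.html#series1">Series I</A>
        <UL>
        <LI><A HREF="main.html#box1">Box 1</A>
        </UL>
      </UL>
      </BODY>
      </HTML>
      ```

      Setting the target once with <BASE TARGET> avoids repeating a TARGET attribute on every link; without any target, the linked page would replace the Navigator in its own narrow frame.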

  3. Scenarios (how to go about creating an online Finding Aid at Columbia)

    There are several possible starting points for creating an online Finding Aid:

      1. You have: a Finding Aid in machine-readable form (word-processor, spreadsheet, etc.)
         Do this: convert to HTML: see the Complexity Classification for instructions
      2. You have: a Finding Aid in print only
         Do this: use a scanner when possible to OCR the material. Process the results either into HTML directly (observing guidelines for HTML coding below), or into a non-HTML machine-readable form (word-processor, spreadsheet, etc.), then convert: see the Complexity Classification for instructions
      3. You have: no Finding Aid yet, and need to create an online Finding Aid from scratch
         Do this: choose one of two options:
         • create directly in HTML, observing guidelines for HTML coding below
         • create in non-HTML machine-readable form (word-processor, spreadsheet, etc.), then convert: see the Complexity Classification for instructions

    1. Complexity classification

      Short homogeneous Finding Aid -- more or less plain text, 1-10 printed pages, no tables, possibly a couple of images. Most likely resides in a word-processor (WordPerfect, MS Word, etc.)
      See
      • Converting word-processor files to Finding Aid HTML for word-processor files,
      • publishing graphics for graphics.

      Long homogeneous Finding Aid -- more or less plain text, more than 10 printed pages, no tables, possibly a couple of images. Most likely resides in a word-processor (WordPerfect, MS Word, etc.)
      Break up the file into "logical" chunks, preferably under 5 printed pages each (or by Box, or by Series, what have you), then (for converting each of the resulting files) see
      • Converting word-processor files to Finding Aid HTML for word-processor files,
      • publishing graphics for graphics.
      Note that in this case there will be several Main files [M] and the Navigator [N] and Frameset [F] will have to address all of them.
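
      With several Main files, each Navigator entry simply names the file it points into. A sketch, assuming one hypothetical file per series (series1.html, series2.html, series3.html), a hypothetical anchor box12, and a frame named "main":

      ```html
      <!-- Navigator entries when the text is split over several Main files -->
      <UL>
      <LI><A HREF="series1.html" TARGET="main">Series I</A>
      <LI><A HREF="series2.html" TARGET="main">Series II</A>
      <LI><A HREF="series3.html#box12" TARGET="main">Series III, Box 12</A>
      </UL>
      ```

      The same list of links, without the TARGET attributes, would be repeated in the Frameset file for frames-incapable browsers.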

      Non-homogeneous Finding Aid -- Original consists of many files in different formats: some are word-processor documents, some spreadsheets, some plain texts, some tables within word-processor files, etc.
      It is clear that a universal recipe covering all the possible situations cannot be given. Inventiveness, good HTML knowledge, additional time and, possibly, consultation(s) with the LSO contact, will probably be necessary.

      In general, begin by separating files by their type. Convert each file separately: see

      • Converting word-processor files to Finding Aid HTML for word-processor files,
      • Converting spreadsheets to Finding Aid HTML,
      • for tables within word-processor documents, first cut'n'paste table contents into a spreadsheet program (MS Excel) and then follow instructions for Converting spreadsheets to Finding Aid HTML,
      • publishing graphics for graphics.
      Note that in this case there will be several Main files [M] and the Navigator [N] and Frameset [F] will have to address all of them.

  4. Converting to Finding Aid HTML

    1. Finding Aid HTML

      Encoding a hierarchical, nested, descriptive structure is vital in any finding aid, so a means of replicating such structure in HTML is of paramount importance. In our system, indents - which can be thought of as akin to tab stops on a typewriter or word processor - are achieved by the use of two kinds of lists: definition lists (<DL>) and unordered lists (<UL>). These lists can be combined in any way needed to produce a meaningful format on the WWW. It is possible to have <DL>s within <DL>s; <UL>s within <DL>s; <DL>s within <UL>s, etc.
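
      As a small illustration of such a combination, the sketch below nests a <UL> inside a <DL> entry. The entry texts ("Correspondence", "Manuscripts") are placeholders assumed for the example.

      ```html
      <DL>
      <DD>Correspondence:
        <!-- An unordered list nested inside a definition-list entry -->
        <UL>
        <LI>name
        <LI>name
        </UL>
      <DD>Manuscripts
      </DL>
      ```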

      For word-processor-like documents, Finding Aid HTML encoding is based on a simple tag set which is associated with typical finding aid elements. Tags are used to identify the title of the collection, the series and sub-series entries, box or container numbers, notes, personal and corporate names, and contents description. The latter two - the lists of personal and corporate names, and of contents - contain the substantive information in the finding aid, and are the most important part of it.

      The tag set is as follows:

      • <H1></H1> -- This is used only once to indicate the TITLE of the collection.
      • <H2></H2> -- Used for all SERIES and SUB-SERIES entries
      • <H3></H3> -- Used for all BOX or CONTAINER numbers. Always flush left.
      • <BLOCKQUOTE> </BLOCKQUOTE> -- Use for all notes within the finding aid.
      • <UL><LI></UL> -- Used for lists of personal/corporate names.
      • <DL><DD></DL> -- Used for descriptions of contents.

      The heart of the encoding system is a series of two lists--a definition list <DL><DD></DL> for all contents and an unordered list <UL><LI></UL> for all lists of personal/corporate names.

      After the container is described, such as:

      <H3>Box 1</H3>

      the contents of the container are described:

      <DL><DD>Put description here</DL>

      The important, and tricky, part of this system of tagging is multiple indents. Each indent is indicated by a completely new <DL><DD></DL> structure. Hence, it is entirely possible to have multiple, nested <DL><DD></DL> structures. Given this minor complication it is necessary to remember where you are in any particular structure and to CLOSE OFF each structure when it is finished.

      Here is an example:

      • This is how the text is formatted:
        
        Box 35
        
               Offprints by Merton: 
                      The Christmas Sermons of Bl. Guerric 
                      The Climate of Monastic Prayer 
                      Conversatio Morum 
                      Examination of Conscience 
                      For a Renewal of Eremitism in the Monastic State 
                      Liturgical Renewal 
                      The Pasternak Affair in Perspective 
                      La vida solitaria 
                      The Zen Koa 
               Offprints and articles relating to Merton 
                      Pamphlets: 
                            Miscellaneous 2 folders 
                            Monasteries 
                            The Monk in the Diaspora 
                            Thomas Merton Books, Fall 1988 (catalog). 
                            Two Articles by Thomas Merton 
                      Tearsheets from the Columbia Yearbook, 1937 
                      Tearsheets from "The Jester" 
                      Vespers Funeral Mass & Burial Mass for Thomas Merton
        

      • When tagging the above example the first thing you need to do is be aware of how many indents are needed and how they are nested. For the above, there are going to be three levels of indentation and three nested structures. This is how the structure looks in outline:
                1st indent (open)  <DL><DD>
                        2nd indent (will be closed)  <DL><DD></DL>
                1st indent (cont.)  <DD>
                        2nd indent (new, open)  <DL><DD>
                                3rd indent (will be closed)  <DL><DD></DL>
                        2nd indent (cont., closed) <DD></DL>
                1st indent (empty, closed)  </DL>
        

      • This is the tagging in order to achieve this formatting:
        <h3>Box 35</h3>
        	<dl>
        	<dd>Offprints by Merton:
        		<dl>
        		<dd>The Christmas Sermons of Bl. Guerric
        		<dd> The Climate of Monastic Prayer
        		<dd>Conversatio Morum
        		<dd>Examination of Conscience
        		<dd>For a Renewal of Eremitism in the 
        		   Monastic State
        		<dd>Liturgical Renewal
        		<dd>The Pasternak Affair in Perspective
        		<dd>La vida solitaria
        		<dd>The Zen Koa
        		</dl>
        	<dd>Offprints and articles relating to Merton
        		<dl>
        		<dd>Pamphlets:
        			<dl>
        			<dd>Miscellaneous 2 folders
        			<dd>Monasteries
        			<dd>The Monk in the Diaspora
        			<dd>Thomas Merton Books, Fall 1988 
        			   (catalog).
        			<dd>Two Articles by Thomas Merton
        			</dl>
        		<dd>Tearsheets from the Columbia Yearbook, 1937
        		<dd>Tearsheets from "The Jester"
        		<dd>Vespers Funeral Mass &amp; Burial Mass for 
        		   Thomas Merton
        		</dl>
        	</dl>
        

      • In the above example the following is happening:
        1. The first indent is opened with <DL>
        2. A second indent is opened while the first indent REMAINS open resulting in a DOUBLE INDENT.
        3. After the second indent is finished, it is closed </DL> because we wish to return to the first indent.
        4. A third indent is opened which will align with the second indent above.
        5. A fourth indent is opened while the first and third indent remain open which results in a TRIPLE INDENT.
        6. The fourth indent structure is closed, returning us to the third indent alignment.
        7. After the finish of this box, the remaining indent structures are closed.
        It is EXTREMELY IMPORTANT to close off indent structures when they are finished and to be aware of how many indent structures are open at any given point in the hierarchy. FAILURE TO OBSERVE THIS WILL CAUSE YOU PAIN AND FRUSTRATION AS YOU HUNT FOR </DL>s throughout your document when it formats in odd ways.

      The tagging for lists of personal/corporate names is quite simple. Each list must start with a <UL>, each name must be preceded by a <LI>, and the list must end with a </UL>, e.g.

      <H3>Box 1</H3>
      <UL>
      <LI>name
      <LI>name
      </UL>
      

      The overall structure of a finding aid is

      <H1>TITLE</H1>
      
      <H2>SERIES</H2>
      
      <H3>Box or container number</H3>
      
              <DL><DD>Contents</DL> 
              OR
              <UL><LI></UL>
      
      <H2>SUB-SERIES</H2>
      
      <H3>Box or container number</H3>
      
              <DL><DD>Contents</DL> 
              OR
              <UL><LI></UL>
      
      Tags such as <b>, <i>, <BLOCKQUOTE> can be used just about anywhere in the structure. However, series titles should not come after Box numbers.
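
      Put together, a small finding aid body following this structure might look like the sketch below. All titles, notes and entries are placeholders assumed for the example.

      ```html
      <H1>TITLE OF THE COLLECTION</H1>

      <H2>SERIES I</H2>
      <!-- A note belonging to the series -->
      <BLOCKQUOTE>A note about this series goes here.</BLOCKQUOTE>

      <H3>Box 1</H3>
      <DL>
      <DD>Contents entry
        <DL>
        <DD>Indented contents entry, with a <I>title in italics</I>
        </DL>
      <DD>Contents entry
      </DL>

      <H3>Box 2</H3>
      <UL>
      <LI>name
      <LI>name
      </UL>
      ```

      Note that every <DL> and <UL> opened under a box is closed before the next <H3> begins.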

    2. Converting word-processor files to Finding Aid HTML

      If the current format of your Finding Aid is some word-processing format (WordPerfect, MS Word, etc.), you need to first get it into some form of HTML, and then work on it to conform to Finding Aids HTML. Tools to do that are not perfect and will not give you a clean output (this is why we suggest that if you are composing a finding aid from scratch, you do it directly in HTML).

      Generally, there are 2 ways to do the conversion:

      • use Rich Text Format and an intermediary
        1. in your word processor, save file as RTF (Rich Text Format)
        2. FTP the file to CUNIX
        3. run the RTF-to-HTML converter: at the UNIX ($) prompt, type
          rtftohtml filename
          this will create 2 files: a main file with the same name but extension html and a 'ToC' file (delete it - it's useless)
      • use a direct conversion tool
        • for MS Word files, you can use MS Word Internet Assistant. You would need to download and install it.
        • LSO has Core Web.Suite installed on the scanning station in 221M Butler. You would need to make arrangements to use the machine.
        • HotMetalPro has an OK converter. HotMetalPro goes for about $200, inquire at LSO about installing and using it.

      With regard to results, all of the above recipes are equally mediocre: they provide output littered with unneeded font modification tags while ignoring many essential formatting features (a reminder: write straight in HTML if you are writing from scratch). You should pick the one that is logistically most suitable.

      NOTE: the resulting HTML document should not exceed 50K, otherwise it will take a conspicuously long time to open in a web browser. Split the file into several files if needed.

    3. Converting spreadsheets to Finding Aid HTML

      Good news: converting spreadsheets to Finding Aid HTML is a snap, compared to converting word-processor texts. You would need to download and install MS Excel Internet Assistant - contact LSO if you can't do it on your own.

      Once you've got that:

      1. Open your spreadsheet in MS Excel. If your spreadsheet is not natively Excel, you can still do it, but you would need to specify the file extension and answer a few questions Excel will ask.
      2. Sort and rearrange cells if needed.
      3. Highlight the whole spreadsheet or a range of cells to export.
      4. Pull down the 'Tools' menu, select 'Internet Assistant Wizard', and follow the instructions in the subsequent dialog boxes.
        NOTE: If you choose to export the table not as a standalone file, but into a specific place in an existing file, follow the Wizard's instructions on setting up such a file and repeat the exporting procedure.
        SUGGESTION: When presented with an option to "Convert as much of the formatting as possible" or "Convert only data", select the latter to avoid unnecessary tagging in the output.
      5. FTP the resulting file to CUNIX
      NOTE: keep the resulting file size below 25 kilobytes, otherwise browsers will have problems manipulating it
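
      For orientation, a spreadsheet exported as data only comes out as a plain HTML table, roughly like the hand-written sketch below. This is not actual Internet Assistant output, which may differ in detail; the column headings and cell contents are assumptions made for the example.

      ```html
      <TABLE BORDER="1">
      <!-- Header row, followed by one row per spreadsheet row -->
      <TR><TH>Box</TH><TH>Folder</TH><TH>Contents</TH></TR>
      <TR><TD>1</TD><TD>1</TD><TD>Description of first folder</TD></TR>
      <TR><TD>1</TD><TD>2</TD><TD>Description of second folder</TD></TR>
      </TABLE>
      ```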

  5. Other issues

    1. publishing graphics

      All the pictures and graphs should be saved as GIF89 (.gif) images, and referenced in appropriate places of your HTML-ised Finding Aid. If you have your images in some other format, you can convert them using LViewPro (available on any Mosis machine) or PaintShop Pro (available on the scanning station in Butler 221M). If your images are embedded into a word-processor file or a spreadsheet, you can copy them out into LViewPro or PaintShop Pro and save them as GIF89 from there.
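
      Referencing a saved image from a page takes a single tag. In this sketch the file name portrait.gif and the ALT text are assumptions; the image is assumed to sit in the same directory as the other files of the Finding Aid.

      ```html
      <!-- portrait.gif is a hypothetical file in the Finding Aid's own directory -->
      <IMG SRC="portrait.gif" ALT="Portrait of the author">
      ```

      The ALT text is what text-only browsers such as Lynx display in place of the picture.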

    2. Writing Directly in HTML

      If you are starting out fresh creating a new finding aid, there are several advantages to writing it directly in HTML:
      • No conversion to go through later on
      • No formatting loss during conversion
      • No multiple versions to support - editing one document takes care of the changes
      • It's cool
      You can edit directly in HTML:
      • with manual HTML markup
        • on the Unix side, using your favorite Unix editor
        • on the PC / Mac side, using a plain text editor such as Notepad or Wordpad (or, in fact, MS Word, saving files as plain text)
      • with automated HTML markup, using a dedicated HTML editor such as HotMetal Pro, Netscape Gold (available for free download), etc.

  6. Deployment of an online Finding Aid at Columbia

    As mentioned in several places above, all the files need to be on the CUNIX machine; FTP them there as needed. All the files for a given Finding Aid must reside in a dedicated subdirectory, bound together by the structure described in "Parts of online Finding Aid at Columbia". LSO can help combine and deploy the files for a Finding Aid. Keep in mind that you will need not only the text of the Finding Aid, but also its "outline", which will be used as the Navigator.


Last revision: 3/27/97
© Columbia University Libraries