Table of Contents
- Introduction
- Quick Guide
- Explanations and Definitions
- References
- Sample Sites
- Campus Display Resources
Revision: April 2, 1997
Introduction
This document provides recommendations for
image quality, file formats, and other capture and storage
issues when converting paper, photographic and other
physical materials into digital form. Additional documents
on the selection of materials for digitization, on how to
describe and index the materials being digitized, and on
digital library access mechanisms will be added in the
future.
These documents are intended for use by faculty, library
and computing staff as a guideline for image presentation
using the Digital Library. Recommendations have been made
based on lessons learned here at Columbia University as well
as elsewhere. We recommend you speak to a Library or AcIS
staff member (AcIS Help Desk, 854-1919, or e-mail to
consultant@columbia.edu) before beginning an imaging
project.
The Quick Guide provides a brief overview of file
formats, resolutions, pixel depth, etc., with specific
recommendations for conversion based on the type of the
original documents.
The Explanations and Definitions and Campus Display Resources sections provide additional definitions, technical details, and the capabilities available today to display images using current Columbia Digital Library resources.
The References at the end of this document provide much greater detail on digital image conversion and file formats, along with pointers to sample projects.
Sample Sites include pointers to on-campus projects that
provide file format and presentation examples for on-screen
and printing purposes.
Advantages of Digital Images
Storing two-dimensional materials in digital formats
offers a number of advantages. (Ester, 1996, pp. 2-4)
- Originals may be deteriorating, and digital images
will not deteriorate physically or chemically over time.
- Digital images permit identical reproduction quality
from copy to copy and from generation to generation.
- Digital images can be manipulated far more easily than images produced by photographic means.
- Digital images are easily linked to textual
descriptions and catalog records.
- Access is greatly improved, using standard Internet
technologies and existing campus infrastructure.
Quick Guide
Different original media types require different digital conversion techniques as well as different file storage formats. This area is evolving, as conversion techniques improve (better scanners and digital cameras) and as new file formats develop. The following table presents a set of recommendations derived from national digital library recommendations (Reilly, 1996) and the Columbia large maps project (Gertz, 1994-1995; Gertz, 1996).
| Media Type | Conversion Method | Capture Bit Depth and Resolution | Archive File Format | Screen Presentation Format | Print Presentation Format |
|---|---|---|---|---|---|
| Black & White Text Document | Flatbed Scanner or Digital Camera | 1-bit, 600 dpi | TIFF w/CCITT Fax 4 Compression | GIF, 4-bit, 120 to 200 dpi | Acrobat (PDF), 1-bit, 300 or 600 dpi |
| Illustrations, Maps, Manuscripts, etc. | Flatbed Scanner or Digital Camera | 8-bit grayscale or 24-bit color, 200 to 300 dpi | TIFF | Multiple JPEG, 24-bit, 512x768, 1024x1536, 2048x3072, Quality Level 50 | JPEG, 24-bit, 2048x3072, Quality Level 50-100 |
| 3-dimensional objects to be represented in two dimensions | Digital Camera | 24-bit color, 200 to 300 dpi | TIFF | Multiple JPEG, 24-bit, 512x768, 1024x1536, 2048x3072, Quality Level 50 | JPEG, 24-bit, 2048x3072, Quality Level 50-100 |
| 35mm Black & White or Color slide or negative | PhotoCD or Slide Scanner | 24-bit, 2048x3072 | PhotoCD or TIFF | Multiple JPEG, 24-bit, 512x768, 1024x1536, 2048x3072, Quality Level 50 | JPEG, 24-bit, 2048x3072, Quality Level 50-100 |
| Medium to Large Format photograph, slide, negative, transparency, or color microfiche | Pro PhotoCD or Drum Scanner | 24-bit, 4096x6144 | PhotoCD or TIFF | Multiple JPEG, 24-bit, Quality Level 50 | JPEG, 24-bit, 4096x6144, Quality Level 50-100 |
| Black & White Microfilm | Microfilm Scanner | 1-bit, 600 dpi | TIFF w/CCITT Fax 4 Compression | GIF, 4-bit, 120 to 200 dpi | PDF, 1-bit, 300 or 600 dpi |
| Black & White Microfilm | Microfilm Scanner | 8-bit, 300 dpi | TIFF | GIF, 8-bit, 120 to 200 dpi | PDF, 8-bit, 300 or 600 dpi |
Explanations and Definitions
Use of Film Intermediaries
Scanning can be done directly from the item or a film
intermediary can be made and scanned. Film intermediaries most commonly include 35mm slides, 4x5 transparencies, microfilm, and single-frame microfiche. If properly made and
stored, the film intermediary can act as a preservation copy
of the item.
The quality of the intermediary will have a direct impact
on the quality of the digital image. If the intermediary is
poorly made, scratched, faded, or out of focus, the scanned
image will be inferior. If the intermediary is of high
quality, the scanned image will normally also be high
quality. It is best to use camera negatives whenever possible. Every time a slide or other type of film is duplicated, it loses detail and resolution, and the resulting scan is of poorer quality.
In general, it is better to work from a negative than from a positive, not only because of generational loss but also because a negative has a smoother tonal response across its dynamic range, so that highlights and shadows are handled better (Ester, 1996).
Recommendations
- Film intermediary already exists - If a high-quality film intermediary already exists, it can be cheaper and easier to scan it than the original item. Doing so also saves wear and tear on the original item.
- Oversize items and artifacts - Items too large for the flatbed scanner will require film intermediaries, as will three-dimensional objects. Large items are better filmed as transparencies or microfiche.
- Fragile items - Film intermediaries should be
made for any artifacts or documents too fragile to put
through the scanner.
- Printed books - For printed books in which black and white pages are the majority, and where a preservation copy is desired, it is recommended that microfilm be made and then scanned. This is cheaper than making the microfilm and also scanning the book directly. Pages needing color or gray-scale can be scanned separately, directly from the book, and then substituted for the bi-tonal images.
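The microfilm-plus-substitution approach for printed books amounts to merging two sets of page images. The sketch below is a minimal illustration, assuming each scan is keyed by page number; the function `assemble_book` and the file names are hypothetical.

```python
def assemble_book(bitonal_pages, substitutions):
    """Return the final page sequence for a digitized book.

    bitonal_pages: dict of page number -> file name of the 1-bit scan made
        from microfilm (one entry for every page).
    substitutions: dict of page number -> file name of a gray-scale or color
        scan made directly from the book for pages that need it.
    """
    unknown = set(substitutions) - set(bitonal_pages)
    if unknown:
        raise ValueError(f"substitutions for pages not in the book: {sorted(unknown)}")
    # Keep the microfilm scan unless a direct scan replaces it; order by page.
    return [substitutions.get(page, bitonal_pages[page])
            for page in sorted(bitonal_pages)]


# Hypothetical file names: page 3 carries a color plate.
bitonal = {1: "p0001_bw.tif", 2: "p0002_bw.tif", 3: "p0003_bw.tif"}
direct = {3: "p0003_color.tif"}
print(assemble_book(bitonal, direct))
# ['p0001_bw.tif', 'p0002_bw.tif', 'p0003_color.tif']
```

Rejecting substitutions for unknown pages catches numbering mistakes before they silently drop a direct scan.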
Image Quality for Permanent/Archival Capture
When converting an original to digital form, a
high-quality archival digital image should be created
which "safeguards the long-term value of images and the
investment in acquiring them" (Ester, 1996, p. 11). For presentation, other images may be derived from this archival-quality image and stored in different formats and quality levels, the most common being on-screen and print presentation formats. The following sections describe our
recommendations for archival quality capture.
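The three screen sizes in the Quick Guide (512x768, 1024x1536, 2048x3072) are successive halvings of a 2048x3072 image, so a set of presentation derivatives can be planned from the archival image's dimensions. This is a minimal sketch, assuming derivatives are made by halving both dimensions down to a minimum width; `derivative_sizes` and the 512-pixel floor are illustrative choices, and the actual resampling and JPEG encoding would be done in imaging software.

```python
def derivative_sizes(width, height, min_width=512):
    """List (width, height) pairs from the archival master down, halving each step.

    Integer halving keeps the aspect ratio exact when the master dimensions
    are divisible by the relevant powers of two (as 2048x3072 is).
    """
    sizes = []
    while width >= min_width:
        sizes.append((width, height))
        width, height = width // 2, height // 2
    return sizes


print(derivative_sizes(2048, 3072))
# [(2048, 3072), (1024, 1536), (512, 768)]
```

The same function applied to a 4096x6144 master adds one more level at the top.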
Tonality (pixel depth or bit-depth)
Bit-depth concerns the number of bits used to convey tonality for each pixel, that is, whether the image is black and white, gray-scale, or color. In general, the more bits per pixel, the larger the file size.
- 1-bit or Bi-tonal - a 1-bit pixel has two
possible values, black or white. The scanned image has no
shading or gray. Bi-tonal scanning produces the smallest
file.
- 8-bit Gray-scale - provides 256 shades of gray
ranging from pure white to pure black.
- 24-bit Color - provides a tonal range of about
16 million different colors. Color scanning produces
quite large files.
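The link between bit depth, number of tones, and file size can be made concrete with a little arithmetic. This is a minimal sketch, assuming uncompressed data with each pixel row padded to a whole byte (a common convention for 1-bit images); real files add headers and usually compression, so actual sizes differ.

```python
def tones(bits_per_pixel):
    """Number of distinct values a pixel of the given depth can hold."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width, height, bits_per_pixel):
    """Raw image data size in bytes, ignoring headers and compression.

    Each row is rounded up to a whole number of bytes (assumed padding).
    """
    bytes_per_row = (width * bits_per_pixel + 7) // 8
    return bytes_per_row * height


for bits in (1, 8, 24):
    print(bits, tones(bits), uncompressed_bytes(2048, 3072, bits))
# 1 2 786432
# 8 256 6291456
# 24 16777216 18874368
```

The 24-bit line shows where "about 16 million colors" comes from (2 to the 24th power) and why the same pixel grid is 24 times larger in color than in bi-tonal form.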
Recommendations
- Bi-tonal - Because this allows for no shading, it is recommended primarily for modern printed books without illustrations and for line art which has little or no shading.
- Gray-scale - Has shading in tones of gray
only; recommended for black and white photographs,
half-tone illustrations, other types of continuous tone
illustrations, handwritten and typescript manuscript and
archival materials which are nominally black and white
but which actually contain shading and varieties of ink
density and paper tonality. The older the document, the
more likely that color rather than gray-scale may be
appropriate.
- Color - Recommended for any materials with color which should be maintained for historical or esthetic reasons, or because the color conveys information.
Resolution (dots per inch)
In digital images, resolution typically refers to the number of horizontal and vertical pixels that make up the image. For example, 512x768 refers to 512 pixels across by 768 pixels down. DPI (dots per inch) is used in several ways. It refers to the number of pixels or dots captured per inch of the original material. It is also used to describe the number of pixels per inch on computer displays and the output quality of printers. These senses are NOT the same. In this section, we are referring only to capture, which provides an effective number of dots per inch relative to the original. Note that when using film intermediaries, careful calculations must be made to determine the effective dpi relative to the source material. For instance, a document 10" across scanned at 600 dpi requires 6000 pixels. If the document is reproduced as a microfiche with an image that is 4" across, it will take 1500 dpi to achieve the same 6000 pixels and the same level of resolution. The Large Maps Report (Gertz, 1994-1995) goes into this in much greater detail.
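The effective-dpi calculation for film intermediaries is a simple proportion. This is a minimal sketch with hypothetical function names; all widths are in inches.

```python
def required_film_scan_dpi(original_width_in, target_dpi, film_image_width_in):
    """Scanner dpi needed on the film to match `target_dpi` on the original.

    The pixel count across the original is fixed by the target
    (original width x target dpi); the same pixels must fit across the
    smaller film image.
    """
    pixels_needed = original_width_in * target_dpi
    return pixels_needed / film_image_width_in


def effective_dpi(scan_dpi, film_image_width_in, original_width_in):
    """Effective dpi relative to the original when the film is scanned at scan_dpi."""
    return scan_dpi * film_image_width_in / original_width_in


# The example from the text: a 10" document filmed as a 4"-wide microfiche image.
print(required_film_scan_dpi(10, 600, 4))  # 1500.0
print(effective_dpi(1500, 4, 10))          # 600.0
```

Run the other way, the same proportion shows the cost of scanning too coarsely: scanning that microfiche at only 600 dpi gives an effective 240 dpi relative to the original.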
Selection of the optimum resolution starts with determining the smallest meaningful element that must be legible in the end product. When dealing with
textual materials, this determination is relatively easy:
find the smallest letter, numeral, diacritic, or symbol that
must be clearly distinguished. In printed books the smallest
textual element is often the superscript footnote numbers or
letters with diacritics; with handwritten documents there is
a great deal of variation. It is much more difficult to
determine what the smallest meaningful element in a
photograph or artwork is. In part it depends on who will use
the scanned image and in what way. A non-specialist may look
at a landscape photograph casually, while a geologist may
need to be able to distinguish the stratigraphy of the cliff
in the background.
Legibility results from a combination of resolution and
bit depth. Resolution concerns the number of pixels or dots
per inch (dpi) -- the more pixels, the more detail is
captured. Note that the higher the resolution, the larger
the file size.
Pixel depth complicates this simple relationship, because
an 8-bit pixel captures more information than a 1-bit pixel,
and a 24-bit pixel captures even more. This means that it
may be possible to use lower resolution with gray-scale and
color than with bi-tonal to achieve the same degree of
legibility.
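The trade-off between resolution and bit depth can be quantified for a page of known size: pixel count grows with the square of the capture dpi, while size grows only linearly with bit depth. This is a minimal sketch, assuming uncompressed data with rows padded to whole bytes; the function names are hypothetical and compressed file sizes will be much smaller.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixels across and down when an original is captured at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image data size, with each row padded to whole bytes."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return (w * bits_per_pixel + 7) // 8 * h


# A 5" x 8" page at three capture settings discussed in this section.
for dpi, bits in [(600, 1), (300, 8), (200, 24)]:
    print(f"{dpi} dpi, {bits}-bit: {raw_size_bytes(5, 8, dpi, bits):,} bytes")
# 600 dpi, 1-bit: 1,800,000 bytes
# 300 dpi, 8-bit: 3,600,000 bytes
# 200 dpi, 24-bit: 4,800,000 bytes
```

Halving the dpi cuts the pixel count to a quarter, which is why 300 dpi gray-scale is only twice the size of 600 dpi bi-tonal despite using eight times as many bits per pixel.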
Recommendations
- Bi-tonal - Normal modern printed black and
white text should be captured at 600 dpi in order to
assure that all symbols, italic text, and other fine
details are captured. Line art should be captured at 600
dpi if lines are fine and close together; if lines are
bold and widely separated, lower resolution may suffice.
Testing will be needed.
- Gray-scale - Handwritten documents, typescripts, half-tones, and similar materials should be captured at 300 dpi gray-scale unless all of the text is fairly large, in which case a lower resolution may suffice. Testing will be needed. Black and white photographs should be tested at 300 dpi to see whether that resolution suffices; higher resolution may be needed for photos with significant small details.
- Color - Printed documents such as maps and posters may be captured fully at 200 dpi. Testing will be needed. Color photographs should be tested at 300 dpi to see whether that resolution suffices; higher resolution may be needed for photos with significant small details. Historical artifacts like papyri may require 600 dpi if extremely fine details, such as paper grain, must be captured.
File Formats (based on Reilly, 1996, and Gertz, 1996)
We recommend the following image formats for archival
storage and for presentation purposes:
- TIFF w/CCITT Fax 4 Compression - ideally
suited for black and white text documents, this format
provides a high level of detail (600 dpi) combined with a small file size (less than 100 kilobytes for a 5"x8" text page). It may be used as an archival file format.
- PhotoCD - well-suited for 35mm slides and negatives, PhotoCD provides up to 6 resolutions (up to
4096x6144), color management, and a storage medium that
works on all major computer platforms. The PhotoCD format
may be used as an archival file format.
- TIFF w/LZW Compression - A lossless (no information lost) compression format commonly used by Adobe Photoshop. This TIFF format may be used to store 24-bit color images and may be used as an archival file format. With lossless compression, the picture quality of the compressed file is exactly the same as that of the original, uncompressed file.
- JPEG - A 24-bit, lossy (some information lost)
compression format which is well-suited for screen and
print presentation. JPEG is supported by all major
computer platforms and by Internet web browsers. With
lossy compression, the picture quality of the compressed
file is reduced compared to the original file, and the lost detail cannot be restored except by going back to the
original. The advantage is that the file sizes are much
smaller, and image quality is acceptable in most cases.
It is not acceptable as an archival file format.
- GIF - An 8-bit, lossless compression format
which is well-suited for low resolution screen display of
images. GIF is often used for image thumbnails and screen versions of text documents, and is supported by all major computer platforms and Internet web browsers.
- PDF - Adobe Acrobat Portable Document Format
provides a convenient way to view and print images at
high resolution, and may also be used to group several
files into chapters and books.
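The difference between lossless and lossy compression can be demonstrated with the Python standard library. This is a toy sketch: DEFLATE (`zlib`) stands in for LZW because Python has no built-in LZW codec, and a simple quantization step stands in for lossy coding. It is NOT JPEG, only an illustration that discarded detail cannot be recovered.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bool:
    """Compress with DEFLATE and confirm the bytes come back unchanged."""
    return zlib.decompress(zlib.compress(data)) == data


def quantize(pixels: bytes, levels: int) -> bytes:
    """Toy lossy step: snap 8-bit values down onto `levels` evenly spaced values.

    `levels` must divide 256 (for example 2, 4, 16, 64).
    """
    if 256 % levels:
        raise ValueError("levels must divide 256")
    step = 256 // levels
    return bytes((p // step) * step for p in pixels)


gradient = bytes(range(256))          # a smooth 8-bit gray ramp
print(lossless_roundtrip(gradient))   # True
coarse = quantize(gradient, 16)
print(coarse == gradient)             # False: information was lost
print(len(set(coarse)))               # 16
```

Once `coarse` is produced, no decompressor can reconstruct the 256 original gray values from it; only the original (or the archival master) can.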
Storage Issues
Digital image file formats may require a great deal of
physical storage, especially full-color files intended for
archival storage purposes. The chart below compares archival
and presentation file formats, showing how the use of
compression can greatly reduce the amount of space needed to
store presentation quality images. The file sizes are
estimates for 35mm color slides or negatives:
| File Format | Resolution, Bit Depth | File Size |
|---|---|---|
| TIFF | 2048x3072, 24-bit | 18,000 KB |
| PhotoCD | 2048x3072, 24-bit | 4,000 KB |
| JPEG | 2048x3072, 24-bit | 400 KB (medium quality) |
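The storage table's figures can be turned into compression ratios and collection-level estimates. This is a minimal sketch, assuming the table's per-image sizes; the 1,000-slide collection is a hypothetical example, and collection totals use 1 MB = 1,000 KB.

```python
# Per-image estimates for a 35mm slide at 2048x3072, 24-bit (from the table).
SIZES_KB = {"TIFF": 18_000, "PhotoCD": 4_000, "JPEG (medium quality)": 400}


def compression_ratio(reference_kb, compressed_kb):
    """How many times smaller a file is than the reference file."""
    return reference_kb / compressed_kb


def collection_size_mb(n_images, kb_per_image):
    """Total storage for a collection, using 1 MB = 1,000 KB."""
    return n_images * kb_per_image / 1000


# Sanity check: raw 24-bit data is 3 bytes per pixel (1 KB = 1,024 bytes here).
raw_kb = 2048 * 3072 * 3 / 1024
print(f"raw 24-bit data: {raw_kb:,.0f} KB")  # raw 24-bit data: 18,432 KB

for name, kb in SIZES_KB.items():
    ratio = compression_ratio(SIZES_KB["TIFF"], kb)
    total = collection_size_mb(1000, kb)  # hypothetical 1,000-slide collection
    print(f"{name}: {ratio:.1f}:1, {total:,.0f} MB for 1,000 slides")
# TIFF: 1.0:1, 18,000 MB for 1,000 slides
# PhotoCD: 4.5:1, 4,000 MB for 1,000 slides
# JPEG (medium quality): 45.0:1, 400 MB for 1,000 slides
```

The raw-data check shows that the 18,000 KB TIFF figure corresponds to essentially uncompressed 24-bit data, which is why the compressed presentation formats save so much space.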
Conversion Methods
Regardless of whether digital conversion is done in-house
or outsourced, great care should be taken to ensure that the
conversion process is done properly and that it results in
uniform, high-quality digital files. If the conversion
process is outsourced, the vendor should provide sample
results, and all work should be inspected for quality by
in-house staff. If the work is to be done in-house, it is important to consult the references below, which include information about scanning, photography, and quality control methods.
Gray and Color Standard Bars
These bars are narrow strips which contain shades of gray
from white to black or standard color blocks, plus an inch
or metric scale. Their purpose is (1) to give the viewer the scale of the item and (2) to allow the scanning operator and the viewer to calibrate equipment for the best possible viewing and printing with accurate color.
When scanning from original objects in gray-scale, the
gray standard bar should be included with every scan; when
scanning from originals in color, both gray and color bars
should be used, since the color bar is used for color
accuracy while the gray bar is used to deal with highlights
and shadows. When scanning from film intermediaries, the slides or transparencies should be shot with the color and gray standard bars included, in the same way as scans made from originals. Placement of
the bars should be consistent to allow them to be
automatically cropped out of derived images for certain
display purposes, and to minimize the amount of space they
consume.
If color accuracy is critical, computer equipment with
color accurate displays is also needed.
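Consistent bar placement is what makes automatic cropping possible: if every scan in a project has the bars along the same edge at the same height, one rule removes them from every derived image. This is a minimal sketch, assuming the image is held as a list of pixel rows and that the bar height in pixels is known at the derivative's size; `crop_standard_bars` is a hypothetical helper, and production work would use imaging software.

```python
def crop_standard_bars(image, bar_height, edge="bottom"):
    """Remove a calibration strip of known height from one edge of an image.

    image: list of pixel rows (each row a list of values).
    bar_height: number of rows occupied by the gray/color bars.
    edge: "bottom" or "top", whichever edge the project's bars sit on.
    """
    if not 0 <= bar_height < len(image):
        raise ValueError("bar_height must be smaller than the image height")
    if edge == "bottom":
        return image[:len(image) - bar_height]
    if edge == "top":
        return image[bar_height:]
    raise ValueError("edge must be 'top' or 'bottom'")


# A 10-row scan whose bottom 2 rows hold the standard bars.
scan = [[255] * 4 for _ in range(10)]
print(len(crop_standard_bars(scan, 2)))  # 8
```

Because the rule depends only on edge and height, it can be applied unchanged to every derivative of the same size.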
References
Ester, Michael. Digital Image Collections: Issues and Practice. Washington, D.C.: Commission on Preservation and Access, 1996. To order a copy, see http://www-cpa.stanford.edu/cpa/publist.html
Gertz, Janet. Oversize Color Images Project, 1994-1995. Washington, D.C.: Commission on Preservation and Access, 1995. HTML version at http://www.columbia.edu/dlc/nysmb/reports/phase1.html
Gertz, Janet. Oversize Color Images Project, Phase II, 1996. HTML version at http://www.columbia.edu/dlc/nysmb/reports/phase2.html
Reilly, James M. Recommendations for the Evaluation of Digital Images Produced from Photographic, Microphotographic, and Various Paper Formats. Washington, D.C.: Library of Congress National Digital Library Project, 1996. PDF copy at http://lcweb2.loc.gov/ammem/ipirpt.html
Sample Sites
As part of the Oversize Color Images Project, over 800 pages of text were digitized from microfilm and placed online. The following links display a sample page in archival TIFF, screen presentation GIF, and print presentation PDF file formats.
- New York State Museum Bulletins Project: http://www.columbia.edu/dlc/nysmb
  - GIF Screen Presentation Format - Bulletin 80, Page 134
  - PDF Print Presentation Format - Bulletin 80, Page 134 (300 dpi, 1-bit)
  - TIFF Archive Format - Bulletin 80, Page 134 (600 dpi, 1-bit)
The Museum Educational Site Licensing Project includes examples of physical objects photographed and then digitized for screen presentation purposes.
- Museum Educational Site Licensing Project: http://www.columbia.edu/dlc/mesl
  - Textile blanket - GIF thumbnail presentation with three JPEG resolutions available.
The image reserve collection for Art History was converted to digital form from 35mm slides. An outside vendor provided JPEG presentation formats for the first two examples; the third example was converted to presentation formats from PhotoCD.
- Art History Image Reserve Collection: http://www.columbia.edu/cu/arthistory/courses/huma-c1121/
  - Michelangelo's David - GIF thumbnail and two JPEG presentation images, from PhotoCD.
  - Hunters in the Snow, Bruegel - GIF thumbnail and two JPEG presentation images, derived from PhotoCD.
  - Kaufmann House, Frank Lloyd Wright - GIF thumbnail and four presentation images from PhotoCD.
The Digital Scriptorium Project includes examples of medieval and Renaissance manuscripts photographed and then digitized for archival and screen presentation purposes.
- Medieval and Renaissance Manuscripts: http://www.columbia.edu/cu/libraries/indiv/rare/images/
  - Plimpton MS 88 (frag.), verso detail - GIF thumbnail presentation with three JPEG resolutions available.
  - Plimpton MS 21, f. 50v - GIF thumbnail presentation with three JPEG resolutions available.
Campus Display Resources
There are a number of on-campus resources available for
viewing digital images. JPEG, GIF, and PDF formats may be
displayed on most public-access ColumbiaNet stations, and
all public-access personal computers and workstations. In addition, software is readily available for students to view these images from residence hall computers attached to the campus network.
| Machine Type | Quantity | Resolution | Bit Depth | File Format Support |
|---|---|---|---|---|
| ColumbiaNet Stations | 140 | 1024x768 to 1280x1024 | 8-bit | GIF, JPEG, PDF |
| AcIS HP Workstations | 70 | 1024x768 | 8-bit | GIF, JPEG, PDF, TIFF |
| AcIS Power Macintoshes | 160 | 640x480 to 1280x1024 | 16-bit, 24-bit | GIF, JPEG, PDF, TIFF |
| AcIS Printer Stations (Jake) | 22 | 600 dpi printer resolution | 8-bit gray | PostScript (all formats, using web browsers and helper apps) |