Lehman Special Correspondence Files
Technical & Operational Overview
Project Teams
- Libraries Digital Program Division: Robbie Blitz, Terry Catapano, Joanna DiPasquale, Stuart Marquis, Stephen Davis
- Preservation & Digital Conversion Division: Dave Ortiz, Dina Sokolova, Emily Holmes, Janet Gertz
- Curatorial & Administrative Staff: Jean Ashton, Tamar Dougherty, Susan Hamson, Michael Ryan, Janet Gertz, Jane Winland
- Pre-Processing: Ann Young (2006), Annie Grunow (2006)
- Scanning Vendor: Backstage Library Works, Provo, Utah (1/2007 - 7/2007)
Collection Statistics
- Collection Size:
  - ca. 500 linear feet scanned
  - 32,890 complete documents scanned
  - 43,479 page images scanned
- Authorship:
  - HH Lehman & staff = 15,362 (47%)
  - Lehman & family = 1,095 (3%)
  - All other = 16,433 (50%)
  - TOTAL = 32,890
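The authorship percentages can be re-derived from the counts. The following is a minimal Python sketch of that arithmetic; the function name `shares` is only illustrative and is not part of the project's tooling.

```python
# Document counts taken from the authorship breakdown above.
counts = {
    "HH Lehman & staff": 15_362,
    "Lehman & family": 1_095,
    "All other": 16_433,
}


def shares(counts):
    """Return (total, {label: percentage rounded to a whole number})."""
    total = sum(counts.values())
    return total, {label: round(100 * n / total) for label, n in counts.items()}


total, percentages = shares(counts)
print(total)        # 32890
print(percentages)  # {'HH Lehman & staff': 47, 'Lehman & family': 3, 'All other': 50}
```

The three counts sum to the stated total of 32,890 documents, and the rounded shares match the published 47% / 3% / 50%.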
Pre-Processing & Metadata
- Item Numbering: All files, documents and pages were collated & numbered in the form [file #]-[document #]-[page #], e.g., 0002-0001-001
- Collection Reprocessing: Duplicates were marked as not to be scanned; poor-quality photocopies were re-copied; items needing conservation were identified and referred to the Conservation Lab; the entire collection was relabeled and refoldered; 'separated material' was reintegrated. Because of security and operational concerns, original documents in the "VIP Files" were not scanned directly; photocopies of them were scanned instead.
- Descriptive Metadata: Recorded file ID, file title, folder ID, document ID, document date, number of pages in document, genre, author type (i.e., HHL / Staff, HHL Family, Other). (See master project spreadsheet.)
- Conservation Information: Pre-scanning conservation needs, photocopy status (if not original document)
- Technical Metadata Recorded: EXIF standard data plus
  - Image Producer (vendor name)
  - OS version
  - Scanner or Digital Camera
  - Scanner/Digital Camera Software
  - Lens (if applicable)
  - Focal Length (if applicable)
  - Scene Illuminant (if applicable)
  - Sampling Frequency Plane (in this case it is direct capture)
  - Sampling Frequency Unit (in this case inches)
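The [file #]-[document #]-[page #] item numbers described under Item Numbering lend themselves to simple parsing and validation. The sketch below is illustrative only: the 4-4-3 digit widths are inferred from the single example 0002-0001-001, and the names `ItemId`, `parse_item_id` and `format_item_id` are hypothetical, not taken from the project's own scripts.

```python
import re
from typing import NamedTuple


class ItemId(NamedTuple):
    file: int
    document: int
    page: int


# Assumption: 4-digit file, 4-digit document, 3-digit page, as in 0002-0001-001.
_ITEM_ID = re.compile(r"(\d{4})-(\d{4})-(\d{3})")


def parse_item_id(text: str) -> ItemId:
    """Split an item number into its file, document and page parts."""
    match = _ITEM_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid item number: {text!r}")
    return ItemId(*(int(group) for group in match.groups()))


def format_item_id(item: ItemId) -> str:
    """Render the parts back into the zero-padded form."""
    return f"{item.file:04d}-{item.document:04d}-{item.page:03d}"


item = parse_item_id("0002-0001-001")
print(item)                  # ItemId(file=2, document=1, page=1)
print(format_item_id(item))  # 0002-0001-001
```

Zero-padded numbers of fixed width also sort correctly as plain strings, which keeps page images in document order in a file listing.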
Scanning Information
- Scanning Equipment:
- Scanning Specifications:
  - Items measuring up to 10” x 13.5” scanned at 400 ppi, 24 bit color
  - Items measuring 13.5” x 18” to 18” x 24” scanned at 300 ppi, 24 bit color
- Scanning Deliverables:
  - One set of unaltered original TIFF images on DVD
  - One set of cropped and de-skewed 24 bit TIFF images on DVD
  - One set of Macbeth scanned color charts for each scanning session
  - One set of text-searchable PDF files
  - OCR converted text (Raw OCR)
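To give a sense of what the scanning specifications mean in practice, the following Python sketch converts a physical size and resolution into pixel dimensions and an approximate uncompressed size for 24 bit color. The function names are illustrative, and the byte counts ignore TIFF headers and any compression, so they are rough estimates rather than measured file sizes.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel width and height of a scan of the given size (inches) at ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Approximate raw image size, ignoring file headers and compression."""
    width_px, height_px = pixel_dimensions(width_in, height_in, ppi)
    return width_px * height_px * bits_per_pixel // 8


# Largest item in the 400 ppi class: 10" x 13.5"
print(pixel_dimensions(10, 13.5, 400))    # (4000, 5400)
print(uncompressed_bytes(10, 13.5, 400))  # 64800000

# Largest item in the 300 ppi class: 18" x 24"
print(pixel_dimensions(18, 24, 300))      # (5400, 7200)
print(uncompressed_bytes(18, 24, 300))    # 116640000
```

By this estimate a full-size page in the smaller class is about 65 MB uncompressed, and the largest oversize item is about 117 MB even at the lower 300 ppi resolution.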
Rights & Permissions
Application & Web Presentation
Lucene and SOLR
- Metadata indexed for search capabilities
- Information queried via SOLR
- ISO-compliant dates allow for date-range searching
- SOLR's PHP and serialized-PHP output formats allowed us to use a LAMP server with server-side processing, resulting in a very fast application
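As an illustration of how ISO-formatted dates make range searching possible, the sketch below builds a SOLR select URL with a date-range filter and the serialized-PHP response writer (`wt=phps`). It is written in Python for brevity, although the site itself is PHP. The base URL, the field name `doc_date` and the function names are assumptions made for this example; the project's actual schema and queries are not documented here.

```python
from urllib.parse import urlencode


def date_range_clause(field, start_year, end_year):
    """SOLR range clause covering whole years, using ISO 8601 UTC timestamps."""
    return (
        f"{field}:[{start_year:04d}-01-01T00:00:00Z"
        f" TO {end_year:04d}-12-31T23:59:59Z]"
    )


def build_select_url(base_url, text, field, start_year, end_year):
    """URL for a text query filtered to a date range, returned as serialized PHP."""
    params = {
        "q": text,
        "fq": date_range_clause(field, start_year, end_year),
        "wt": "phps",  # serialized PHP; "php" returns a PHP array literal instead
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and field name, for illustration only.
clause = date_range_clause("doc_date", 1930, 1939)
url = build_select_url("http://localhost:8983/solr", "relief", "doc_date", 1930, 1939)
print(clause)  # doc_date:[1930-01-01T00:00:00Z TO 1939-12-31T23:59:59Z]
print(url)
```

On the PHP side, a serialized response of this kind can be turned into an array with `unserialize()`, which avoids parsing XML for each request.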
Web Interface
- Hosted on LAMP server and written in PHP, with some content pulled from Libraries' web site
Included functionality:
- Interface has "smart" lookups for correspondent, date, and document type, so only relevant queries appear in the field
- Searching is faceted by date (10-year increments), correspondence file, and document type; limits are removable
- Document pagination via DLO queries
- Documents have look-up feature and session setting so that users can go to previous/next document and return to last search
- Toggle OCR and page image
- Site reflects new name for Suite, "The Lehman Collections," and somewhat scaled-down template
- Site sets time-sensitive cookie ("lehman") that records whether the user has agreed to the terms and conditions. Once agreed, the cookie lasts 15 minutes and is renewed each time the user navigates through the system, so it expires 15 minutes after the last Lehman document is viewed. No user information is collected.
- Bookmarkable resolver URL created
- Mnemonic URL created
- Chicago Manual of Style citation offered
- Brief help on searching offered
- R&D work on text analysis to offer better metadata (in development)
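The renewing 15-minute cookie described above is a sliding expiration. The sketch below models that behavior in Python, independent of any web framework; the class and method names are hypothetical and the site's actual PHP implementation may differ.

```python
from datetime import datetime, timedelta

TERMS_TTL = timedelta(minutes=15)  # lifetime stated for the "lehman" cookie


class TermsAgreement:
    """Tracks agreement to terms with a sliding 15-minute expiry."""

    def __init__(self):
        self.expires_at = None  # no agreement recorded yet

    def agree(self, now):
        """User accepts the terms: start the 15-minute window."""
        self.expires_at = now + TERMS_TTL

    def touch(self, now):
        """Call on each page view; renew if still valid, else require re-agreement."""
        if self.expires_at is None or now >= self.expires_at:
            self.expires_at = None
            return False
        self.expires_at = now + TERMS_TTL
        return True


start = datetime(2008, 4, 15, 12, 0)
session = TermsAgreement()
session.agree(start)
print(session.touch(start + timedelta(minutes=10)))  # True: renewed until 12:25
print(session.touch(start + timedelta(minutes=24)))  # True: renewed until 12:39
print(session.touch(start + timedelta(minutes=40)))  # False: window lapsed
```

Only the expiry time is stored, which is consistent with the statement that no user information is collected.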
User testing
- 4 students for formal, task-based study
- 4 students for focus group
- All students use archival collections for research
Results:
- Corrected issues with citation format
- Lengthy discussion regarding searching of OCRed text led to more prominent message regarding what is being searched and removal of "subject" facet
- Added "correspondence file" facet to sidebar of results overview
- Described "contact" information more clearly
- Added more descriptive citation information around actual citation
- Made sure date range search could be implemented (this was very popular)
Statistics
- Google Analytics, from 4/15/2008
Planning Documents
Project Timeline
• Orig. proposal: 12/11/03
• Planning & "hiatus": 2004
• Budget approved: 12/2/2004
• Preprocessing begins: 1/2006
• Preprocessing complete: 8/2006
• Vendor selected: 8/2006
• Scanning begins: 1/2007
• Scanning completed: 6/2007
• Post-processing completed: 1/2008
• Application / web site development begins: 1/2008
• Application / web site launched: 4/15/2008