Columbia University Libraries Digital Program

APIS: Michigan Metadata Review

Michigan APIS Metadata Review: 8/2003 file submission

  1. Structural Metadata formatting (ca. 138)

    138 records have incorrectly formatted structural metadata. In addition, many of these records lack a value in the first occurrence of the partNumber and partSide tags (example #1). Among the records with errors, many occurrence numbers are repeated incorrectly (example #2).

    See examples.

  2. Multiple "associated name" parsing (many)

    While most author names (i.e., those with dd100_4 = "aut") came through the conversion from Michigan's internal format correctly, many of the "associated names" (i.e., those with dd100_4 = "asn") did not. The most frequent problem is that of multiple names concatenated within a single field, e.g.,

    dd001 | 1 | michigan.apis.3364
    dd100_a | 2 | MetiochosPitysDaphne
    dd100_4 | 2 | asn

    In other cases, source data appear to have been parsed incorrectly because of the presence of parentheses or other punctuation, e.g.,

    dd001 | 1 | michigan.apis.153
    dd100_a | 2 | (unnamed) politai
    dd100_4 | 2 | asn
    dd100_a | 3 | Horion, sitologos (&amp
    dd100_4 | 3 | asn
    dd100_a | 4 | associate)
    dd100_4 | 4 | asn

    dd001 | 1 | michigan.apis.3367
    dd100_a | 2 | -
    dd100_4 | 2 | asn

    Additional examples: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/names.txt

    Proposed solution: It would be best if these problems could be fixed at the source by tweaking scripts, etc. Until this happens, it would probably be best to omit dd100 / asn (associated names) from the central database. (This could be done on the source side or at the time of data load in the central system.) Associated names would still be retrievable through the interface with keyword searches, although they would not show up in (planned) browsable name listings or name-only searches. (NB: This was the approach taken for the first full APIS data load in 2001, although we had hoped for a better solution this time. Is this mostly a "legacy" data problem now, or will this kind of problem show up at other institutions that may use the Michigan data capture software?)
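
    As a rough illustration of how the problem records might be flagged before load, the following Python sketch reads lines in the "tag | occurrence | value" layout used in the examples above and reports associated names that look malformed. The three tests (no letters at all, unbalanced parentheses, a lowercase letter immediately followed by an uppercase one) are assumptions based only on the examples in this review, and the function names are hypothetical; they are not part of the Michigan or APIS software.

```python
import re

def parse_line(line):
    """Split 'tag | occurrence | value' into a (tag, occurrence, value) tuple."""
    tag, occurrence, value = (part.strip() for part in line.split("|", 2))
    return tag, int(occurrence), value

def suspicious_name(value):
    """Heuristic (assumed) tests for a badly parsed associated name."""
    if not re.search(r"[A-Za-z]", value):
        return True   # punctuation only, e.g. "-"
    if value.count("(") != value.count(")"):
        return True   # unbalanced parentheses, e.g. "associate)"
    if re.search(r"[a-z][A-Z]", value):
        return True   # names run together, e.g. "MetiochosPitysDaphne"
    return False

def flag_associated_names(lines):
    """Return dd100_a values paired with dd100_4 = 'asn' that look malformed."""
    names = {}
    flagged = []
    for line in lines:
        tag, occ, value = parse_line(line)
        if tag == "dd100_a":
            names[occ] = value
        elif tag == "dd100_4" and value == "asn" and occ in names:
            if suspicious_name(names[occ]):
                flagged.append(names[occ])
    return flagged

sample = [
    "dd001 | 1 | michigan.apis.153",
    "dd100_a | 2 | (unnamed) politai",
    "dd100_4 | 2 | asn",
    "dd100_a | 3 | Horion, sitologos (&amp",
    "dd100_4 | 3 | asn",
    "dd100_a | 4 | associate)",
    "dd100_4 | 4 | asn",
]
print(flag_associated_names(sample))
```

    A check of this kind would only produce a work list for manual review; the third test in particular would also flag legitimate names containing an internal capital letter.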


  3. Citations (dd510s) with corrupted data (ca. 15), e.g.,

    dd001 | 1 | michigan.apis.309
    dd510 | 2 | , 1

    dd001 | 1 | michigan.apis.526
    dd510 | 2 | , 1991

    See full list at: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/dd510s.txt

    Proposed Solution: Fix manually in Mich. source data or tweak output scripts.
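
    Both examples above begin with a comma, which suggests that the leading part of the citation was lost. A minimal Python sketch of a check for this pattern follows; the rule (a value that starts with punctuation) is an assumption drawn from these two examples only, and the function name is hypothetical.

```python
import re

def looks_truncated(citation):
    """True when a dd510 value begins with punctuation (assumed sign of lost text)."""
    return re.match(r"\s*[,;:.]", citation) is not None

# Values taken from the two examples above, keyed by dd001.
citations = {
    "michigan.apis.309": ", 1",
    "michigan.apis.526": ", 1991",
}
bad_records = [rec for rec, cit in citations.items() if looks_truncated(cit)]
print(bad_records)
```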


  4. DDBDP Citations with Problem Data

    a) coding / formatting problems (only a few)

    dd510_dd | 1 | O.Mich.697
    dd510_dd | 1 | P.Mich.:

    Proposed Solution: Fix manually in Mich. source data or tweak output scripts.


    b) Mich. volume IV problem (ca. 70)

    For Michigan volume IV only, generating the DDBDP citation requires a test of the page number to output either "4.1" or "4.2"; the volume must be written as a decimal number, not with an intervening colon. Although some dd510_dd vol. IV citations in the Michigan metadata are well-formed, others are erroneous; there are also records with both a correct and an incorrect citation, e.g.,

    dd001 | 1 | michigan.apis.3301
    dd510_dd | 2 | P.Mich.:4.1:p. 4

    dd001 | 1 | michigan.apis.3631
    dd510_dd | 1 | P.Mich.:4:224
    dd510_dd | 2 | P.Mich.:4.2:357

    The first example above does not retrieve the desired DDBDP page; in the second case, an erroneous citation is followed by a valid citation.

    See full list at: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/dd510_problems2.txt

    Proposed solution:  Fix manually in Mich. source data or tweak output scripts.
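
    A validation pass along the following lines might help locate the affected records. This Python sketch assumes that a well-formed vol. IV citation has the shape "P.Mich.:4.1:NNN" or "P.Mich.:4.2:NNN" with a bare number in the last position, which is inferred from the examples above. It does not attempt to decide between 4.1 and 4.2, since the page-number threshold is not stated here. The names are hypothetical.

```python
import re

# Assumed shape of a well-formed vol. IV citation: "4.1" or "4.2", then a bare number.
VOL4_OK = re.compile(r"^P\.Mich\.:4\.[12]:\d+$")
# Any citation that claims to be P.Mich. volume 4, with or without a decimal part.
VOL4_ANY = re.compile(r"^P\.Mich\.:4(\.\d+)?:")

def check_vol4(citation):
    """Classify a dd510_dd value as 'ok', 'bad', or 'other' (not vol. IV)."""
    if not VOL4_ANY.match(citation):
        return "other"
    return "ok" if VOL4_OK.match(citation) else "bad"

for value in ["P.Mich.:4.1:p. 4", "P.Mich.:4:224", "P.Mich.:4.2:357"]:
    print(value, "->", check_vol4(value))
```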

  5. Titles with Shelfmarks or other non-title information as content (ca. 740), e.g.,

    dd001 | 1 | michigan.apis.13
    dd245_a | 1 | 4535

    dd001 | 1 | michigan.apis.3527
    dd245_a | 1 | P. Mich. inv. Ar. 5611

    See full list at: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/245s.xls

    Proposed Solution: If this is simply erroneous data, fix in source file; if this is a local convention for items where no title has yet been assigned, it might be preferable to use a standard replacement text such as "Documentary Text" or "Unidentified Document" as other APIS partners have done.
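
    If the replacement-text approach is chosen, it could be applied with a filter such as the Python sketch below. The two patterns (a title consisting only of digits, or one beginning "P. Mich. inv.") are assumptions based on the two examples above and will not cover every case in the full list; the function names and the choice of "Unidentified Document" as the default are illustrative only.

```python
import re

# Assumed patterns: digits only, or a Michigan inventory shelfmark.
SHELFMARK_LIKE = re.compile(r"^(\d+|P\.\s*Mich\.\s*inv\..*)$")

def title_is_shelfmark(title):
    """True when a dd245_a value looks like a shelfmark rather than a title."""
    return SHELFMARK_LIKE.match(title.strip()) is not None

def normalize_title(title, placeholder="Unidentified Document"):
    """Substitute standard replacement text for shelfmark-like titles."""
    return placeholder if title_is_shelfmark(title) else title

print(normalize_title("4535"))
print(normalize_title("P. Mich. inv. Ar. 5611"))
print(normalize_title("Receipt for Transport of Grain"))
```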

  6. HTML character entities for special characters (> 1700), usually &quot; and &amp;, e.g.,

    dd001 | 1 | michigan.apis.2044
    dd245_a | 1 | &quot;Heroes&quot; of Aristophanes, IInd century A.D.

    dd001 | 1 | michigan.apis.3648
    dd090 | 1 | &quot;A Wizard's Hoard&quot;

    Proposed solution:  Since the central APIS database does not support HTML character entities as characters, convert to ASCII equivalents in source data or in output scripts. (What is "A Wizard's Hoard" by the way??)
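
    If the conversion is done in an output script, the Python standard library function html.unescape would handle the common cases. The sketch below assumes the source value carries the entity &quot; around the word Heroes; the wrapper function name is hypothetical. Note that html.unescape converts every named or numeric entity, so entities for non-ASCII characters would need separate handling if the central database requires plain ASCII.

```python
import html

def decode_entities(value):
    """Replace HTML character entities with the characters they stand for."""
    return html.unescape(value)

# Assumed raw form of the dd245_a value for michigan.apis.2044.
raw_title = "&quot;Heroes&quot; of Aristophanes, IInd century A.D."
print(decode_entities(raw_title))
```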

  7. "Unknown" for authors (ca. 1300), e.g.,

    dd001 | 1 | michigan.apis.28
    dd100_a | 1 | Unknown
    dd100_4 | 1 | aut

    Proposed solution: For consistency with other APIS contributors, omit "Unknown" as author -- when it is the sole content of dd100_a -- from source data or filter from contribution file.  (Would not apply to "Unknown tax-collector" etc.)
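
    A filter on the contribution file could be as simple as the following Python sketch. It assumes each name has already been paired with its dd100_4 role code, and it drops a name only when the role is "aut" and the whole dd100_a value is exactly "Unknown". The function name and the pair representation are hypothetical.

```python
def drop_unknown_authors(names):
    """Remove (name, role) pairs where role is 'aut' and the name is exactly 'Unknown'."""
    return [
        (name, role)
        for name, role in names
        if not (role == "aut" and name.strip() == "Unknown")
    ]

names = [
    ("Unknown", "aut"),
    ("Unknown tax-collector", "aut"),
]
print(drop_unknown_authors(names))
```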

  8. Duplicative date information in dd245_a, e.g.,

    dd001 | 1 | michigan.apis.3367
    dd245_f | 1 | IIIrd century A.D.
    dd245_a | 1 | Romance (?), IIIrd century A.D.


    dd001 | 1 | michigan.apis.1002
    dd245_f | 1 | July 23, 315 A.D.
    dd245_a | 1 | Receipt for Transport of Grain, July 23, 315 A.D.

    Proposed solution: Fix on Michigan end.
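
    One possible form of the fix, sketched in Python: remove the date from the end of dd245_a when it exactly repeats dd245_f after a comma. This assumes the duplication always takes the ", date" form shown in the two examples; titles that repeat the date in another form would be left unchanged. The function name is hypothetical.

```python
def strip_duplicate_date(title, date):
    """Drop a trailing ', <date>' from the title when it repeats the dd245_f value."""
    suffix = ", " + date
    if date and title.endswith(suffix):
        return title[: -len(suffix)]
    return title

print(strip_duplicate_date("Romance (?), IIIrd century A.D.", "IIIrd century A.D."))
print(strip_duplicate_date(
    "Receipt for Transport of Grain, July 23, 315 A.D.", "July 23, 315 A.D."
))
```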

Last revision: 10/15/03
© Columbia University Libraries