Michigan APIS Metadata Review: 8/2003 file submission
- Structural Metadata formatting (ca. 138)
138 records have incorrectly formatted structural metadata.
In addition, many of these records lack a value in the first
occurrence of the partNumber and partSide tags (example #1).
Of the records with errors, many occurrence numbers are repeated
incorrectly (example #2).
See examples.
- Multiple "associated name" parsing (many)
While most author names (i.e., those with dd100_4 = "aut")
came through the conversion from Michigan's internal format
correctly, many of the "associated names" (i.e.,
those with dd100_4 = "asn") did not. The most frequent
problem is that of multiple names concatenated within a single
field, e.g.,
dd001 | 1 | michigan.apis.3364
dd100_a | 2 | MetiochosPitysDaphne
dd100_4 | 2 | asn
In other cases, source data appear to have been parsed incorrectly
because of the presence of parentheses or other punctuation,
e.g.,
dd001 | 1 | michigan.apis.153
dd100_a | 2 | (unnamed) politai
dd100_4 | 2 | asn
dd100_a | 3 | Horion, sitologos (&
dd100_4 | 3 | asn
dd100_a | 4 | associate)
dd100_4 | 4 | asn
dd001 | 1 | michigan.apis.3367
dd100_a | 2 | -
dd100_4 | 2 | asn
Additional examples: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/names.txt
Proposed solution: It would be best if these problems
could be fixed at the source by tweaking scripts, etc. Until
this happens, it would probably be best to omit dd100 /
asn (associated names) from the central database. (This could
be done at the source side or at the time of data load in
the central system.) Associated names would still be
retrievable through the interface with keyword searches, although
they would not show up in (planned) browsable name listings
or name-only searches. (NB: This was the approach taken for
the first full APIS data load in 2001, although we had hoped
for some better solution this time. Is this mostly a "legacy"
data problem now, or will this kind of problem show up at
other institutions that may use the Michigan data capture software?)
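As an illustration of the load-time option, the following is a minimal sketch of a filter that drops every dd100 name whose relator code is "asn". It assumes the contribution file uses the "tag | occurrence | value" layout shown in the examples above and that occurrence numbers are scoped to the record opened by the preceding dd001 line; the function names parse_rows and drop_associated_names are hypothetical, not part of any existing APIS script.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]  # (tag, occurrence, value)


def parse_rows(text: str) -> List[Row]:
    """Split 'tag | occurrence | value' lines into tuples; skip other lines."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2]))
    return rows


def drop_associated_names(rows: List[Row]) -> List[Row]:
    """Remove dd100_* rows whose occurrence carries dd100_4 = 'asn'."""
    # First pass: collect (record id, occurrence) pairs marked as 'asn'.
    # Assumption: a dd001 row starts each record.
    asn_keys = set()
    record = None
    for tag, occ, value in rows:
        if tag == "dd001":
            record = value
        elif tag == "dd100_4" and value == "asn":
            asn_keys.add((record, occ))

    # Second pass: keep everything except the marked dd100_* rows.
    kept = []
    record = None
    for tag, occ, value in rows:
        if tag == "dd001":
            record = value
        if tag.startswith("dd100") and (record, occ) in asn_keys:
            continue
        kept.append((tag, occ, value))
    return kept
```

Author names (dd100_4 = "aut") pass through untouched, so only the problematic associated names are withheld from the name indexes.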
- Citations (dd510s) with corrupted data (ca. 15), e.g.,
dd001 | 1 | michigan.apis.309
dd510 | 2 | , 1
dd001 | 1 | michigan.apis.526
dd510 | 2 | , 1991
See full list at: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/dd510s.txt
Proposed Solution: Fix manually in Mich. source data
or tweak output scripts.
- DDBDP Citations with Problem Data
a) coding / formatting problems (only a few)
dd510_dd | 1 | O.Mich.697
dd510_dd | 1 | P.Mich.:
Proposed Solution: Fix manually in Mich. source
data or tweak output scripts.
b) Mich. volume IV problem (ca. 70)
Calculating the Michigan DDBDP citation for volume IV (only)
requires a test of the page number to output either "4.1"
or "4.2"; the volume must be written as a decimal
number, not with an intervening colon. Although some
dd510_dd vol. IV metadata is well-formed in the Michigan
metadata, in other cases there are erroneous vol. IV citations;
there are also records with both a correct and an incorrect
citation, e.g.,
dd001 | 1 | michigan.apis.3301
dd510_dd | 2 | P.Mich.:4.1:p. 4
dd001 | 1 | michigan.apis.3631
dd510_dd | 1 | P.Mich.:4:224
dd510_dd | 2 | P.Mich.:4.2:357
The first example above does not retrieve the desired
DDBDP page; in the second case, an erroneous citation
is followed by a valid citation.
See full list at: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/dd510_problems2.txt
Proposed solution: Fix manually in Mich.
source data or tweak output scripts.
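A check along these lines could be used to list the suspect volume IV citations before a load. This is only a sketch: the well-formed shape is inferred from the single valid example above ("P.Mich.:4.2:357"), and the names VOL4_OK, VOL4_ANY, and is_bad_vol4 are hypothetical. It does not attempt to decide between 4.1 and 4.2, since the page-number threshold is not stated here.

```python
import re

# Assumed well-formed shape: series, volume 4.1 or 4.2, then a bare number.
VOL4_OK = re.compile(r"^P\.Mich\.:4\.[12]:\d+$")
# Any P.Mich. citation whose volume starts with 4 (but not 40, 41, ...).
VOL4_ANY = re.compile(r"^P\.Mich\.:4(?![0-9])")


def is_bad_vol4(citation: str) -> bool:
    """True if the value cites P.Mich. volume IV but not in the assumed form."""
    c = citation.strip()
    return bool(VOL4_ANY.match(c)) and not VOL4_OK.match(c)


if __name__ == "__main__":
    # The three dd510_dd values quoted in the examples above.
    for value in ("P.Mich.:4.1:p. 4", "P.Mich.:4:224", "P.Mich.:4.2:357"):
        print(value, "->", "check" if is_bad_vol4(value) else "ok")
```

Citations for other volumes and other series are ignored by this check, so it can be run over all dd510_dd values without a pre-filter.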
- Titles with Shelfmarks or other non-title information as content (ca. 740), e.g.,
dd001 | 1 | michigan.apis.13
dd245_a | 1 | 4535
dd001 | 1 | michigan.apis.3527
dd245_a | 1 | P. Mich. inv. Ar. 5611
See full list at: http://www.columbia.edu/cu/libraries/inside/projects/apis/michigan/problems/245s.xls
Proposed Solution: If this is simply erroneous data,
fix in source file; if this is a local convention for items
where no title has yet been assigned, it might be preferable
to use a standard replacement text such as "Documentary
Text" or "Unidentified Document" as other APIS
partners have done.
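If a replacement text is adopted, candidate titles could be located with a rough pattern test such as the sketch below. The pattern is an assumption drawn only from the two examples above (a bare number, or a value beginning "P. Mich. inv."); the names SHELFMARK_LIKE and looks_like_shelfmark are hypothetical, and the full list linked above would be the better guide to what actually occurs.

```python
import re

# Assumed shapes: digits only, or a value that starts with "P. Mich. inv."
SHELFMARK_LIKE = re.compile(r"^(?:\d+|P\.\s*Mich\.\s*inv\..*)$")


def looks_like_shelfmark(title: str) -> bool:
    """True if a dd245_a value resembles a shelfmark rather than a title."""
    return bool(SHELFMARK_LIKE.match(title.strip()))
```

Anything the test flags would still need a human decision, since it cannot tell erroneous data from a deliberate local convention.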
- HTML character entities for special characters (> 1700), usually &quot; and &amp;, e.g.,
dd001 | 1 | michigan.apis.2044
dd245_a | 1 | &quot;Heroes&quot; of Aristophanes, IInd century A.D.
dd001 | 1 | michigan.apis.3648
dd090 | 1 | &quot;A Wizard's Hoard&quot;
Proposed solution: Since the central APIS
database does not support HTML character entities as characters,
convert to ASCII equivalents in source data or in output
scripts. (What is "A Wizard's Hoard" by
the way??)
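In an output script the conversion could be as small as the sketch below, which assumes the affected values carry standard named entities such as &quot; and &amp;. It uses html.unescape from the Python standard library; the helper names decode_entities and needs_review are hypothetical. Because html.unescape decodes every named entity, not only these two, the second helper flags any value that ends up containing non-ASCII characters.

```python
import html


def decode_entities(value: str) -> str:
    """Replace HTML character entities in a field value with the characters."""
    return html.unescape(value)


def needs_review(value: str) -> bool:
    """True if the decoded value still contains non-ASCII characters."""
    return not decode_entities(value).isascii()
```

Running the decoder once per field value is enough for singly encoded data; doubly encoded values (for example &amp;quot;) would need a second pass.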
- "Unknown" for authors (ca. 1300), e.g.,
dd001 | 1 | michigan.apis.28
dd100_a | 1 | Unknown
dd100_4 | 1 | aut
Proposed solution: For consistency with other APIS
contributors, omit "Unknown" as author -- when
it is the sole content of dd100_a -- from source data
or filter from contribution file. (Would not apply
to "Unknown tax-collector" etc.)
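Filtering from the contribution file might look like the following sketch. It assumes rows have already been read as (tag, occurrence, value) tuples in the layout shown above, that a dd001 row opens each record, and that only names with relator "aut" are to be dropped; the function name drop_unknown_authors is hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]  # (tag, occurrence, value)


def drop_unknown_authors(rows: List[Row]) -> List[Row]:
    """Drop dd100_a/dd100_4 pairs where the author name is exactly 'Unknown'."""
    names, roles = {}, {}
    record = None
    # First pass: index names and relator codes by (record id, occurrence).
    for tag, occ, value in rows:
        if tag == "dd001":
            record = value
        elif tag == "dd100_a":
            names[(record, occ)] = value.strip()
        elif tag == "dd100_4":
            roles[(record, occ)] = value.strip()

    # Exact match only, so "Unknown tax-collector" and the like are kept.
    drop = {k for k, v in names.items()
            if v == "Unknown" and roles.get(k) == "aut"}

    # Second pass: emit every row except the flagged name/relator pairs.
    kept, record = [], None
    for tag, occ, value in rows:
        if tag == "dd001":
            record = value
        if tag in ("dd100_a", "dd100_4") and (record, occ) in drop:
            continue
        kept.append((tag, occ, value))
    return kept
```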
- Duplicative date information in dd245_a, e.g.,
dd001 | 1 | michigan.apis.3367
dd245_f | 1 | IIIrd century A.D.
dd245_a | 1 | Romance (?), IIIrd century A.D.
dd001 | 1 | michigan.apis.1002
dd245_f | 1 | July 23, 315 A.D.
dd245_a | 1 | Receipt for Transport of Grain, July 23, 315 A.D.
Proposed solution: Fix on Michigan end.
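One way the fix could be scripted is sketched below: when the title in dd245_a ends with exactly the date string held in dd245_f, the trailing date and its separating comma are removed. This assumes the duplication is always a verbatim suffix, as in the two examples above; the function name strip_trailing_date is hypothetical.

```python
def strip_trailing_date(title: str, date: str) -> str:
    """Remove `date` from the end of `title` if it appears there verbatim."""
    title, date = title.strip(), date.strip()
    if date and title.endswith(date):
        # Cut the date, then the comma and spaces that separated it.
        stem = title[: -len(date)].rstrip().rstrip(",").rstrip()
        if stem:  # never reduce a title to an empty string
            return stem
    return title
```

Titles whose date is worded differently from dd245_f are left unchanged and would need a manual pass.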