We began the session with a quick exploration of some of the metadata issues that libraries are encountering as we explore new models, including FRBR and linked open data. Erik and I discussed our research, which explored metadata quality issues that arose when we applied the FRBR model to a selected set of records in ZSR’s catalog. Our research questions were two-fold:
What metadata quality problems arise in application of FRBRization algorithms?
How do computational and expert approaches compare with regard to FRBRization?
So in a nutshell, this is how we did it:
Erik extracted 848 catalog records on books either by or about Mark Twain.
He extracted data from the record set and normalized text keys from elements of the metadata.
Data was written to a spreadsheet and loaded into Google Refine to assist with analysis.
Carolyn grouped records into work-sets and created a matrix of unique identifiers.
Because of metadata variation, Carolyn performed a secondary analysis using a book-in-hand approach for 5 titles (approx. 100 books).
Expert review grouped 410 records into 147 work-sets with 2 or more expressions, and 420 records into 420 single-expression work-sets. Lost, missing, or checked-out books were excluded, which accounts for the totals not adding up to the 848 records in the record set.
Metadata issues encountered included the need to represent whole/part or manifestation-to-multiple-work relationships, metadata inconsistency (e.g., differences in record length and composition, invalid unique identifiers), and difficulty determining work boundaries.
Using FRBRization algorithms, Erik performed a computational assessment to identify and group work-sets.
Computational and expert assessments were compared to each other.
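To make the normalization-and-grouping steps above concrete, here is a minimal sketch of how records might be keyed and clustered into candidate work-sets. This is an illustration, not the actual code used in the study; the field names, sample records, and exact normalization rules are assumptions.

```python
# Sketch: build normalized author/title text keys from record metadata,
# then group records that share a key into candidate work-sets.
# Sample data and normalization rules are illustrative only.
import re
import unicodedata
from collections import defaultdict

def normalize_key(author: str, title: str) -> str:
    """Fold accents, lowercase, strip punctuation, collapse whitespace."""
    text = f"{author}/{title}"
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s/]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical records with minor cataloging variation
records = [
    {"author": "Twain, Mark", "title": "Adventures of Huckleberry Finn."},
    {"author": "Twain, Mark,", "title": "Adventures of Huckleberry Finn"},
    {"author": "Twain, Mark", "title": "The Prince and the Pauper"},
]

work_sets = defaultdict(list)
for rec in records:
    work_sets[normalize_key(rec["author"], rec["title"])].append(rec)

for key, members in work_sets.items():
    print(f"{key}: {len(members)} record(s)")
```

Note how the first two records, which differ only in punctuation, collapse to the same key and so land in one work-set, while the third title forms its own.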
Erik and I were really excited to see that computational techniques were largely as successful as expert techniques. We found, for example, that normalized author/title strings created highly accurate keys for identifying unique works. On the other hand, we also found that MARC records did not always contain the metadata needed to fully identify works. Our detailed findings will be presented at the ASIS&T conference in October. Here are our slides:
OCLC’s Chief Scientist Thom Hickey spoke about clustering OCLC’s database, which is just under 300 million records, at the FRBR Group 1 work level; clustering within work-sets by expression using algorithmic keys; FRBR algorithm creation and development; and the fall release of GLIMIR, which attempts to cluster WorldCat’s records and holdings for the same work at the manifestation level.
Kent State’s School of Information and Library Science professors Drs. Athena Salaba and Yin Zhang discussed their IMLS (Institute of Museum and Library Services) funded project, a FRBR prototype catalog. Library of Congress cataloging records were extracted from WorldCat to create a FRBRized catalog. Users were then tested to see whether they could complete a set of tasks in the library’s current catalog and in the prototype.
Jennifer Bowen, Chair of the XC organization and Assistant Dean for Information Management Services at the University of Rochester, demonstrated the XC catalog to the audience. The XC project didn’t set out to ask whether people liked FRBR, but rather what users are trying to do with the catalog’s data. According to Ms. Bowen, libraries are (or should be) moving away from assuming we know what users need, toward asking what users need to do in their research. How do users keep current in their field? In regards to library data, we need to ask our users, “What would they do with a magic wand?” and continue to ponder, “What will the user needs of the future be?”
Following our session, I joined a packed room of librarians eager to hear more about the Library of Congress’ (LC) Bibliographic Framework Transition Initiative (BFI), which is looking to translate the MARC21 format, a 40-year-old standard, to a linked data (LD) model. LC has contracted with Zepheira to help accelerate the launch of BFI. By August/September, an LD working draft will hopefully be ready to present to the broader library community.