
by Kevin Gilbertson and Carolyn McCallum

Earlier this year, Carolyn and I embarked on an ambitious project to revise VuFind’s format facet. This facet – Book, eBook, DVD, etc. – powers the main search box on the library’s homepage and provides enhanced browsing in the catalog itself. While the immediate reason for the project was a request to identify streaming videos in the catalog, the need for a significant revision and the awareness of its importance had been growing for some time. That is, with the increasing number of electronic materials we were adding to the catalog, it was clear that the then-current format mappings were limited, often inconsistent, and wholly ignorant of the nuances in new format designations.

To resolve VuFind’s format mapping issues, we delved into learning about MARC’s fixed-field elements and the 007 field (physical characteristics of non-print items). The coding of the fixed-field elements and of the variable 007 field in a MARC record is critical to how VuFind determines an item’s format. Based on our review of these MARC codings, we adjusted VuFind’s mapping algorithm, re-indexed the catalog several times, and checked our changes in a test version of VuFind.
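To make the idea concrete, here is a minimal sketch of how a format label might be derived from the Leader and 007 codings. It is not VuFind’s actual mapping code or our production rules; the function name `guess_format` and the labels ‘Video’, ‘Audio’, ‘Journal’, and ‘Unknown’ are hypothetical, and the rules are deliberately simplified.

```python
def guess_format(leader: str, fields_007: list[str]) -> str:
    """Return one format label for a MARC record (illustrative rules only)."""
    # Leader/06 = type of record, Leader/07 = bibliographic level
    record_type = leader[6] if len(leader) > 6 else " "
    bib_level = leader[7] if len(leader) > 7 else " "

    # 007/00 = category of material; 007/01 = specific material designation.
    # "cr" means electronic resource, remote access.
    is_remote = any(f[:2] == "cr" for f in fields_007)
    # 007/00 "v" means videorecording; 007/04 "v" means DVD.
    is_video = any(f[:1] == "v" for f in fields_007)
    is_dvd = any(f[:1] == "v" and len(f) > 4 and f[4] == "v" for f in fields_007)

    if is_video:
        if is_remote:
            return "Streaming Video"
        return "DVD" if is_dvd else "Video"      # "Video" is an assumed label
    if record_type in ("i", "j"):                # sound recordings
        return "Streaming Audio" if is_remote else "Audio"  # "Audio" is assumed
    if record_type == "a":                       # language material
        if bib_level == "s":                     # serial
            return "eJournal" if is_remote else "Journal"   # "Journal" is assumed
        if bib_level == "m":                     # monograph
            return "eBook" if is_remote else "Book"
    # Catch-all: the kind of bucket the revision tried to shrink.
    return "Electronic" if is_remote else "Unknown"
```

In this sketch, a record with Leader/06-07 of `am` and a 007 beginning `cr` comes out as ‘eBook’, while the same Leader with no 007 comes out as ‘Book’. The order of the checks matters: testing for video before testing for remote access is what keeps a streaming video from falling into the generic ‘Electronic’ bucket.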

As we worked, we came across many unexpected format assignments. For example, during one of these reviews, we noticed a university press book in the ‘GovDoc’ facet. After inspecting the coding, we discovered that state university press publications are coded as government publications in MARC records (the fixed-field GPub element) and therefore map to the ‘GovDoc’ facet in VuFind. According to OCLC’s Bibliographic Formats and Standards, libraries are to “treat an item published by an academic institution as a government publication if the government created or controls the institution. For example, publications of state university presses in the United States are government publications at the state level.” While our mapping was technically correct, we thought most users would expect to find a book published by a university press under ‘Book’ and not under ‘Government Document’. As we encountered these unexpected results, we reviewed the MARC codings and made adjustments to VuFind’s mapping algorithm.
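As an illustration of the kind of adjustment involved, the sketch below shows one possible way to keep state university press books out of the government documents facet. It is an assumption for illustration only, not the rule we actually deployed: the function name `facet_for_book` and the publisher-name test are hypothetical. It reads the GPub code from position 28 of the 008 field, which is where that element sits for book records.

```python
# 008/28 values that indicate some level of government publication
# (autonomous, multilocal, federal, international, local, multistate,
# undetermined level, state, other).
GOV_PUB_CODES = set("acfilmosz")


def facet_for_book(field_008: str, publisher: str) -> str:
    """Pick 'Book' or 'GovDoc' for a print monograph (illustrative only)."""
    gpub = field_008[28] if len(field_008) > 28 else " "
    if gpub not in GOV_PUB_CODES:
        return "Book"
    # Hypothetical exception: state-level code plus a university press
    # publisher is shown to users as an ordinary book.
    if gpub == "s" and "university press" in publisher.lower():
        return "Book"
    return "GovDoc"
```

A publisher-name test like this is crude and would miss presses whose names do not contain the phrase, so any real rule would need to be checked against the catalog after re-indexing.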

Another example of what we addressed was the ‘Electronic’ facet. When we began our project, the catalog showed 615,320 items as ‘Electronic’. While this facet may have been accurate given an item’s coding, in use it was problematic: it lacked adequate differentiation and became a catch-all that hid items not handled elsewhere in the assignment process. So, while some ebooks were ‘eBook’, others were simply (and only) ‘Electronic’. In our last test version, we had reduced the ‘Electronic’ facet to just 569 items. Where did the other 614,751 items go? The bulk of these items went to the ‘eBook’ facet – 23,267 ebooks became 487,633 ebooks – and over 2,000 items were added to the ‘Streaming Video’ facet. The remaining items were distributed among other new electronic format facets, including Streaming Audio, eGovDocs, and eJournals.

We pushed our changes into production in March and have been watching to see how they have performed during the past few months. It was not easy work and you may continue to see items with questionable formats. There are limits to what we can achieve with the format mapping algorithm based on the MARC codings we have.

With the recent OCLC reclamation project and the authority control work, there is a healthy confluence of effort to improve our data and its representation in the catalog, and we wanted to share a before-and-after view of our improvements. If you see areas that need further improvement, please let us know.